Closed ctrueden closed 9 years ago
Pushed from 0.1.0 to 0.2.0, because this isn't really a blocker for the initial release announcement. The announcement just needs to warn that development is still very active and subject to restructuring in the coming weeks.
Hi Curtis,
My understanding, after talking to you about it at the hackathon, is that there are two parts to this.
I looked at JTransforms and the only thing I couldn't figure out is how to get the efficient sizes. FFTs work fastest at certain sizes (powers of 2 are best, but other sizes also work), and you need to know these sizes to set up the padding correctly. I could not figure out how to do this with the JTransforms API.
@StephanPreibisch do you have any plans for this?
I looked at JTransforms and the only thing I couldn't figure out is how to get the efficient sizes.
AFAICT JTransforms only handles powers of two really efficiently; everything else is handled by the exact same code.
There is another issue we might have to hash out with @wendykierp at some stage: due to the static way nthreads is determined, the code is not cluster-safe (where you would need to specify on a per-Context level how many threads are permissible, as opposed to setting it globally).
There are three FFT algorithms implemented in JTransforms: split-radix (only for powers of two), mixed-radix (2-, 3-, 4-, 5-, n-) and Bluestein (used when the remainder from the integer division of the input size by 2, 3, 4 and 5 is greater than or equal to 211).
Thanks @dscho and @wendykierp. So then can we use this approach (http://ltfat.sourceforge.net/notes/ltfatnote017.pdf) to find the efficient sizes in between powers of 2?
@bnorthan interesting, but I think a table lookup with 2^20 entries is overkill.
Given a number N and a set of prime numbers p_i, you should be able to set up a linear programming algorithm that minimizes \sum_i (a_i \cdot \log p_i) - \log N under the constraint that it has to be non-negative.
I was thinking of a smaller lookup table... but that equation looks like it would work as well.
@bnorthan see also 5.7 of http://www.math.leidenuniv.nl/~psh/ANTproc/09andrew.pdf
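As a point of comparison to the linear-programming idea, a brute-force enumeration is also feasible here because 7-smooth numbers have only four prime factors to vary (a hypothetical sketch, not code from this thread; the class name is made up). Since some power of two always lies in [N, 2N), every candidate worth considering is below 2N, which bounds the search:

```java
// Hypothetical sketch: find the smallest 7-smooth number >= n by
// enumerating all products 2^a * 3^b * 5^c * 7^d below 2n. A power of
// two always lies in [n, 2n), so the candidate space is bounded.
public class NextSevenSmooth {

    public static long next(final long n) {
        long best = Long.MAX_VALUE;
        for (long p2 = 1; p2 < 2 * n; p2 *= 2)
            for (long p3 = p2; p3 < 2 * n; p3 *= 3)
                for (long p5 = p3; p5 < 2 * n; p5 *= 5)
                    for (long p7 = p5; p7 < 2 * n; p7 *= 7)
                        if (p7 >= n && p7 < best) best = p7;
        return best;
    }

    public static void main(final String... args) {
        // 32805 = 3^8 * 5 is the smallest 7-smooth number >= 32769
        System.out.println(next(32769));
    }
}
```

The nested loops visit only about log2(2n) * log3(2n) * log5(2n) * log7(2n) products (a few thousand for n around 2^15), so this is cheap enough for one-off size lookups, though the sieve approach below scales better.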
@bnorthan I ran out of time, but I did implement something you might find useful:
```java
package org.scijava.math;

import java.util.Arrays;

/**
 * A class to determine the next <i>k</i>-smooth number (i.e. a number divisible
 * only by prime numbers up to <i>k</i>), given a lower bound <i>n</i>. Based on
 * A. Granville, <i>Smooth numbers: computational number theory and beyond</i>:
 * <blockquote> 5.7. <b>Finding smooth numbers computationally.</b> The obvious
 * way to find y-smooth numbers in (x; x + z) with z ≤ x is to initialize an
 * array a[i] := 0 for 1 ≤ i ≤ z (where a[i] corresponds to x + i). For
 * each successively larger prime power p<sup>j</sup> ≤ x + z with p ≤ y,
 * determine the smallest i such that p<sup>j</sup> divides x + i and then add
 * log(p) to a[i], a[i + p<sup>j</sup>], a[i + 2p<sup>j</sup>] and so on, up
 * until the end of the array. When we’ve finished, if any a[i] ≥ log(x),
 * then x + i is y-smooth. </blockquote>
 *
 * @author Johannes Schindelin
 */
public class NextSmoothNumber {

    public static int nextSmooth(final int y, final int x, final int z) {
        final double[] a = new double[z];
        handlePrime(2, x, a);
        handlePrime(3, x, a);
        handlePrime(5, x, a);
        handlePrime(7, x, a);
        double log = Math.log(x);
        for (int i = 0; i < a.length; i++) {
            if (a[i] >= log) return x + i;
        }
        // leftover debugging output for when no smooth number is found in range
        System.err.println(Arrays.toString(a));
        System.err.println(a[32805 - x]);
        return -1;
    }

    private static void handlePrime(final int p, final int x, final double[] a) {
        double log = Math.log(p);
        for (int power = p; power <= x + a.length; power *= p) {
            int j = x % power;
            if (j > 0) j = power - j;
            while (j < a.length) {
                a[j] += log;
                j += power;
            }
        }
    }

    private static int log2(int x) {
        for (int j = 0, k = 1; ; j++, k *= 2) {
            if (k >= x) return j;
        }
    }

    public static void main(String... args) {
        int x = args.length == 0 ? 32769 : Integer.parseInt(args[0]);
        System.err.println(log2(x));
        System.out.println(nextSmooth(7, x, 4 * log2(x)));
    }
}
```
What I wanted to do is to run a loop from 32000 to 33000 and verify that the produced numbers are 7-smooth...
Nice. I'll take a look at it and see if I can finish off the test.
Thanks @bnorthan! I reassigned this issue to you, and moved it to the "medium" milestone.
@dscho it worked great. The only problem I found is that the heuristic for the search size (4*log2(x)) was sometimes too small. For example, if x=32000 it doesn't find any smooth numbers within the search range. So I wrote a little wrapper function that just searches ahead 50 at a time. I think that makes sense and works, but let me know if you have any other ideas.
```java
package org.scijava.math;

import java.util.Arrays;

/**
 * A class to determine the next <i>k</i>-smooth number (i.e. a number divisible
 * only by prime numbers up to <i>k</i>), given a lower bound <i>n</i>. Based on
 * A. Granville, <i>Smooth numbers: computational number theory and beyond</i>:
 * <blockquote> 5.7. <b>Finding smooth numbers computationally.</b> The obvious
 * way to find y-smooth numbers in (x; x + z) with z ≤ x is to initialize an
 * array a[i] := 0 for 1 ≤ i ≤ z (where a[i] corresponds to x + i). For
 * each successively larger prime power p<sup>j</sup> ≤ x + z with p ≤ y,
 * determine the smallest i such that p<sup>j</sup> divides x + i and then add
 * log(p) to a[i], a[i + p<sup>j</sup>], a[i + 2p<sup>j</sup>] and so on, up
 * until the end of the array. When we’ve finished, if any a[i] ≥ log(x),
 * then x + i is y-smooth. </blockquote>
 *
 * @author Johannes Schindelin
 */
public class NextSmoothNumber {

    // wrapper: retry with a window of 50 until a smooth number is found
    public static int nextSmooth(int x) {
        int result = -1;
        int z = 50;
        while (result == -1) {
            result = nextSmooth(7, x, z);
            x = x + z;
        }
        return result;
    }

    public static int nextSmooth(final int y, final int x, final int z) {
        final double[] a = new double[z];
        handlePrime(2, x, a);
        handlePrime(3, x, a);
        handlePrime(5, x, a);
        handlePrime(7, x, a);
        double log = Math.log(x);
        for (int i = 0; i < a.length; i++) {
            if (a[i] >= log) return x + i;
        }
        // System.err.println(Arrays.toString(a));
        // System.err.println(a[32805 - x]);
        return -1;
    }

    private static void handlePrime(final int p, final int x, final double[] a) {
        double log = Math.log(p);
        for (int power = p; power <= x + a.length; power *= p) {
            int j = x % power;
            if (j > 0) j = power - j;
            while (j < a.length) {
                a[j] += log;
                j += power;
            }
        }
    }

    private static int log2(int x) {
        for (int j = 0, k = 1; ; j++, k *= 2) {
            if (k >= x) return j;
        }
    }

    public static void main(String... args) {
        int x = args.length == 0 ? 32769 : Integer.parseInt(args[0]);
        System.err.println(log2(x));
        int result = nextSmooth(7, x, 4 * log2(x));
        System.out.println(result);
    }
}
```
And the test is below (the prime factor routine is from http://www.vogella.com/tutorials/JavaAlgorithmsPrimeFactorization/article.html)
```java
package org.scijava.math;

import java.util.ArrayList;
import java.util.List;

public class TestSmoothNumber {

    public static void main(String... args) {
        int start = args.length <= 1 ? 32000 : Integer.parseInt(args[0]);
        int finish = args.length <= 1 ? 33000 : Integer.parseInt(args[1]);
        int numFailures = 0;
        for (int i = start; i <= finish; i++) {
            int result = NextSmoothNumber.nextSmooth(i);
            System.out.println(i + ": " + result);
            if (result == -1) {
                numFailures++;
            }
            else {
                // a result is a failure if its largest prime factor exceeds 7
                List<Integer> factors = primeFactors(result);
                if (factors.get(factors.size() - 1) > 7) {
                    numFailures++;
                }
            }
        }
        System.out.println("Number Failures: " + numFailures);
    }

    private static List<Integer> primeFactors(int number) {
        int n = number;
        List<Integer> factors = new ArrayList<Integer>();
        for (int i = 2; i <= n; i++) {
            while (n % i == 0) {
                factors.add(i);
                n /= i;
            }
        }
        return factors;
    }
}
```
Great job! As for the test, I could imagine that the following is faster and simpler than full factorization:
```java
// expects a positive integer
public static boolean is7Smooth(int x) {
    for (int i = 7; i > 1; i--) {
        while ((x % i) == 0) {
            x /= i;
        }
    }
    return x == 1;
}
```
I believe that the improvement in complexity (linear runtime in the bitwidth of the input instead of exponential, constant memory instead of exponential) makes up for the inefficiency of dividing by non-primes 4 and 6...
So I wrote a little wrapper function that just searches ahead 50 at a time.
Great! I'd use a `for (;;) { int result = ...; if (result > 0) return result; }` loop to avoid an unnecessary assignment, but this method is probably not mission-critical enough to optimize for speed.
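Combining the divisibility check from the previous comment with the suggested `for (;;)` form gives a self-contained sketch (hypothetical class and method names, and for brevity it uses the simple check rather than the windowed sieve from the thread):

```java
// Sketch combining is7Smooth with the suggested for (;;) loop form.
public class SmoothSearch {

    // strip all factors 2..7; x is 7-smooth iff nothing else remains
    public static boolean is7Smooth(int x) {
        for (int i = 7; i > 1; i--) {
            while (x % i == 0) x /= i;
        }
        return x == 1;
    }

    // for (;;) avoids the sentinel assignment of the while-based wrapper
    public static int next7Smooth(int x) {
        for (;;) {
            if (is7Smooth(x)) return x;
            x++;
        }
    }

    public static void main(final String... args) {
        // 32000 = 2^8 * 5^3 is already 7-smooth
        System.out.println(next7Smooth(32000));
    }
}
```

Checking one candidate costs only O(log x) divisions, so even though this scans candidates one by one, it is plenty fast for the sizes involved here.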
Any progress on this one?
I forked imglib and am testing things out on a branch (work in progress - still needs testing).
https://github.com/bnorthan/imglib/tree/fft_JTransform
The key file is FFTMethods.java. I have a FFTMethodsJTransform.java which is basically the same, except it uses JTransforms for the low-level 1D transforms and NextSmoothNumber.java to calculate the fast size.
Do you want to keep the mines version? If so, FFTMethods.java might have to be refactored a bit to handle both APIs; otherwise FFTMethodsJTransform.java could just replace FFTMethods.
That's great! IMHO we do not need to keep the mines version; @StephanPreibisch reported that he chose it because at the time it was the fastest FFT in Java. This information is a couple of years old, though; I do not have newer information.
Still WIP. But I've added some tests.
algorithms/core/src/test/java/net/imglib2/algorithm/fft2/testFFT.java.
I'll have to do some reading and make sure the accuracy makes sense.
@dscho Things are coming along. I am updating the fftmethods_wrapper branch every couple of days.
There is another issue we might have to hash out with @wendykierp at some stage: due to the static way nthreads is determined, the code is not cluster-safe (where you would need to specify on a per-Context level how many threads are permissible, as opposed to setting it globally).
A related issue is that when performing a multi-dimensional FFT, the FFTMethods class calls n 1D FFTs in parallel (i.e. it processes several lines in parallel), where n is the number of processors. The assumption seems to be that the 1D FFT is not threaded. It would be nice to be able to control the number of threads used for the 1D FFT. Then we could optimize speed by tuning the number of lines processed in parallel and the number of threads per line.
That being said, initial tests indicate that FFTMethods wrapping mines and FFTMethods wrapping JTransforms run at similar speeds for the ND FFT (even though I haven't tried to optimize the number of threads).
Hi @bnorthan, I am really looking forward to that, it is great that you put so much effort into this! This will make our lives much easier.
ConcurrencyUtils.setThreadsBeginN_1D_FFT_2Threads(n) and ConcurrencyUtils.setThreadsBeginN_1D_FFT_4Threads(n) can be used to disable threads in the 1D FFT. Just set them to Long.MAX_VALUE. ConcurrencyUtils.setNumberOfThreads(n) can be called at any time. Here is an example:
```java
int size = 1 << 14;
FloatFFT_2D fft = new FloatFFT_2D(size, size);
float[] x = new float[2 * size * size];
IOUtils.fillMatrix_2D(size, 2 * size, x);

ConcurrencyUtils.setNumberOfThreads(1);
long t = System.nanoTime();
fft.complexForward(x);
System.out.println("Single thread time = " + (System.nanoTime() - t) / 1e9 + " sec.");

ConcurrencyUtils.setNumberOfThreads(16);
IOUtils.fillMatrix_2D(size, 2 * size, x);
t = System.nanoTime();
fft.complexForward(x);
System.out.println("Sixteen threads time = " + (System.nanoTime() - t) / 1e9 + " sec.");
```
And the output:
Single thread time = 18.785440844 sec.
Sixteen threads time = 2.256298538 sec.
@wendykierp The problem is that this is global, yet unaware of how much else is going on. For the record: SciJava Common sports a `Context` and a `ThreadService` for that purpose (the `ThreadService` is an interface and can be overridden by an application-specific implementation). I do not see any other way to solve e.g. the problem of playing nicely with cluster/cloud computation (where multiple tasks run in parallel, and the number of threads needs to be limited per user, not per algorithm).
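The per-context idea can be illustrated with plain java.util.concurrent (a conceptual sketch with hypothetical names, not the actual SciJava ThreadService API): each context owns its own bounded executor, so a cluster scheduler can cap threads per task rather than relying on one global static setting as JTransforms' ConcurrencyUtils does:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual sketch (hypothetical names): a per-context thread budget,
// as opposed to a single global thread-count setting.
public class ComputeContext {

    private final ExecutorService executor;

    public ComputeContext(final int maxThreads) {
        // at most maxThreads tasks of this context run concurrently
        this.executor = Executors.newFixedThreadPool(maxThreads);
    }

    public <T> List<Future<T>> invokeAll(final List<Callable<T>> tasks)
        throws InterruptedException
    {
        return executor.invokeAll(tasks);
    }

    public void shutdown() {
        executor.shutdown();
    }

    public static void main(final String... args) throws Exception {
        // e.g. a cluster quota of two threads for this particular task
        final ComputeContext ctx = new ComputeContext(2);
        final List<Callable<Integer>> tasks = List.of(() -> 1, () -> 2, () -> 3);
        int sum = 0;
        for (final Future<Integer> f : ctx.invokeAll(tasks)) sum += f.get();
        System.out.println(sum);
        ctx.shutdown();
    }
}
```

Two contexts with different budgets can then coexist in one JVM, which a single static setter cannot express.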
@dscho If you provide a solution to this issue without adding a new dependency to JTransforms, then I will accept your pull request.
@dietzc and I are reviewing all the issues. What is left to do to close out this issue? We have to:
- Delete FFTConvolution from imagej-ops in favor of a new JTransforms-based FFTConvolution, which will exist in imglib2-algorithm-fft, which is BSD-2 licensed.
- Update ConvolveFourier to use the new FFTConvolution from imglib2-algorithm-fft.

Is that right?
Alternately, the whole imglib2-algorithm-fft project could just merge into imagej-ops. What do others think?
Hi @ctrueden
ConvolveFourier is no longer used; the new convolve is called ConvolveFFTImg. FFTConvolution is not used anymore either.
You should be able to just delete the following two files:
- Delete FFTConvolution from imagej-ops
- Delete ConvolveFourier from imagej-ops
We should probably still get permission from @StephanPreibisch to relicense. There is a class https://github.com/imagej/imagej-ops/blob/master/src/main/java/net/imagej/ops/fft/filter/CreateFFTFilterMemory.java that reused a section of his code for the FFT memory handling.
The issue with imglib2-algorithm-fft was different as the problem was the jdk-mines license.
Thanks @bnorthan! Maybe you could discuss the licensing along with https://github.com/imglib/imglib2-algorithm-fft/pull/3 when you meet with @StephanPreibisch in person.
The FFTConvolution class was copied from ImgLib2 wholesale. This class was taken from imglib2-algorithms-gpl, which is licensed under the GPL. Hence, this class is incompatible with ImageJ OPS's BSD-2 licensing.
To address this issue in the short to medium term, we can split the imagej.ops.convolve package into its own component, ij-ops-gpl, which depends on imglib2-algorithms-gpl. But ideally, we don't want any part of OPS to be licensed under the GPL; better would be to find a way to do convolution without that code.
See also https://github.com/imglib/imglib/issues/61