imagej / imagej-ops

ImageJ Ops: "Write once, run anywhere" image processing
https://imagej.net/libs/imagej-ops
BSD 2-Clause "Simplified" License
88 stars 42 forks source link

Remove FFTConvolution class #12

Closed ctrueden closed 9 years ago

ctrueden commented 10 years ago

The FFTConvolution class was copied from ImgLib2 wholesale. This class was taken from imglib2-algorithms-gpl, which is licensed under the GPL. Hence, this class is incompatible with ImageJ OPS's BSD-2 licensing.

To address this issue in the short to medium term, we can split the imagej.ops.convolve package into its own component, ij-ops-gpl, which depends on imglib2-algorithms-gpl. But ideally, we don't want any part of OPS to be licensed under the GPL; better would be to find a way to do convolution without that code.

See also https://github.com/imglib/imglib/issues/61

ctrueden commented 10 years ago

Pushed from 0.1.0 to 0.2.0, because this isn't really a blocker for the initial release announcement. The announcement just needs to warn that the development is still very active and subject to restructuring within the next weeks.

bnorthan commented 9 years ago

Hi Curtis

My understanding, after talking to you about it at the hackathon, is that there are 2 parts to this.

  1. FFTConvolution is GPL. I think we would want a slightly different class hierarchy anyway. Convolution could be built from the fft op. High level convolution is not to hard to write (ffts and a complex multiply).
  2. Mines FFT is CPL. This can be solved by using a different 1d FFT routine in the imglib2 FFTMethods class. ND-FFT is implemented by applying Mines FFT line by line. There are only a few calls to Mines that would have to be replaced.

I looked at JTransforms and the only thing I couldn't figure out is how to get the efficient sizes. FFTs work fastest at certain sizes (power of 2 is best, but other sizes work also) and you need to know these sizes to set up the padding correctly. I could not figure out how to do this with the JTransforms API.

@StephanPreibisch do you have any plans for this??

dscho commented 9 years ago

I looked at JTransforms and the only thing I couldn't figure out is how to get the efficient sizes.

AFAICT JTransforms only handles powers of two really efficiently, everything else is handled by the exact same code:

https://github.com/wendykierp/JTransforms/blob/ede249c9824262bd9b5f571e56f7a0fa596f6f20/src/main/java/org/jtransforms/fft/DoubleFFT_2D.java#L117

There is another issue we might have to hash out with @wendykierp at some stage: due to the static way nthreads is determined the code is not cluster-safe (where you would need to specify on a per-Context level how many threads are permissible, as opposed to setting it globally).

wendykierp commented 9 years ago

There are three FFT algorithms implemented in JTransforms: split-radix (only for powers of two), mixed-radix (2-, 3-, 4-, 5-, n-) and Bluestein (used when the reminder from the integer division of input size by 2, 3, 4 and 5 is greater or equal 211):

https://github.com/wendykierp/JTransforms/blob/ede249c9824262bd9b5f571e56f7a0fa596f6f20/src/main/java/org/jtransforms/fft/DoubleFFT_1D.java#L117

bnorthan commented 9 years ago

Thanks @dscho and @wendykierp. So then can we use this approach http://ltfat.sourceforge.net/notes/ltfatnote017.pdf to find the efficient sizes in between powers of 2??

dscho commented 9 years ago

@bnorthan interesting, but I think a table lookup with 2^20 entries is overkill.

Given a number N and a set of prime numbers p_i you should be able to set up a linear programming algorithm that minimizes \sum_i (a_i \cdot\log p_i) - \log N under the constraint that it has to be non-negative.

bnorthan commented 9 years ago

I was thinking of a smaller lookup table... but that equation looks like it would work as well.

dscho commented 9 years ago

@bnorthan see also 5.7 of http://www.math.leidenuniv.nl/~psh/ANTproc/09andrew.pdf

dscho commented 9 years ago

@bnorthan I ran out of time, but I did implement something you might find useful:

package org.scijava.math;

import java.util.Arrays;

/**
 * A class to determine the next <i>k</i>-smooth number (i.e. a number divisible
 * only by prime numbers up to <i>k</i>), given a lower bound <i>n</i>. Based on
 * A. Granville, <i>Smooth numbers: computational number theory and beyond</i>:
 * <blockquote> 5.7. <b>Finding smooth numbers computationally.</b> The obvious
 * way to find y-smooth numbers in (x; x + z) with z &le; x is to initialize an
 * array a[i] := 0 for 1 &le; i &le; z (where a[i] corresponds to x + i). For
 * each successively larger prime power p<sup>j</sup> &le; x + z with p &le; y,
 * determine the smallest i such that p<sup>j</sup> divides x + i and then add
 * log(p) to a[i], a[i + p<sup>j</sup>], a[i + 2p<sup>j</sup>] and so on, up
 * until the end of the array. When we’ve finished, if any a[i] &ge; log(x),
 * then x + i is y-smooth. </blockquote>
 * 
 * @author Johannes Schindelin
 */
public class NextSmoothNumber {
    public static int nextSmooth(final int y, final int x, final int z) {
        final double[] a = new double[z];
        handlePrime(2, x, a);
        handlePrime(3, x, a);
        handlePrime(5, x, a);
        handlePrime(7, x, a);
        double log = Math.log(x);
        for (int i = 0; i < a.length; i++) {
            if (a[i] >= log) return x + i;
        }
        System.err.println(Arrays.toString(a));
        System.err.println(a[32805 - x]);
        return -1;
    }

    private static void handlePrime(final int p, final int x, final double[] a) {
        double log = Math.log(p);
        for (int power = p; power <= x + a.length; power *= p) {
            int j = x % power;
            if (j > 0) j = power - j;
            while (j < a.length) {
                a[j] += log;
                j += power;
            }
        }
    }

    private static int log2(int x) {
        for (int j = 0, k = 1; ; j++, k *= 2) {
            if (k >= x) return j;
        }
    }

    public static void main(String... args) {
        int x = args.length == 0 ? 32769 : Integer.parseInt(args[0]);
        System.err.println(log2(x));
        System.out.println(nextSmooth(7, x, 4 * log2(x)));
    }
}

What I wanted to do is to run a loop from 32000 to 33000 and verify that the produced numbers are 7-smooth...

bnorthan commented 9 years ago

Nice. I'll take a look at it and see if I can finish off the test.

ctrueden commented 9 years ago

Thanks @bnorthan! I reassigned this issue to you, and moved it to the "medium" milestone.

bnorthan commented 9 years ago

@dscho it worked great. The only problem I found is that the heuristic for the search size (4*log2(x)) was too small sometimes. For example if x=32000 it doesn't find any smooth numbers within the search range. So I wrote a little wrapper function that just searches ahead 50 at a time. I think that makes sense and works but let me know if you have any other ideas.

package org.scijava.math;

import java.util.Arrays;

/**
 * A class to determine the next <i>k</i>-smooth number (i.e. a number divisible
 * only by prime numbers up to <i>k</i>), given a lower bound <i>n</i>. Based on
 * A. Granville, <i>Smooth numbers: computational number theory and beyond</i>:
 * <blockquote> 5.7. <b>Finding smooth numbers computationally.</b> The obvious
 * way to find y-smooth numbers in (x; x + z) with z &le; x is to initialize an
 * array a[i] := 0 for 1 &le; i &le; z (where a[i] corresponds to x + i). For
 * each successively larger prime power p<sup>j</sup> &le; x + z with p &le; y,
 * determine the smallest i such that p<sup>j</sup> divides x + i and then add
 * log(p) to a[i], a[i + p<sup>j</sup>], a[i + 2p<sup>j</sup>] and so on, up
 * until the end of the array. When we’ve finished, if any a[i] &ge; log(x),
 * then x + i is y-smooth. </blockquote>
 * 
 * @author Johannes Schindelin
 */
public class NextSmoothNumber {

    public static int nextSmooth(int x) {
        int result=-1;
        int z=50;
        while (result==-1) {
            result=nextSmooth(7, x, z);
            x=x+z;
        }
        return result;
    }

    public static int nextSmooth(final int y, final int x, final int z) {
        final double[] a = new double[z];
        handlePrime(2, x, a);
        handlePrime(3, x, a);
        handlePrime(5, x, a);
        handlePrime(7, x, a);
        double log = Math.log(x);
        for (int i = 0; i < a.length; i++) {
            if (a[i] >= log) return x + i;
        }
        //System.err.println(Arrays.toString(a));
        //System.err.println(a[32805 - x]);
        return -1;
    }

    private static void handlePrime(final int p, final int x, final double[] a) {
        double log = Math.log(p);
        for (int power = p; power <= x + a.length; power *= p) {
            int j = x % power;
            if (j > 0) j = power - j;
            while (j < a.length) {
                a[j] += log;
                j += power;
            }
        }
    }

    private static int log2(int x) {
        for (int j = 0, k = 1; ; j++, k *= 2) {
            if (k >= x) return j;
        }
    }

    public static void main(String... args) {
        int x = args.length == 0 ? 32769 : Integer.parseInt(args[0]);
        System.err.println(log2(x));

        int result=nextSmooth(7, x, 4 * log2(x));

        System.out.println(result);
    }
}
bnorthan commented 9 years ago

And the test is below (the prime factor routine is from http://www.vogella.com/tutorials/JavaAlgorithmsPrimeFactorization/article.html)

package org.scijava.math;

import java.util.Arrays;
import java.util.List;
import java.util.ArrayList;

public class TestSmoothNumber {
     public static void main(String... args) {
        int start = args.length <= 1 ? 32000 : Integer.parseInt(args[0]);
        int finish = args.length <= 1 ? 33000 : Integer.parseInt(args[1]);

        int numFailures=0;

        for (int i=start;i<=finish;i++) {

            int result=NextSmoothNumber.nextSmooth(i);

            System.out.println(i+": "+result);
            if (result==-1) {
                numFailures++;
            }
            else {
                List<Integer> factors=primeFactors(result);

                if (factors.get(factors.size()-1)>7) {
                    numFailures++;
                }
            }
        }

        System.out.println("Number Failures: "+numFailures);
     }

     private static List<Integer> primeFactors(int number) {
        int n = number;
        List<Integer> factors = new ArrayList<Integer>();
        for (int i = 2; i <= n; i++) {
            while (n % i == 0) {
                factors.add(i);
                n /= i;
            }
        }
        return factors;
    }
}
dscho commented 9 years ago

Great job! As for the test, I could imagine that the following is faster and simpler than full factorization:

// expects a positive integet
public static boolean is7Smooth(int x) {
    for (int i = 7; i > 1; i--) {
       while ((x % i) == 0) {
            x /= i;
        }
    }
    return x == 1;
}

I believe that the improvement in complexity (linear runtime in the bitwidth of the input instead of exponential, constant memory instead of exponential) makes up for the inefficiency of dividing by non-primes 4 and 6...

dscho commented 9 years ago

So I wrote a little wrapper function that just searches ahead 50 at a time.

Great! I'd use a for (;;) { int result = ...; if (result > 0) return result } loop to avoid an unnecessary assignment, but this method is probably not mission critical enough to optimize for speed.

dscho commented 9 years ago

Any progress on this one?

bnorthan commented 9 years ago

I forked imglib and am testing things out on a branch (work in progress - still needs testing).

https://github.com/bnorthan/imglib/tree/fft_JTransform

The key file is FFTMethods.java. I have a FFTMethodsJTransform.java which is basically the same except uses JTransforms for the low level 1D transforms and NextSmoothNumber.java to calculate the fast size.

Do you want to keep a mines version?? If so FFTMethods.java might have to be refactored a bit to handle both APIs, otherwise FFTMethodsJTransform.java could just replace FFTMethods.

dscho commented 9 years ago

That's great! IMHO we do not need to keep the mines version; @StephanPreibisch reported that he chose it because at the time it was the fastest FFT in Java. This information is a couple of years old, though, I do not have newer information.

bnorthan commented 9 years ago

Still WIP. But I've added some tests.

algorithms/core/src/test/java/net/imglib2/algorithm/fft2/testFFT.java.

I'll have to do some reading and make sure the accuracy makes sense.

http://www.fftw.org/accuracy/comments.html

bnorthan commented 9 years ago

@dscho Things are coming along. I am updating the fftmethods_wrapper branch every couple of days.

There is another issue we might have to hash out with @wendykierp at some stage: due to the static way nthreads is determined the code is not cluster-safe (where you would need to specify on a per-Context level how many threads are permissible, as opposed to setting it globally).

A related issue is that when performing a multi dimensional fft, the FFTMethods class calls n 1D ffts in parallel (ie it processes several lines in parallel) where n is the number of processors. The assumption seems to be that the 1D fft is not threaded. It would be nice to be able to control the number of threads used for the 1D fft. Then we could optimize speed by tuning the number of lines processed in parallel and the number of threads per line.

That being said initial tests indicate that fftmethods wrapping mines and fftmethods wrapping jtransform run at similar speed for ND-fft (even though I haven't tried to optimize the number of threads).

StephanPreibisch commented 9 years ago

Hi @bnorthan, I am really looking forward to that, it is great that you put so much effort into this! This will make our lives much easier.

wendykierp commented 9 years ago

ConcurrenctUtils.setThreadsBeginN_1D_FFT_2Threads(n) and ConcurrenctUtils.setThreadsBeginN_1D_FFT_4Threads(n) can be used to disable threads in 1D fft. Just set them to Long.MAX_VALUE. ConcurrencyUtils.setNumberOfThreads(n) can be called at any time. Here is an example:

        int size = 1<<14;
        FloatFFT_2D fft = new FloatFFT_2D(size,size);
        float[] x = new float[2*size*size];
        IOUtils.fillMatrix_2D(size, 2*size, x);
        ConcurrencyUtils.setNumberOfThreads(1);
        long t = System.nanoTime();
        fft.complexForward(x);
        System.out.println("Single thread time = " + (System.nanoTime() - t) / 1e9 + " sec.");
        ConcurrencyUtils.setNumberOfThreads(16);
        IOUtils.fillMatrix_2D(size, 2*size, x);
        t = System.nanoTime();
        fft.complexForward(x);
        System.out.println("Sixteen threads time = " + (System.nanoTime() - t) / 1e9 + " sec."); 

And the output:

Single thread time = 18.785440844 sec.
Sixteen threads time = 2.256298538 sec.
dscho commented 9 years ago

@wendykierp The problem is that this is global, yet unaware how much else is going on. For the record: SciJava commons sports a Context and a ThreadService for that purpose (the ThreadService is an interface and can be overridden by an application-specific implementation). I do not see any other way to solve e.g. the problem of playing nicely with cluster/cloud computation (where multiple tasks run in parallel, and the number of threads needs to be limited per user not per algorithm).

wendykierp commented 9 years ago

@dscho If you provide a solution to this issue without adding a new dependency to JTransforms, then I will accept your pull request.

ctrueden commented 9 years ago

@dietzc and I are reviewing all the issues. What is left to do to close out this issue? We have to:

Is that right?

Alternately, the whole imglib2-algorithm-fft project could just merge to imagej-ops. What do others think?

bnorthan commented 9 years ago

Hi @ctrueden

ConvolveFourier is no longer used, the new convolve is called ConvolveFFTImg. FFTConvolution is not used anymore either.

You should be able to just delete the following two files

Delete FFTConvolution from imagej-ops Delete ConvolveFourier from imagej-ops

We should probably still get permission from @StephanPreibisch to relicense. There is a class https://github.com/imagej/imagej-ops/blob/master/src/main/java/net/imagej/ops/fft/filter/CreateFFTFilterMemory.java that reused a section of his code for the FFT memory handling.

The issue with imglib2-algorithm-fft was different as the problem was the jdk-mines license.

ctrueden commented 9 years ago

Thanks @bnorthan! Maybe you could discuss the licensing along with https://github.com/imglib/imglib2-algorithm-fft/pull/3 when you meet with @StephanPreibisch in person.