imglib / imglib2-roi

Regions of interest (ROIs) and labelings for ImgLib2
Other
8 stars 8 forks source link

RealMaskRealInterval OR and AND hang when combining too many masks #60

Closed tischi closed 1 year ago

tischi commented 2 years ago

@ctrueden @tpietzsch

This code starts to hang for me around iteration 10:

        final RealMaskRealInterval mask = GeomMasks.closedBox( new double[]{ 0, 0, 0 }, new double[]{ 1, 1, 1 } );
        RealMaskRealInterval test = mask;
        for ( int i = 0; i < 20; i++ )
        {
            test = test.or( mask );
            System.out.println( i );
            System.out.println( test.realMin( 0 ));
        }

Am I doing something wrong or is it an issue with the code?

(I know that this specific example does not make sense as I am always combining the same mask, but it also hangs for me when doing it with a similar amount of different masks).

ctrueden commented 2 years ago

What is the code doing while the hang is happening? https://imagej.net/learn/troubleshooting#if-imagej-freezes-or-hangs

In an IDE you can simply hit the pause button and inspect the stack trace. Or use the IDE or VisualVM to profile the code to find out where it's spending its time.

tischi commented 2 years ago

I am running this in IntelliJ...

VisualVM gives this:

image

Can I do more to help the debugging?

ctrueden commented 2 years ago

Hmm, it's a lot of recursion, yeah. But I'm still surprised; it's not exponential growth, but rather a linear increase.

I cannot spend time digging on this right now. Main thought off the top of my head is: can we introduce some memoization to avoid repeated recursion on unchanging intervals? The realMin, realMax functions of UnionRealInterval would be good candidates for that. Here is a quick patch (untested) for you to try:

diff --git a/src/main/java/net/imglib2/roi/Bounds.java b/src/main/java/net/imglib2/roi/Bounds.java
index 3f9e1ab..7c5145b 100644
--- a/src/main/java/net/imglib2/roi/Bounds.java
+++ b/src/main/java/net/imglib2/roi/Bounds.java
@@ -535,40 +535,60 @@ public abstract class Bounds< I extends RealInterval, B extends Bounds< I, B > >

        private final RealInterval i2;

+       private final double[] min;
+
+       private final double[] max;
+
        public UnionRealInterval( final RealInterval i1, final RealInterval i2 )
        {
            super( i1.numDimensions() );
            this.i1 = i1;
            this.i2 = i2;
            assert ( i1.numDimensions() == i2.numDimensions() );
+           min = new double[ i1.numDimensions() ];
+           max = new double[ i1.numDimensions() ];
+           Arrays.fill( min, Double.NaN );
+           Arrays.fill( max, Double.NaN );
        }

        @Override
        public double realMin( final int d )
        {
-           if ( Intervals.isEmpty( i1 ) )
+           if ( Double.isNaN( min[ d ] ) )
            {
-               if ( Intervals.isEmpty( i2 ) )
-                   return Double.POSITIVE_INFINITY;
-               return i2.realMin( d );
+               if ( Intervals.isEmpty( i1 ) )
+               {
+                   if ( Intervals.isEmpty( i2 ) )
+                       min[ d ] = Double.POSITIVE_INFINITY;
+                   else
+                       min[ d ] = i2.realMin( d );
+               }
+               else if ( Intervals.isEmpty( i2 ) )
+                   min[ d] = i1.realMin( d );
+               else
+                   min[ d ] = Math.min( i1.realMin( d ), i2.realMin( d ) );
            }
-           if ( Intervals.isEmpty( i2 ) )
-               return i1.realMin( d );
-           return Math.min( i1.realMin( d ), i2.realMin( d ) );
+           return min[ d ];
        }

        @Override
        public double realMax( final int d )
        {
-           if ( Intervals.isEmpty( i1 ) )
+           if ( Double.isNaN( max[d] )
            {
-               if ( Intervals.isEmpty( i2 ) )
-                   return Double.NEGATIVE_INFINITY;
-               return i2.realMax( d );
+               if ( Intervals.isEmpty( i1 ) )
+               {
+                   if ( Intervals.isEmpty( i2 ) )
+                       max[ d ] = Double.NEGATIVE_INFINITY;
+                   else
+                       max[ d ] = i2.realMax( d );
+               }
+               else if ( Intervals.isEmpty( i2 ) )
+                   max[ d ] = i1.realMax( d );
+               else
+                   max[ d ] = Math.max( i1.realMax( d ), i2.realMax( d ) );
            }
-           if ( Intervals.isEmpty( i2 ) )
-               return i1.realMax( d );
-           return Math.max( i1.realMax( d ), i2.realMax( d ) );
+           return max[ d ];
        }
    }
tischi commented 2 years ago

Thanks! I will try to find time to test it!

constantinpape commented 2 years ago

Here's a link for how to apply the patch, @tischi: https://stackoverflow.com/questions/2249852/how-to-apply-a-patch-generated-with-git-format-patch (Sent to you via mattermost earlier, but not sure if you have seen it in between the other messages.)

tischi commented 2 years ago

@ctrueden

Some new finding: it also hangs when combining the masks with and, and your path does not help in this case.

If you have any pointers for me where the look for the culprit, let me know.

tischi commented 2 years ago

I may have found something (sorry for potential spamming, using this to keep notes in case I cannot fully fix it).

This works fine (when commenting out the isEmpty calls).

        public IntersectionRealInterval( final RealInterval i1, final RealInterval i2 )
        {
            super( i1.numDimensions() );
            this.i1 = i1;
            this.i2 = i2;
            assert ( i1.numDimensions() == i2.numDimensions() );
        }

        @Override
        public double realMin( final int d )
        {
            //if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
            //  return Double.POSITIVE_INFINITY;
            return Math.max( i1.realMin( d ), i2.realMin( d ) );
        }

        @Override
        public double realMax( final int d )
        {
            //if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
            //  return Double.NEGATIVE_INFINITY;
            return Math.min( i1.realMax( d ), i2.realMax( d ) );
        }
tischi commented 2 years ago

This is fast as well, indicating that the recursion is in i1.

        @Override
        public double realMin( final int d )
        {
            //if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
            if ( Intervals.isEmpty( i2 ) )
                    return Double.POSITIVE_INFINITY;
            return Math.max( i1.realMin( d ), i2.realMin( d ) );
        }

        @Override
        public double realMax( final int d )
        {
            //if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
            if ( Intervals.isEmpty( i2 ) )
                return Double.NEGATIVE_INFINITY;
            return Math.min( i1.realMax( d ), i2.realMax( d ) );
        }
tischi commented 2 years ago

I still don't get why there are sooo many recursions, but this is not very efficient:

if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
    return Double.POSITIVE_INFINITY;
return Math.max( i1.realMin( d ), i2.realMin( d ) );

Because Intervals.isEmpty( i1 ) internally will call i1.realMin( d ), which is then called again in return Math.max( i1.realMin( d ), i2.realMin( d ) ); (the same holds true for i2). Thus, one could add explicit code for the Intervals.isEmpty( ... ) calls and then keep the i1.realMin( d ), i2.realMin( d ) values for computing the maximum.

tischi commented 2 years ago

Another place for optimization is that Intervals.isEmpty() contains this line of code: if ( interval.realMin( d ) > interval.realMax( d ) ) which will cause interval.isEmpty() to be evaluated twice (both within realMin() and realMax()).

tischi commented 2 years ago

Because Intervals.isEmpty( i1 ) internally will call i1.realMin( d ), which is then called again in return Math.max( i1.realMin( d ), i2.realMin( d ) ); (the same holds true for i2). Thus, one could add explicit code for the Intervals.isEmpty( ... ) calls and then keep the i1.realMin( d ), i2.realMin( d ) values for computing the maximum.

Adding the mentioned explicit code seems to help a bit but only gives a factor of 2 or so in speed up.

Right now, it feels to me that some (if only short-term, if that is possible) caching of at least whether the contained Intervals i1 and i2 isEmpty might be needed, but I am not sure.

tischi commented 2 years ago

Some new finding: it also hangs when combining the masks with "and", and your patch does not help in this case.

This is of course the case because there are UnionRealInterval and IntersectionRealInterval (for and and or), which both suffer independently from the same issue.

Another place for optimization is that Intervals.isEmpty() contains this line of code: if ( interval.realMin( d ) > interval.realMax( d ) ) which will cause interval.isEmpty() to be evaluated twice (both within realMin() and realMax()).

Maybe this is the reason for the explosion? Maybe that this leads to a 2 2 2 ... type of behavior?

In other words just to get one, e.g., realMin( 0 ) it currently accesses both realMin and realMax for all dimensions (for checking for emptiness), which in turn do the same again. That means realMin( d ) will cause (for a 3-D interval), 6 calls, where each of them again causes 6 calls; so maybe it is rather 6 6 6 ...?

Would it be possible to change the code (e.g., for interval instanceOf AbstractAdaptingRealInterval) such that there would be a realMinMax( double[][] ) which would get all dimensions in one go with checking only once for emptiness?

Or implement a special isEmpty() method...

tischi commented 2 years ago

@ctrueden @tpietzsch

I think I may have found a solution 🥳

All test run and it is fast ( only 8 ms even for a 1000 combined AND intervals ) .

    /**
     * The intersection of two intervals. Adapts to changes of the source
     * intervals.
     */
    public static class IntersectionRealInterval extends AbstractAdaptingRealInterval
    {
        private final RealInterval i1;

        private final RealInterval i2;

        public IntersectionRealInterval( final RealInterval i1, final RealInterval i2 )
        {
            super( i1.numDimensions() );
            this.i1 = i1;
            this.i2 = i2;
            assert ( i1.numDimensions() == i2.numDimensions() );
        }

        @Override
        public double realMin( final int d )
        {
            if ( isEmpty() )
                return Double.POSITIVE_INFINITY;
            return Math.max( i1.realMin( d ), i2.realMin( d )  );
        }

        @Override
        public double realMax( final int d )
        {
            if ( isEmpty() )
                return Double.NEGATIVE_INFINITY;
            return Math.min( i1.realMax( d ), i2.realMax( d )  );
        }

                // Efficiently test emptiness, avoiding insane recursions            
            public boolean isEmpty()
        {
            return isEmpty( i1 ) || isEmpty( i2 );
        }

                // Efficiently test emptiness, avoiding insane recursions
                // TODO: move somewhere "higher up" and add a case for UnionRealInterval
        public static boolean isEmpty( RealInterval interval )
        {
            if ( interval instanceof AbstractWrappedRealInterval )
            {
                final RealInterval source = ( ( AbstractWrappedRealInterval ) interval ).getSource();
                if ( source instanceof IntersectionRealInterval )
                    return ( ( IntersectionRealInterval ) source ).isEmpty();
            }

            return Intervals.isEmpty( interval );
        }
    }

What do you think?

If you approve, I will continue to work on a PR tomorrow (getting late here).

tpietzsch commented 2 years ago
 // Efficiently test emptiness, avoiding insane recursions            
public boolean isEmpty()
{
    return isEmpty( i1 ) || isEmpty( i2 );
}

This is most likely wrong. The condition is sufficient but not necessary: the intersection of i1 and i2 may still be empty if neither i1 nor i2 are empty.

tischi commented 2 years ago

@tpietzsch In fact, I agree, I had the same thought, but in the original code it looked like this:

@Override
public double realMax( final int d )
{
    if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
        return Double.NEGATIVE_INFINITY;
    return Math.min( i1.realMax( d ), i2.realMax( d )  );
}

The point is to replace the Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) by something computationally more efficient. I managed to do this with the code above.

Do you think that

a) already the original code is wrong, or b) if I rename the new isEmpty() method to something more appropriate would be fine?

tischi commented 2 years ago

My feeling would be that it is wrong in the original code.

Maybe we should replace:

Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 )

with

Intervals.isEmpty( Intervals.intersect( i1, i2 ) )

?

tischi commented 2 years ago

I am confused.

In OperatorsTest.testAndMovingOperands() there are these two lines of code directly below each other:

        assertTrue( rm.isEmpty() );
        assertEquals( rm.realMax( 0 ), 13.5, 0 );

To me this indicates that we apparently do not want rm.realMax( 0 ) to return Double.NEGATIVE_INFINITY if the interval is empty.

I don't know what would be correct here, to continue I would need some help/explanation about what should happen here and why.

tischi commented 2 years ago

The behavior with just standard imglib2 is in fact such that it does return some finite value for the interval bounds even if it is empty:

        final FinalInterval i01 = FinalInterval.createMinMax( 0, 1 );
        final FinalInterval i23 = FinalInterval.createMinMax( 2, 3 );
        final FinalInterval empty = Intervals.intersect( i01, i23 );
        final boolean isEmpty = Intervals.isEmpty( empty );
        final double realMin = empty.realMin( 0 );
        System.out.println( isEmpty ); // yields: true
        System.out.println( realMin ); // yields: 2

Then I would suggest to proceed with my refactoring approach, but reconsider the name of the new isEmpty() method that I introduced.

tischi commented 2 years ago

No, confused again... ;-)

I don't get why we do these isEmpty checks in IntersectionRealInterval at all:

@Override
public double realMax( final int d )
{
    if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
        return Double.NEGATIVE_INFINITY;
    return Math.min( i1.realMax( d ), i2.realMax( d )  );
}

In fact, If I just comment it out all test pass... 😕

@Override
public double realMax( final int d )
{
    //if ( Intervals.isEmpty( i1 ) || Intervals.isEmpty( i2 ) )
    //  return Double.NEGATIVE_INFINITY;
    return Math.min( i1.realMax( d ), i2.realMax( d )  );
}

So, maybe it should just be removed? It would solve the recursion issue 😄

@ctrueden any opinions?

tischi commented 2 years ago

Could IntersectionRealInterval be considered a lazy version of Intervals.intersect?

If yes, then I think they should both behave similar and the currently existing isEmpty checks of the intersecting intervals within IntersectionRealInterval should be removed, because they do not exist in Intervals.intersect.

The same argumentation applies to UnionRealInterval and Intervals.union.

tpietzsch commented 2 years ago

Could IntersectionRealInterval be considered a lazy version of Intervals.intersect?

If yes, then I think they should both behave similar and the currently existing isEmpty checks of the intersecting intervals within IntersectionRealInterval should be removed, because they do not exist in Intervals.intersect.

I think: Yes to both...

The same argumentation applies to UnionRealInterval and Intervals.union.

No. Please see discussion here: https://github.com/imglib/imglib2/issues/300

Intervals.union is broken for empty intervals. (Broken with respect to the expectation that union of two empty intervals should be empty.) In this case, I think the Intervals.union should be fixed, not the UnionRealInterval made also broken.

bogovicj commented 2 years ago

A fix for Intervals.union is now here https://github.com/imglib/imglib2/pull/301

tpietzsch commented 2 years ago

I merged https://github.com/imglib/imglib2/pull/301. See https://github.com/imglib/imglib2/issues/315 for follow-up discussion.

tpietzsch commented 2 years ago

Some new finding: it also hangs when combining the masks with "and", and your patch does not help in this case.

This is of course the case because there are UnionRealInterval and IntersectionRealInterval (for and and or), which both suffer independently from the same issue.

Another place for optimization is that Intervals.isEmpty() contains this line of code: if ( interval.realMin( d ) > interval.realMax( d ) ) which will cause interval.isEmpty() to be evaluated twice (both within realMin() and realMax()).

Maybe this is the reason for the explosion? Maybe that this leads to a 2 2 2 ... type of behavior?

Yes, that's the problem exactly.

Would it be possible to change the code (e.g., for interval instanceOf AbstractAdaptingRealInterval) such that there would be a realMinMax( double[][] ) which would get all dimensions in one go with checking only once for emptiness?

I think this is the right idea. I prototyped it here, and it fixes the performance issue. (union is now as fast as intersection in a similar test), https://github.com/imglib/imglib2-roi/commit/56bab0396fcdc7a2174843dc282fb96368b244ef

What needs to be done:

If somebody wants to work making this into a PR that would be great!

tischi commented 2 years ago

Just double checking: should we still do all of this even if we would have a isIntervalEmpty() method as discussed in https://github.com/imglib/imglib2/issues/315?

If so, I could try to give this a shot next week.

tischi commented 2 years ago

@tpietzsch I started working on this here: https://github.com/imglib/imglib2-roi/pull/63 I will ping you once it's ready.

tpietzsch commented 2 years ago

Just double checking: should we still do all of this even if we would have a isIntervalEmpty() method as discussed in imglib/imglib2#315?

Yes, I think isIntervalEmpty() alone would not solve it.

imagejan commented 1 year ago

Can this be closed, now that #63 is merged?