sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.41k stars 475 forks source link

list_plot3d.py should not generate NaN coordinates #13135

Closed jdemeyer closed 6 years ago

jdemeyer commented 12 years ago

Both #12798 and the matplotlib upgrade (#23696) cause this:

sage -t --long --warn-long 10.0 src/sage/plot/plot3d/list_plot3d.py
**********************************************************************
File "src/sage/plot/plot3d/list_plot3d.py", line 143, in sage.plot.plot3d.list_plot3d.list_plot3d
Warning, slow doctest:
    list_plot3d(l, interpolation_type='clough', texture='yellow', num_points=100)
Test ran for 926.47 s
**********************************************************************

on hardware which is slow in computing with NaN values.

This plot is done by tachyon but in the input to tachyon, there are many NaNs. This can be seen by applying attachment: nancount.patch. The result is

sage -t --long src/sage/plot/plot3d/list_plot3d.py
**********************************************************************
File "src/sage/plot/plot3d/list_plot3d.py", line 143, in sage.plot.plot3d.list_plot3d.list_plot3d
Failed example:
    list_plot3d(l, interpolation_type='clough', texture='yellow', num_points=100)
Expected:
    Graphics3d Object
Got:
    Got 3206 NaN values
    Graphics3d Object
**********************************************************************
File "src/sage/plot/plot3d/list_plot3d.py", line 387, in sage.plot.plot3d.list_plot3d.list_plot3d_tuples
Failed example:
    show(list_plot3d([[1, 1, 1], [1, 2, 1], [0, 1, 3], [1, 0, 4]], point_list=True))
Expected nothing
Got:
    Got 31 NaN values
**********************************************************************
File "src/sage/plot/plot3d/list_plot3d.py", line 391, in sage.plot.plot3d.list_plot3d.list_plot3d_tuples
Failed example:
    list_plot3d([(1, 2, 3), (0, 1, 3), (2, 1, 4), (1, 0, -2)], texture='yellow', num_points=50)
Expected:
    Graphics3d Object
Got:
    Got 1273 NaN values
    Graphics3d Object
**********************************************************************

CC: @strogdon @kiwifb

Component: graphics

Author: Jeroen Demeyer

Branch/Commit: 4d17a73

Reviewer: Steven Trogdon

Issue created by migration from https://trac.sagemath.org/ticket/13135

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-This used to work on Solaris SPARC in sage-5.1.beta3, but no longer in sage-5.1.beta5:
+This used to work on Solaris SPARC in sage-5.1.beta3, but no longer in sage-5.1.beta4:

sage -t --long -force_lib devel/sage/sage/plot/plot3d/list_plot3d.py

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -6,3 +6,11 @@

          [5001.6 s]

+ +More precisely, this takes forever: + +``` +Trying:

ppurka commented 12 years ago
comment:4

This takes 0.18s on my laptop in sage-5.1.beta5. I don't see why it should take >5s on some other hardware unless the hardware is really that slow.

sage: l=[]
sage: for i in range(-5,5):
          for j in range(-5,5):
              l.append((normalvariate(0,1),normalvariate(0,1),normalvariate(0,1)))

sage: %time list_plot3d(l,interpolation_type='nn',texture='yellow',num_points=100)
CPU times: user 0.16 s, sys: 0.01 s, total: 0.18 s
Wall time: 0.18 s
jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -7,10 +7,10 @@
          [5001.6 s]

-More precisely, this takes forever: +More precisely, this takes a very long time:

 Trying:
-    list_plot3d(l,interpolation_type='nn',texture='yellow',num_points=Integer(100))###line 159:_sage_    >>> list_plot3d(l,interpolation_type='nn',texture='yellow',num_points=100)
+    list_plot3d([(Integer(0),Integer(0),Integer(1)), (Integer(2),Integer(3),Integer(4)), (-Integer(1),-Integer(1),-Integer(1))],num_points=Integer(400)) # long time###line 171:_sage_    >>> list_plot3d([(0,0,1), (2,3,4), (-1,-1,-1)],num_points=400) # long time
 Expecting nothing
jdemeyer commented 12 years ago
comment:6

Replying to @ppurka:

This takes 0.18s on my laptop in sage-5.1.beta5. I don't see why it should take >5s on some other hardware unless the hardware is really that slow.

Do an explaination of why it could have slowed down a lot because of this ticket?

ppurka commented 12 years ago
comment:7

Replying to @jdemeyer:

Replying to @ppurka:

This takes 0.18s on my laptop in sage-5.1.beta5. I don't see why it should take >5s on some other hardware unless the hardware is really that slow.

Do an explaination of why it could have slowed down a lot because of this ticket?

I think the way to handle the slow down is to reduce the num_points to 100.

It is not really a "slow down" per se, but it actually gives an accurate plot. Before #13135 3d plots of lists of points were just plain wrong.

I think the following is a good means of checking the time that the plot in the revised description takes

sage: %time list_plot3d([(0,0,1), (2,3,4), (-1,-1,-1)],num_points=400).export_jmol('/tmp/a.jmol')            
CPU times: user 5.37 s, sys: 0.05 s, total: 5.42 s
Wall time: 5.44 s
sage: %time list_plot3d([(0,0,1), (2,3,4), (-1,-1,-1)],num_points=100).export_jmol('/tmp/a.jmol')
CPU times: user 0.35 s, sys: 0.00 s, total: 0.35 s
Wall time: 0.35 s

Can you run this command on the Solaris machine and find out what is the time taken? If it takes much less time on the latter command, then it would be good to have the latter one (with 100 points) instead of the former one (with 400 points).

jdemeyer commented 12 years ago
comment:8

Here's an other weird thing: the test takes much longer when running the doctests than when run individually. Inside Sage, there is not much difference with/without the #12798 patches. But when running the doctest, there is a huge difference.

Could there be a resource leak of some kind?

ppurka commented 12 years ago
comment:9

A resource leak might be a possibility. I ran the following two commands:

Vanilla sage-5.1beta5

...sage-5.1.beta5/devel/sage» ../../sage -t --long -force_lib sage/plot/plot3d/list_plot3d.py 
sage -t --long -force_lib "devel/sage-main/sage/plot/plot3d/list_plot3d.py"
     [10.3 s]

----------------------------------------------------------------------
All tests passed!
Total time for all tests: 10.3 seconds

Then I changed that list_plot example to the following:

sage: list_plot3d([(0,0,1), (2,3,4), (-1,-1,-1)],num_points=400).export_jmol(tmp_filename()+'.jmol') # long time

After change to the list_plot example

...sage-5.1.beta5/devel/sage» ../../sage -t --long -force_lib sage/plot/plot3d/list_plot3d.py 
sage -t --long -force_lib "devel/sage-main/sage/plot/plot3d/list_plot3d.py"
     [8.6 s]

----------------------------------------------------------------------
All tests passed!
Total time for all tests: 8.6 seconds

That is quite a bit of a difference just for dumping the jmol into a file.

Is there some automated way to measure the memory/cpu usage of a running process?

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-This used to work on Solaris SPARC in sage-5.1.beta3, but no longer in sage-5.1.beta4:
+#12798 causes this:

sage -t --long -force_lib devel/sage/sage/plot/plot3d/list_plot3d.py

jdemeyer commented 6 years ago

Description changed:

--- 
+++ 
@@ -1,16 +1,12 @@
-#12798 causes this:
+#12798 and the matplotlib upgrade to 2.1.0 cause this:

-sage -t --long -force_lib devel/sage/sage/plot/plot3d/list_plot3d.py - Error: TIMED OUT! PROCESS KILLED!

jdemeyer commented 6 years ago

Description changed:

--- 
+++ 
@@ -10,3 +10,5 @@
 **********************************************************************

on hardware which is slow in computing with NaN values. + +This plot is done by tachyon but in the input to tachyon, there are many NaNs.

jdemeyer commented 6 years ago

Example tachyon input

jdemeyer commented 6 years ago
comment:18

Attachment: tachyon.dat.gz

I'm attaching an example Tachyon input file generated by list_plot3d.

jdemeyer commented 6 years ago

Patch to show the problem

jdemeyer commented 6 years ago
comment:19

Attachment: nancount.patch.gz

jdemeyer commented 6 years ago

Description changed:

--- 
+++ 
@@ -11,4 +11,34 @@

on hardware which is slow in computing with NaN values.

-This plot is done by tachyon but in the input to tachyon, there are many NaNs. +This plot is done by tachyon but in the input to tachyon, there are many NaNs. This can be seen by applying attachment: nancount.patch. The result is + +``` +sage -t --long src/sage/plot/plot3d/list_plot3d.py +** +File "src/sage/plot/plot3d/list_plot3d.py", line 143, in sage.plot.plot3d.list_plot3d.list_plot3d +Failed example:

jdemeyer commented 6 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-#12798 and the matplotlib upgrade to 2.1.0 cause this:
+Both #12798 and the matplotlib upgrade (#23696) cause this:

sage -t --long --warn-long 10.0 src/sage/plot/plot3d/list_plot3d.py @@ -42,3 +42,4 @@ Graphics3d Object


+In 
jdemeyer commented 6 years ago

Description changed:

--- 
+++ 
@@ -42,4 +42,3 @@
     Graphics3d Object
 **********************************************************************

-In

fchapoton commented 6 years ago
comment:22

simpler case maybe

sage: var('x,y')
sage: f = lambda x,y: x+y if x*x+y*y<=1 else NaN
sage: plot3d(f,(x,-2,2),(y,-2,2)).tachyon()  
fchapoton commented 6 years ago
comment:23

and

sage: G = polygon([(NaN,NaN,NaN)]*3)
sage: G.tachyon_repr(G.default_render_params())
['TRI V0 nan nan nan V1 nan nan nan V2 nan nan nan', 'texture3']

Maybe we should filter this out of tachyon input ? or more generally ?

jdemeyer commented 6 years ago
comment:24

Replying to @fchapoton:

Maybe we should filter this out of tachyon input ? or more generally ?

That doesn't solve the underlying problem. Why are NaNs suddenly appearing? That didn't happen before #23696.

strogdon commented 6 years ago
comment:26

Doesn't explain the NaN but could the following be what one is observing when doctesting

sage: l = []
sage: for i in range(-5, 5):
....:     for j in range(-5, 5):
....:         l.append((normalvariate(0, 1), normalvariate(0, 1), normalvariate(0, 1)))
....:         
sage: %timeit list_plot3d(l, interpolation_type='clough', texture='yellow', num_points=100)
The slowest run took 109.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 3.11 ms per loop

The first run is always the slowest.

kiwifb commented 6 years ago
comment:28

OK in the case of clough and linear we could able to filter NaN at a cost. Both define a function g like so

        def g(x, y):
            z = f([x, y])
            return (x, y, z)

I guess g could return empty if z turns to be a NaN. Is it even doable?

jdemeyer commented 6 years ago
comment:29

Filtering NaNs is easy, but that is really ignoring the problem. The real problem is that NaNs are appearing in a computation which did not produce NaNs before #23696.

kiwifb commented 6 years ago
comment:30

The interpolation algorithm is completely different, so there is a possibility that this particular algorithm is completely unsuited to this particular set of points. I am now curious to see what the plot look like but that will have to wait until I go back to work tomorrow morning (~14 hours in my time zone).

strogdon commented 6 years ago
comment:31

It may be that the nan are a feature of using scipy.interpolate.CloughTocher2DInterpolator. The documentation indicates that if requested points are outside the convex hull then, unless specified, they are filled in with nan.

jdemeyer commented 6 years ago
comment:32

Replying to @strogdon:

It may be that the nan are a feature of using scipy.interpolate.CloughTocher2DInterpolator. The documentation indicates that if requested points are outside the convex hull then, unless specified, they are filled in with nan.

Thanks for the info. If so, filtering them probably is the right thing to do after all.

kiwifb commented 6 years ago
comment:33

Replying to @jdemeyer:

Replying to @strogdon:

It may be that the nan are a feature of using scipy.interpolate.CloughTocher2DInterpolator. The documentation indicates that if requested points are outside the convex hull then, unless specified, they are filled in with nan.

Thanks for the info. If so, filtering them probably is the right thing to do after all.

Yes, this algorithm just doesn't do extrapolation and put NaN for anything that would be an extrapolation. We could set an arbitrary value, but to what? The average z value of the input? 0?

jdemeyer commented 6 years ago
comment:34

Replying to @fchapoton:

Maybe we should filter this out of tachyon input ? or more generally ?

If you want to filter them, I think the right place is the triangulate() method.

kiwifb commented 6 years ago
comment:35

I just spent some time with matplotlib code to check what it does for extrapolation. And it looks like it returns NaN for those points as well. So the problem should be present as well for linear interpolation. I couldn't find anything in the documentation about what the stuff behind spline do for extrapolation - by the look of the output picture it just extrapolates.

jdemeyer commented 6 years ago
comment:36

There is some discussion also on #12798 about NaN coordinates and even a patch.

strogdon commented 6 years ago
comment:37

Replying to @jdemeyer:

There is some discussion also on #12798 about NaN coordinates and even a patch.

The patch does remove the NaN values but in so doing it appears the surface is cropped such that it is not displayed over the entire convex hull.

jdemeyer commented 6 years ago
comment:38

Replying to @strogdon:

in so doing it appears the surface is cropped such that it is not displayed over the entire convex hull.

Maybe, but I wouldn't try to fix that here. It would be a non-trivial change to change the triangulate() method to better take into account the input points. That is not something that we should fix in this blocker ticket.

strogdon commented 6 years ago
comment:39

Replying to @jdemeyer:

Replying to @strogdon:

in so doing it appears the surface is cropped such that it is not displayed over the entire convex hull.

Maybe, but I wouldn't try to fix that here. It would be a non-trivial change to change the triangulate() method to better take into account the input points. That is not something that we should fix in this blocker ticket.

Does it help your situation? I can't really test whether there is a time savings. I can only verify that the NaN are stripped.

strogdon commented 6 years ago
comment:40

Replying to @strogdon:

The patch does remove the NaN values but in so doing it appears the surface is cropped such that it is not displayed over the entire convex hull.

That is, most of them are gone. There are still a few corner ones left.

kiwifb commented 6 years ago
comment:41

Replying to @strogdon:

Replying to @strogdon:

The patch does remove the NaN values but in so doing it appears the surface is cropped such that it is not displayed over the entire convex hull.

That is, most of them are gone. There are still a few corner ones left.

Can you elaborate on that. I thought NaN were not displayed anyway, so I didn't expect a difference before and after stripping them.

strogdon commented 6 years ago
comment:42

Replying to @kiwifb:

Replying to @strogdon:

Replying to @strogdon:

The patch does remove the NaN values but in so doing it appears the surface is cropped such that it is not displayed over the entire convex hull.

That is, most of them are gone. There are still a few corner ones left.

Can you elaborate on that. I thought NaN were not displayed anyway, so I didn't expect a difference before and after stripping them.

Basically, from my understanding, the stripping is an attempt to find the maximal rectangle that fits inside the convex hull and the 3D surface is drawn over this rectangle. The procedure sometimes fails, most notably at the corners of the constructed rectangle.

strogdon commented 6 years ago

Attachment: before_after_stripping_nan.patch.gz

before and after stripping NaN patch

strogdon commented 6 years ago
comment:43

Patch to see changes before and after stripping the NaNs. Apply as patch -p1 < before_after_stripping_nan.patch and then rebuild sage, ./sage -b. To test use that in comment:26 without the timeit.

kiwifb commented 6 years ago
comment:44

Yes, I can see there is some differences. What does the plotting do with NaN?

strogdon commented 6 years ago
comment:45

Replying to @kiwifb:

Yes, I can see there is some differences. What does the plotting do with NaN?

My understanding is that points that have associated values of NaN are not plotted. When the test is run there should be two plots. The plot where the NaN are stripped should be displayed above a nearly rectangular region, except possibly, near the corners of the region. It is at these corners where there are some, possibly remaining NaNs.

strogdon commented 6 years ago
comment:46

From commit 950bdd8145d262e493321bef67d3d57d26f9c464 the changes to src/sage/plot/plot3d/list_plot3d.py contain

-        T=delaunay.Triangulation(x,y)
-        f=T.nn_interpolator(z)
-        f.default_value=0.0
-        j=numpy.complex(0,1)
-        vals=f[ymin:ymax:j*num_points,xmin:xmax:j*num_points]
+        points=[[x[i],y[i]] for i in range(len(x))]
+        j = numpy.complex(0, 1)
+        f = interpolate.CloughTocher2DInterpolator(points,z)
         from .parametric_surface import ParametricSurface

So if I'm not mistaken the old delaunay triagulation implementation fills points outside the convex hull with zeros, i.e. f.default_value=0.0. The same thing can be achieved with scipy.interpolate.CloughTocher2DInterpolator with the change

diff --git a/src/sage/plot/plot3d/list_plot3d.py b/src/sage/plot/plot3d/list_plot3d.py
index c0c578b99f..77aeb583ae 100644
--- a/src/sage/plot/plot3d/list_plot3d.py
+++ b/src/sage/plot/plot3d/list_plot3d.py
@@ -461,7 +461,7 @@ def list_plot3d_tuples(v, interpolation_type, texture, **kwds):

         points=[[x[i],y[i]] for i in range(len(x))]
         j = numpy.complex(0, 1)
-        f = interpolate.CloughTocher2DInterpolator(points,z)
+        f = interpolate.CloughTocher2DInterpolator(points,z,fill_value=0.0)
         from .parametric_surface import ParametricSurface
         def g(x, y):
             z = f([x, y])

In my opinion the resulting plot is not particularly pleasing but it does reproduce the old behavior. I'm unable to determine whether there is any CPU savings. I do believe that modern graphing algorithms generate these NaNs as a mechanism to not plot points and that leaving things as-is is what should be done. In which case the doctest could be marked as # long time until the older hardware that doesn't like the NaNs is gone. So I would favor either of the above two approaches instead of stripping the NaNs.

kiwifb commented 6 years ago
comment:47

I must say that in a previous comment I mentioned the possibility of using fill_value. If 0 reproduce the old behavior we could go with that but I don't think it is an especially good or always appropriate value. As you said if NaN means do not plot, then it is more appropriate.

The problem is not with older hardware, power8 is not especially old, but its handling of NaN is currently poor. I don't know if it can be improved at the software level eventually but I would hope so.

strogdon commented 6 years ago
comment:48

Replying to @kiwifb:

I must say that in a previous comment I mentioned the possibility of using fill_value. If 0 reproduce the old behavior we could go with that but I don't think it is an especially good or always appropriate value. As you said if NaN means do not plot, then it is more appropriate.

The problem is not with older hardware, power8 is not especially old, but its handling of NaN is currently poor. I don't know if it can be improved at the software level eventually but I would hope so.

I couldn't check things with vanilla Sage. But I was able to install MPL-1.5.3 on sage-on-gentoo and I could manually replicate the current behavior by setting f.default_value=numpy.float('nan') with the old delaunay triangulation.

jdemeyer commented 6 years ago
comment:49

Replying to @strogdon:

I do believe that modern graphing algorithms generate these NaNs as a mechanism to not plot points and that leaving things as-is is what should be done.

-1.

Plotting NaNs makes no sense at all. It can only lead to garbage in, garbage out.

I am actually surprised that removing the NaNs changes the output. I haven't looked at the code, but it makes me suspect that maybe the NaN removal is not done correctly.

strogdon commented 6 years ago
comment:50

Replying to @jdemeyer:

Plotting NaNs makes no sense at all. It can only lead to garbage in, garbage out.

Try it!

sage: import matplotlib.pyplot as plt
sage: fig, (ax1, ax2) = plt.subplots(1,2)
sage: ax1.plot([1, 2, 2, 2, 3], [1, 1, NaN, 2, 2]);
sage: ax2.plot([1, 2, 2, 3], [1, 1, 2, 2]);
sage: plt.show()

There may be other ways, but both scipy and matplotlib know how to handle such things.

jdemeyer commented 6 years ago
comment:51

What's your point? That example is showing a Line2D object which is a high-level object: it's a collection of lines, which the low-level renderer will treat as individual lines to be drawn.

It's precisely the transformation of a high-level object (the input to list_plot3d) to a low-level object (a collection of triangles) that is going wrong in this ticket.

jdemeyer commented 6 years ago
comment:52

In other words: the bug is not about a NaN appearing in the input of list_plot3d. It is about a NaN appearing in the processing of otherwise valid input and being sent to the rendering engine.

strogdon commented 6 years ago
comment:53

Replying to @jdemeyer:

In other words: the bug is not about a NaN appearing in the input of list_plot3d. It is about a NaN appearing in the processing of otherwise valid input and being sent to the rendering engine.

OK, so I guess there could be a problem inside ParametricSurface which receives input that contains the NaNs. A big task to fix that. So perhaps we should do as was done previously with the old code until ParametricSurface can be investigated, i.e. fill points outside the convex hull with zeros? I believe we know how to do that with CloughTocher2DInterpolator.

jdemeyer commented 6 years ago
comment:54

Replying to @strogdon:

OK, so I guess there could be a problem inside ParametricSurface which receives input that contains the NaNs. A big task to fix that.

I still think that it should be possible to just remove all the triangles containing a NaN. That can't be too hard.