Better handling of NaNs and masked arrays in plot()

GoogleCodeExporter commented 9 years ago

Consider the following code
>>> y = np.ma.masked_array([1,2,np.nan,3], [0,1,0,0])
>>> vv.plot(y, ms='.')
I would expect to get a point at (1,1) and a point at (4,3).  They might be 
connected by a line, but they might not.  (matplotlib doesn't give a line, for 
instance.)  Instead, I get points at (1,1), (2,2), (0,0), and (4,3), connected 
in that order.  It seems that visvis is ignoring the mask and considering NaN 
== 0.

Original issue reported on code.google.com by rschr...@gmail.com on 5 Oct 2012 at 1:46

GoogleCodeExporter commented 9 years ago

No, your OpenGL is considering NaN == 0. On my Windows box the NaN is not drawn,
as expected. Trying to draw Inf, on the other hand, gives weird results: (4,3)
is not drawn.

I think there are three values that should be ignored: NaN, Inf (positive and 
negative), and a masked value in a masked array.

The first question is how should visvis draw these? I think mpl's approach 
makes a lot of sense, but omitting the value and drawing a line between the 
adjacent values also seems fine to me.

The second question is how can visvis do this? 

On systems like mine it would be as simple as creating a masked array for inf 
values and converting all masked values to nan: "np.ma.masked_array(y, 
np.isinf(y)).filled(np.nan)".  But your result tells us that this is not OpenGL
behavior that we can rely on.
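The one-liner above can be wrapped as a helper that also accepts plain lists and integer data. This is a sketch under assumptions, not visvis code: the name `invalid_to_nan` is hypothetical, and the input is cast to float first because `.filled(np.nan)` cannot put NaN into an integer array.

```python
import numpy as np

def invalid_to_nan(y):
    """Return a plain float ndarray with masked and infinite entries set to NaN.

    Hypothetical helper built around the one-liner quoted above.
    """
    # Cast to float so NaN can be stored; an existing mask is kept
    y = np.ma.asarray(y, dtype=float)
    # Mask the infinities; keep_mask=True (the default) merges with the old mask
    masked = np.ma.masked_array(y, np.isinf(np.ma.getdata(y)))
    # Fill every masked entry with NaN and return an ordinary ndarray
    return masked.filled(np.nan)

y = np.ma.masked_array([1, 2, np.inf, 3], [0, 1, 0, 0])
print(invalid_to_nan(y).tolist())  # [1.0, nan, nan, 3.0]
```

Whether the resulting NaNs are then skipped is still up to the OpenGL driver.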

One solution is to simply pop any invalid values out of the array. You then get 
the result that the points are omitted but the lines are drawn between adjacent 
points. Making visvis *not* draw lines in these cases would be very
difficult, I think.
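Popping the invalid values out can be sketched with a boolean index. Assumptions: x and y are separate 1-D inputs, a point is invalid when its y value is masked, NaN or +/-Inf, and `drop_invalid` is a hypothetical name. Because the survivors stay in order, a line drawn through them joins the neighbours of every removed point.

```python
import numpy as np

def drop_invalid(x, y):
    """Remove points whose y value is masked, NaN or +/-Inf (hypothetical helper).

    The surviving points keep their order, so a line drawn through them
    joins the neighbours on either side of every removed point.
    """
    y = np.ma.asarray(y, dtype=float)
    data = np.ma.getdata(y)
    # Keep a point only if it is unmasked AND its value is finite
    keep = ~np.ma.getmaskarray(y) & np.isfinite(data)
    return np.asarray(x)[keep], data[keep]

xs, ys = drop_invalid([1, 2, 3, 4], np.ma.masked_array([1, 2, np.nan, 3], [0, 1, 0, 0]))
print(xs.tolist(), ys.tolist())  # [1, 4] [1.0, 3.0]
```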

Any thoughts?

Original comment by almar.klein@gmail.com on 5 Oct 2012 at 9:35

Changed state: Started

GoogleCodeExporter commented 9 years ago

Re "How should visvis draw these?" 
IMHO the job of a plotter is to represent the data *as it is*.
If the data has defects, the visual representation of the data should show 
defects.
Then the user may choose to repair their own data prior to display.
All of which is to say that, IMHO, it is not acceptable to silently cover a data
gap by visually joining the neighbours of a NaN point.

I'll admit that point of view is scientifically motivated, and I am a 
scientific user. There may be others who use visvis with other aims for which
silent fixing is acceptable; I can only speak for myself.

Original comment by owe...@hotmail.com on 5 Oct 2012 at 12:38

GoogleCodeExporter commented 9 years ago

You're absolutely right. Visvis is intended for scientific visualization. I was 
thinking of NaNs as datapoints-to-be-ignored, but indeed they are often the 
result of an error, and these data should therefore not be hidden.

Having said that, is the approach of MPL (not drawing data-point and line) the 
best approach? Or can we think of yet another way to visualize this?
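For comparison, the MPL behaviour (no marker and no line at an invalid sample) amounts to splitting the data into runs of consecutive valid points and drawing each run as a separate polyline. A minimal sketch; `valid_runs` is a hypothetical helper, and a run of length one would be drawn as a lone marker.

```python
import numpy as np

def valid_runs(x, y):
    """Split (x, y) into runs of consecutive valid samples.

    A sample is invalid if y is masked, NaN or +/-Inf. Drawing each run
    as its own polyline leaves a visible gap wherever data is missing.
    """
    y = np.ma.asarray(y, dtype=float)
    data = np.ma.getdata(y)
    x = np.asarray(x, dtype=float)
    valid = ~np.ma.getmaskarray(y) & np.isfinite(data)

    runs = []
    start = None  # index where the current run began, or None between runs
    for i, ok in enumerate(valid):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            runs.append((x[start:i], data[start:i]))
            start = None
    if start is not None:  # close a run that reaches the last sample
        runs.append((x[start:], data[start:]))
    return runs

for xs, ys in valid_runs([1, 2, 3, 4, 5], [1, 2, np.nan, 3, 4]):
    print(xs.tolist(), ys.tolist())
# [1.0, 2.0] [1.0, 2.0]
# [4.0, 5.0] [3.0, 4.0]
```

With the data from the original report this yields two single-point runs, (1,1) and (4,3), and therefore no connecting line.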

Original comment by almar.klein@gmail.com on 5 Oct 2012 at 12:58

GoogleCodeExporter commented 9 years ago

> No, your OpenGL is considering NaN == 0.

You're absolutely right.  My work box skips the NaN, and it doesn't draw a line 
between the points on either side of it.  So my recommendation is to first 
change masked values to NaNs, and then try to figure out how to work around the 
broken OpenGL on my laptop.

Original comment by rschr...@gmail.com on 5 Oct 2012 at 3:23

GoogleCodeExporter commented 9 years ago

I'm not sure whether your laptop's OpenGL behavior is broken. It may well be
that OpenGL does not specify how NaNs should be drawn.

I see two options now. Either we convert all 'invalid' values to NaN and hope
that OpenGL handles them the right way, or we explicitly do some work to
ensure the intended behavior.

The first is *much* simpler and will probably do the trick on most systems. How 
bad is it for the cases where it does not? Is drawing a zero enough feedback to 
the user that the data point is invalid?

Original comment by almar.klein@gmail.com on 5 Oct 2012 at 3:34

GoogleCodeExporter commented 9 years ago

FWIW, my laptop handles np.inf correctly, not plotting such points and skipping 
lines through them.  If this is more reliable, perhaps we could change NaNs to 
Infs.

> Is drawing a zero enough feedback to the user that the data point is invalid?

I would say no.  If your data is around 0, you might not notice the extra
points, especially if the points aren't drawn with lines between them.

Original comment by rschr...@gmail.com on 7 Oct 2012 at 11:33

GoogleCodeExporter commented 9 years ago

> my laptop handles np.inf correctly, not plotting such points and skipping 
lines through them.  If this is more reliable, perhaps we could change NaNs to 
Infs.

Unfortunately no, my laptop does not work well with Infs.

> I would say no. 

Ok, it seems like we need to do some actual work then :) 
I would say something along the lines of checking if there are invalid numbers.
If there are, convert the datapoints to pairs (as in the '+' linestyle), which 
means the data becomes twice as big. Then remove any pair which has an invalid 
number.
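The pair-based idea can be sketched in NumPy. Assumptions: points arrive as an (N, D) float array with masked values already converted to NaN, and the output is meant for a renderer that draws each consecutive pair of vertices as one segment (like GL_LINES). `valid_segment_pairs` is a hypothetical name. Each interior point appears in two pairs, so the vertex count grows from N to 2(N-1) before filtering.

```python
import numpy as np

def valid_segment_pairs(points):
    """Expand an (N, D) polyline into vertex pairs and drop every pair
    that touches a NaN or Inf coordinate (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    # Segment i runs from point i to point i + 1: shape (N - 1, 2, D)
    pairs = np.stack([points[:-1], points[1:]], axis=1)
    # Keep a pair only if every coordinate of both endpoints is finite
    ok = np.isfinite(pairs).all(axis=(1, 2))
    # Flatten back to a vertex list: rows 2k and 2k + 1 form one segment
    return pairs[ok].reshape(-1, points.shape[1])

pts = [[1, 1], [2, np.nan], [3, 2], [4, 3]]
print(valid_segment_pairs(pts).tolist())  # [[3.0, 2.0], [4.0, 3.0]]
```

A valid point whose neighbours are both invalid (here (1,1)) ends up in no pair at all, so it would still have to be drawn as a marker separately.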

Shall look at this today.

Original comment by almar.klein@gmail.com on 8 Oct 2012 at 7:27

GoogleCodeExporter commented 9 years ago

Correction: Inf does work on my laptop. Tested on a WinXP VM without any 
special hardware and there it works too.

I pushed a commit that translates any invalid values (also masked arrays) to 
Inf in the SetPoints() method.
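A translation of this kind can be sketched as below. It is not the code from that commit; `invalid_to_inf` and its `whole_point` flag are hypothetical, and +Inf is assumed as the replacement value. The flag contrasts two policies: marking every coordinate of an invalid point, or only the coordinates that are actually invalid.

```python
import numpy as np

def invalid_to_inf(xyz, whole_point=True):
    """Return a float copy of an (N, D) point array with invalid entries set to +Inf.

    Invalid means masked, NaN or +/-Inf. With whole_point=True every
    coordinate of a bad point becomes Inf; with whole_point=False only the
    bad coordinates change, so the other coordinates of that point survive.
    """
    xyz = np.ma.asarray(xyz, dtype=float)
    data = np.array(np.ma.getdata(xyz), dtype=float)  # copy: never modify the input
    bad = np.ma.getmaskarray(xyz) | ~np.isfinite(data)
    if whole_point:
        data[bad.any(axis=1)] = np.inf  # boolean row index selects entire points
    else:
        data[bad] = np.inf
    return data

pts = [[1, 2, 3], [4, 5, np.nan], [7, 8, 9]]
print(invalid_to_inf(pts)[1].tolist())                     # [inf, inf, inf]
print(invalid_to_inf(pts, whole_point=False)[1].tolist())  # [4.0, 5.0, inf]
```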

I'd say this fixes the issue until we find evidence that a significant
fraction of OpenGL drivers do not work with this fix.

Original comment by almar.klein@gmail.com on 8 Oct 2012 at 9:00

GoogleCodeExporter commented 9 years ago

NaNs are working on both my machines now.

Unfortunately, masked arrays aren't working with plot(), because plot is 
converting the points to a PointSet before passing them to Line().  My 
suggestion would be to pass the points as an array to Line, and let it do the 
conversion to PointSet.

Two related questions:
1) A user can go tweak Line.points after the SetPoints method call through the 
Set*data methods.  In this case, they could reinsert NaNs into self._points.  
Should we be worried by this?  Relatedly, we might not want to be setting all 
the coordinates of invalid points to np.inf.  If a user has a few NaNs in the z 
data, he would expect to be able to fix it by using only SetZdata.

2) I notice in plot.py a helper function makeArray.  Is there a reason not to 
replace its body with "return np.asanyarray(data, dtype=float)"?
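The suggested body is easy to check. In this sketch `make_array` is a hypothetical stand-in for plot.py's makeArray. `np.asanyarray` passes subclasses such as MaskedArray through, so the mask survives, while `np.asarray` returns a plain ndarray and silently drops the mask, which illustrates how a mask can be lost when data is converted to a plain array too early.

```python
import numpy as np

def make_array(data):
    """Hypothetical stand-in for makeArray with the suggested one-line body."""
    return np.asanyarray(data, dtype=float)

y = np.ma.masked_array([1.0, 2.0, 3.0], [0, 1, 0])

print(type(make_array(y)).__name__)                  # MaskedArray: mask kept
print(np.ma.is_masked(np.asarray(y, dtype=float)))   # False: mask discarded
print(make_array([1, 2, 3]).dtype)                   # float64
```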

Original comment by rschr...@gmail.com on 8 Oct 2012 at 4:35

GoogleCodeExporter commented 9 years ago

I pushed a commit to fix this. I put the code to correct the values in a 
separate function (actually took me a while to make one that works well for the 
three situations in which it is used).

plot() now passes an array as you suggested. The only catch is that it needs to
use np.ma.concatenate instead of the normal np.concatenate when masked arrays
are involved.
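A sketch of that conditional choice. It assumes, hypothetically, that plot() stacks 1-D x, y (and z) inputs as columns; `stack_columns` is not a visvis function. NumPy's documentation notes that plain `np.concatenate` returns a MaskedArray for masked inputs but does not preserve their masks, which is why `np.ma.concatenate` is needed in that case.

```python
import numpy as np

def stack_columns(*arrays):
    """Stack 1-D coordinate arrays as the columns of an (N, k) float array.

    Uses np.ma.concatenate only if some input is a masked array, so the
    masks survive; otherwise the result stays a plain ndarray.
    """
    columns = [np.asanyarray(a, dtype=float).reshape(-1, 1) for a in arrays]
    if any(isinstance(c, np.ma.MaskedArray) for c in columns):
        return np.ma.concatenate(columns, axis=1)
    return np.concatenate(columns, axis=1)

y = np.ma.masked_array([1, 2, np.nan, 3], [0, 1, 0, 0])
pts = stack_columns([1, 2, 3, 4], y)
print(np.ma.getmaskarray(pts)[:, 1].tolist())  # [False, True, False, False]
```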

> I notice in plot.py a helper function makeArray.  Is there a reason not to
> replace its body with "return np.asanyarray(data, dtype=float)"?

That was me just learning Python and Numpy :) Fixed it.

Can you please check if things work as intended?

Original comment by almar.klein@gmail.com on 10 Oct 2012 at 10:18

GoogleCodeExporter commented 9 years ago

Seems to be working for me.

I pushed a slight change to handleInvalidValues() to remove the recursive step.
I saw no reason why that was necessary; execution can just carry on with
_inplace=True.  (Please check me on this.)  This should be slightly faster, but
more importantly it's more readable, IMO.

I believe that, with this change, handleInvalidValues is only ever called with 
_inplace=False, so we could remove this argument.  I didn't, since I didn't 
know if it might come in handy in the future.

Original comment by rschr...@gmail.com on 11 Oct 2012 at 12:54

GoogleCodeExporter commented 9 years ago

I removed the argument.

I think the function was a bit weird because I initially struggled quite a bit
to make it handle pointsets as well. In the end I removed that, but some legacy
code was left over...

Anyway, it looks like we can finally resolve this issue! Thanks!

Original comment by almar.klein@gmail.com on 11 Oct 2012 at 9:49

Changed state: Fixed

pbfy0 / visvis

Better handling of NaNs and masked arrays in plot() #62