wrobstory / vincent

A Python to Vega translator
MIT License

Scatter plot interprets NaN as 0 value. #118

Open ianp opened 10 years ago

ianp commented 10 years ago

If I have a data frame that contains NaN values, say from merging or joining two different data sets, then when vincent creates a Scatter plot it interprets these as zero rather than omitting them. For example, the fragment below displays all of the data squashed up in the top right corner of the plot.

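import numpy as np
import pandas as pd
import vincent

# Two columns with NaN gaps, as produced by joining two data sets
df = pd.DataFrame({'Data 1': [1115, np.nan, np.nan, 1128, 1145, 1173,  1115,    1162],
                   'Data 2': [1142,   1127,   1152, 1118, 1161, 1119, np.nan, np.nan]},
                  index=[ 536,   456,    567,   678,  453,  621,   343,     398])
vincent.Scatter(df, width=800).display()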

There should be a way to either ignore NaN values and/or easily control the ranges that are plotted (i.e. limit the values on both axes).

I had a brief look through the source and couldn't see an obvious way to do either.

ianp commented 10 years ago

Digging through the Vega docs I've found that I can work around the issue (for my current data set, at least) by setting "zero": false on the scales, so this works:

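p = vincent.Scatter(df, width=800)
p.scales[0].zero = False  # do not force the first scale to include zero
p.scales[1].zero = False  # same for the second scale
p.display()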

I'm not sure if there is a cleaner way to do it, and the problem would still exist for a data set that crosses either axis at 0.
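
For anyone hitting the same thing, a possible pandas-side approach is to drop the NaN observations and work out the plotted range before the data reaches the chart. This is only a sketch that uses pandas alone: the column names `idx`, `col` and `val` are made up for illustration, and whether vincent accepts this long-form frame directly, or how the computed limits would be applied to its scales, is an assumption that is not verified here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Data 1": [1115, np.nan, np.nan, 1128, 1145, 1173, 1115, 1162],
     "Data 2": [1142, 1127, 1152, 1118, 1161, 1119, np.nan, np.nan]},
    index=[536, 456, 567, 678, 453, 621, 343, 398],
)

# Long form: one row per (index, column) pair, with NaN observations dropped
# instead of being passed on to the plotting layer.
long_df = (
    df.rename_axis("idx")   # hypothetical name for the index column
      .reset_index()
      .melt(id_vars="idx", var_name="col", value_name="val")
      .dropna(subset=["val"])
)

# Range of the observed values only; pandas min/max skip NaN by default.
y_min = df.min().min()
y_max = df.max().max()
```

Dropping per observation rather than per row keeps the points where only one of the two columns is missing, which `df.dropna()` on the wide frame would throw away.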