probcomp / bdbcontrib

BayesDB contributions, including plotting, helper methods, and examples
http://probcomp.csail.mit.edu/bayesdb
Apache License 2.0

Pairplot acting strangely #141

Closed BelhalK closed 8 years ago

BelhalK commented 8 years ago

My query

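import bdbcontrib.plot_utils as pu
import bdbcontrib.bql_utils as bu

# bdb is an already-open BayesDB handle.
# Run the query first, then pass the resulting DataFrame to pairplot.
df = bu.query(bdb, '''select Period_minutes, Perigee_km
    from test WHERE Anticipated_Lifetime=15;''')
assignement_1 = pu.pairplot(bdb, df)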

(For info, bu.query(bdb,'''select Period_minutes, Perigee_km from test WHERE Anticipated_Lifetime=15;''') returns 22 rows, indexed 0 to 21.)

    Period_minutes  Perigee_km
0          1436.08       35773
1          1436.07       35776
2           114.10        1413
3          1436.10       35774
4          1436.10       35777
5           994.83        6179
6          1436.10       35774
7          1436.10       35785
8          1439.41       35850
9          1436.07       35766
10         1436.06       35776
11         1436.24       35775
12         1436.12       35779
13         1436.11       35768
14          114.08        1412
15         1436.12       35783
16         1436.09       35784
17         1436.01       35768
18         1436.10       35700
19         1436.21       35784
20         1436.07       35770
21         1436.10       35774

Here's the plot: [image: assignement_1_pp]

First, the scaling as highlighted is weird (jumping from 114 to 994). I understand that there are no values in between, but my point is to show this gap.

Second, the scatter plot is a weird heatmap, and I cannot think of an explanation.

NB: the types are (Period_minutes normal, Perigee_km normal).

gregory-marton commented 8 years ago

How did you construct the "test" table?

gregory-marton commented 8 years ago

By the way, the two complaints are related: the period_minutes variable here is seen as categorical rather than numeric, so each category is naturally given its own equal space.
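To make the spacing effect concrete, here is a minimal sketch. It is not bdbcontrib's code, and both helper names are hypothetical. It only illustrates the general idea that a categorical axis gives every distinct value the next evenly spaced slot, while a numeric axis places each point at its own value and so preserves the gap.

```python
# Hypothetical illustration; these helpers are not part of bdbcontrib.

def categorical_positions(values):
    """Map each distinct value to an evenly spaced slot, as a categorical axis would."""
    categories = sorted(set(values))
    slot = {value: index for index, value in enumerate(categories)}
    return [slot[value] for value in values]


def numeric_positions(values):
    """A numeric axis places each point at its own value."""
    return [float(value) for value in values]


# A few Period_minutes values taken from the query result above.
periods = [114.08, 114.10, 994.83, 1436.01]

cat = categorical_positions(periods)  # neighbouring categories are one slot apart
num = numeric_positions(periods)      # the jump from 114.10 to 994.83 stays visible
```

With the categorical mapping, 114.10 and 994.83 end up exactly one slot apart, which matches the "jump" in the plot.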

gregory-marton commented 8 years ago

And so is perigee_km, which is why you get a heatmap rather than a scatter: plotting categorical against categorical makes sense only as a heatmap.
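As a rough sketch of that reasoning (an assumption about the general approach, not a copy of pairplot's internals), the panel type can be thought of as a function of the two columns' statistical types. The function name, the type labels, and the choice of a violin plot for mixed pairs are all hypothetical here.

```python
# Hypothetical sketch; panel_kind and its labels are illustrative assumptions.

def panel_kind(x_type, y_type):
    """Pick a panel type from the statistical types of the two columns."""
    x_is_categorical = x_type == 'categorical'
    y_is_categorical = y_type == 'categorical'
    if x_is_categorical and y_is_categorical:
        # Counts per pair of categories: only a heatmap makes sense.
        return 'heatmap'
    if not x_is_categorical and not y_is_categorical:
        # Two numeric columns can be drawn as points.
        return 'scatter'
    # Assumption: one categorical and one numeric column get a violin plot.
    return 'violin'


as_seen = panel_kind('categorical', 'categorical')  # what the question shows
as_wanted = panel_kind('numerical', 'numerical')    # what was expected
```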

gregory-marton commented 8 years ago

I suspect you were trying to do something like this?

[image: screen shot 2016-05-12 at 1 52 41 pm]

gregory-marton commented 8 years ago

Looking at it a bit further, pairplot may not know the types of the variables. If you're keeping the same column names as in the generator (and it looks like you are for Satellites), then you can pass generator_name=satellites_cc or similar, and it will probably pick up the right statistical types from there.
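The reason the column names have to match can be sketched as a lookup by name. This is a hypothetical illustration, not how bdbcontrib is implemented: resolve_stattypes, the type labels, and the types in the example dictionary are made up, and the case-insensitive match is an assumption.

```python
# Hypothetical sketch; not bdbcontrib's implementation.

def resolve_stattypes(columns, generator_types=None):
    """Return {column: stattype}, using None where the type is unknown.

    generator_types stands in for the types a generator declares, keyed by
    column name. Matching ignores case (an assumption for this sketch).
    """
    generator_types = generator_types or {}
    lookup = {name.lower(): stattype for name, stattype in generator_types.items()}
    return {column: lookup.get(column.lower()) for column in columns}


# Made-up types for illustration only.
satellites_cc = {'Period_minutes': 'numerical', 'Perigee_km': 'numerical'}

known = resolve_stattypes(['Period_minutes', 'Perigee_km'], satellites_cc)
unknown = resolve_stattypes(['Period_minutes', 'Perigee_km'])       # no generator given
renamed = resolve_stattypes(['period', 'perigee'], satellites_cc)   # names no longer match
```

In the call from the question, this would presumably mean adding generator_name as a keyword argument to pu.pairplot, with the name of your own generator as its value.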

BelhalK commented 8 years ago

Yep that worked. Thanks!