probcomp / bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
http://probcomp.csail.mit.edu/software/bayesdb
Apache License 2.0

Crash on CREATE POPULATION with a column of all `nan` values causes issue(s) #489

Open curlette opened 7 years ago

curlette commented 7 years ago

1) During population creation, stattype guessing sets the values in very sparse columns to null. Consequently, the failure occurs not during population creation but later, when running analysis.

2) A column might not originally contain only null values, but can end up all null after stattype guessing nullifies some of them. To then make a valid population with no all-NaN columns, one must drop those columns from the underlying table. This violates the contract that a table with one or more populations associated with it should not be modified.
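To make the two points above concrete, here is a minimal sketch of a check that could run before analysis to find columns left with no non-null values. It uses only Python's standard `sqlite3` module against a plain SQLite table, not bayeslite's own API; the helper name `all_null_columns` is hypothetical, and it assumes that nullified values are stored as SQL NULL.

```python
import sqlite3


def quote(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def all_null_columns(conn, table):
    """Return the names of columns in `table` that contain no non-NULL value.

    Hypothetical helper for illustration; not part of bayeslite.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({quote(table)})").fetchall()
    empty = []
    for row in info:
        name = row[1]
        # COUNT(column) counts only the non-NULL values in that column.
        query = f"SELECT COUNT({quote(name)}) FROM {quote(table)}"
        (non_null,) = conn.execute(query).fetchone()
        if non_null == 0:
            empty.append(name)
    return empty
```

Calling `all_null_columns(conn, "t")` on a connection whose table `t` has one entirely NULL column would return a list containing just that column's name, so the problem could be reported up front instead of surfacing as a crash during analysis.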

Potential solution: Create a separate copy of the table each time a new population is created for it. That way, an observation that needs to be incorporated into a population can be added to that population's copy alone, rather than to the original table, which is currently shared by all of its populations.
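A rough sketch of the per-population copy idea, written against plain SQLite with Python's standard `sqlite3` module rather than bayeslite's internals. The function name `snapshot_table` and the `<table>__<population>` naming scheme are assumptions made for illustration only.

```python
import sqlite3


def quote(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def snapshot_table(conn, table, population):
    """Copy `table` into a table private to `population`; return its name.

    Hypothetical helper for illustration; not part of bayeslite.
    """
    copy_name = f"{table}__{population}"  # assumed naming scheme
    # CREATE TABLE ... AS SELECT copies the rows and column names,
    # but not constraints or indexes from the source table.
    conn.execute(
        f"CREATE TABLE {quote(copy_name)} AS SELECT * FROM {quote(table)}"
    )
    return copy_name


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a REAL, b REAL)")
    conn.execute("INSERT INTO t VALUES (1.0, NULL)")
    copy = snapshot_table(conn, "t", "p1")
    # A new observation goes only into the population's own copy.
    conn.execute(f"INSERT INTO {quote(copy)} VALUES (2.0, 3.0)")
    print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])
    print(conn.execute(f"SELECT COUNT(*) FROM {quote(copy)}").fetchone()[0])
```

With this layout the original table is never modified after a population exists, at the cost of storing one copy of the data per population.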