curlette closed this issue 7 years ago.
This might not be solved for `%mml GUESS SCHEMA FOR foo`:
https://probcomp-1.csail.mit.edu:9090/notebooks/casarsa_notebooks/krns_analysis_test_run.ipynb
(See the example in the fourth cell of the notebook linked above.)
@leocasarsa this is a genuine bug, not an installation issue.
@curlette Two bugs are involved here: one introduced by Fix #475, and a second in `guesser_wrapper` that masks the first. Can you please look into this? The test case is in this notebook:
http://probcomp-1.csail.mit.edu:8888/notebooks/krns_fmri_analysis_BUG_REPRODUCE.ipynb
You can copy the .csv file into your own development environment and explore further.
Fixed in b643c1d0422eb027eb41baa45118a45b60c627bb
It seems that, after b643c1d, `keyable_p([1, 2.13])` returns `True`, even though the presence of a float should mean the column is not keyable. Reopening for further exploration.
Stattype guessing in guess.py first searches for a key, defined as the first column that contains no None or NaN values and whose length equals its number of unique values (i.e. every value is distinct). In tables that don't begin with a rowid column, this heuristic is likely to misclassify a column of unique floats as a key. One improvement would be to also check whether any values have a non-zero fractional part (e.g. 3.23, -0.34): that would suggest the column is not a key, even if all of its values are unique.
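A minimal sketch of the proposed check, assuming columns arrive as plain Python sequences. This is not the actual guess.py code: `keyable_p` only mirrors the name discussed above, and `is_missing`, `has_fractional_part`, and `guess_key` are hypothetical helpers introduced for illustration.

```python
import math
from typing import Optional, Sequence


def is_missing(v) -> bool:
    # Hypothetical helper: treat None and float NaN as missing values.
    return v is None or (isinstance(v, float) and math.isnan(v))


def has_fractional_part(v) -> bool:
    # Hypothetical helper: True for values like 3.23 or -0.34;
    # integral floats such as 2.0 are not flagged.
    return isinstance(v, float) and not v.is_integer()


def keyable_p(col: Sequence) -> bool:
    """Sketch of a key test: no missing values, no value with a
    non-zero fractional part, and every value distinct."""
    if any(is_missing(v) for v in col):
        return False
    if any(has_fractional_part(v) for v in col):
        return False
    return len(col) == len(set(col))


def guess_key(columns: dict) -> Optional[str]:
    # Hypothetical driver: return the name of the first column
    # (in insertion order) that passes keyable_p, else None.
    for name, col in columns.items():
        if keyable_p(col):
            return name
    return None
```

With the fractional-part check in place, `keyable_p([1, 2.13])` returns `False`, and a leading column of unique measurements such as `[0.5, 1.7]` is skipped in favor of a later integer-valued column. Integral floats like `[1.0, 2.0]` still pass, since they may come from a rowid column that was read in as floats.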
@leocasarsa