Closed duncanjjansen closed 3 years ago
@duncanjjansen This is a numpy error, but the bug stems from the fact that the autoimpute code for the MissingnessClassifier uses an underlying numpy function incorrectly.
What the MissingnessClassifier should do is check the dtype of each column and ensure that it is not a date (autoimpute does not currently support imputation for dates). In the code, the MissingnessClassifier passes each column name to np.issubdtype, but it should pass the dtype of the column instead. You'll see in the numpy docs that the np.issubdtype function takes a dtype or a string representing a typecode - not a column name (obviously)!
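A minimal sketch of the intended check, assuming a small made-up DataFrame; `datetime_columns` is a hypothetical helper written for illustration, not autoimpute's actual code. It passes each column's dtype, not its name, to np.issubdtype:

```python
import numpy as np
import pandas as pd


def datetime_columns(df):
    """Return the names of columns whose dtype is a numpy datetime type.

    Hypothetical helper for illustration; not part of autoimpute.
    """
    return [
        name
        for name in df.columns
        # Pass the column's dtype, never the column name itself.
        if np.issubdtype(df[name].dtype, np.datetime64)
    ]


# Made-up example data: a numeric column named "k" and a date column.
df = pd.DataFrame(
    {
        "k": [1.0, np.nan, 3.0],
        "when": pd.to_datetime(["2020-01-01", "2020-01-02", None]),
    }
)

print(datetime_columns(df))
```

Because the column name never reaches np.issubdtype, a column called k is handled the same way as one called a. This sketch only covers plain numpy dtypes; for pandas extension dtypes, pandas.api.types.is_datetime64_any_dtype may be the safer check.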
The confusion here stems from the fact that some column names are reserved strings that represent dtype codes! a, b, c are strings representing dtype codes, while k is not. If you change k to S, the code will erroneously run, but if you change k to s you'll get the same error as with k.
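One way to see which column names collide with dtype codes is to ask numpy directly. This is a rough sketch; `is_dtype_string` is a hypothetical helper, and a is left out because its status as a typecode appears to differ between NumPy versions:

```python
import numpy as np


def is_dtype_string(name):
    """Return True if numpy can interpret `name` as a dtype string."""
    try:
        np.dtype(name)
    except TypeError:
        # numpy raises TypeError ("data type ... not understood")
        # for strings that are not valid dtype codes.
        return False
    return True


# Column names discussed in this thread.
for name in ["b", "c", "S", "k", "s"]:
    print(name, is_dtype_string(name))
```

Names that numpy accepts as dtype strings pass through np.issubdtype without an error, which is why the bug only surfaced for column names such as k.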
I'll have a patch ready for this soon. Thanks for catching this!
@kearnz Thanks for the quick reply. Makes sense, had a feeling it was something like this. I'll be waiting for that patch :)
@duncanjjansen I just released version 0.12.1. This should fix the bug you identified. Let me know if you're having any other problems!
The following works:
However, when I rename column 'a' to 'k':
Does anyone have a clue why this would happen and how to fix it?
python version 3.8.3