jwdink closed this issue 3 years ago.
Hi @jwdink! You're probably less concerned about this more than two years later, but for posterity, the code documentation you are referring to is here: https://github.com/pydata/patsy/blob/master/patsy/missing.py#L26 .
# Next, what should be done once we find missing data? R's options:
# -- throw away those rows (from all aligned matrices)
# -- with or without preserving information on which rows were discarded
# -- error out
# -- carry on
# The 'carry on' option requires that we have some way to represent NA in our
# output array. To avoid further solidifying the use of NaN for this purpose,
# we'll leave this option out for now, until real NA support is
# available. Also, we always preserve information on which rows were
# discarded, using the pandas index functionality (currently this is only
# returned to the original caller if they used return_type="dataframe",
# though).
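The "throw away those rows (from all aligned matrices)" option, with the positions of discarded rows preserved, can be illustrated with a small pure-Python sketch. This is not patsy's implementation: `drop_missing_rows` and `is_missing` are hypothetical helpers, and missing values are assumed to be encoded as `None` or float NaN.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_rows(
    columns: Dict[str, Sequence[Any]],
) -> Tuple[Dict[str, List[Any]], List[int]]:
    """Drop every row that is missing in ANY column, so all columns stay aligned.

    Returns the filtered columns plus the original positions of the rows that
    were kept, so the caller can still tell which rows were discarded.
    """
    if not columns:
        return {}, []
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    (n_rows,) = lengths
    kept = [
        i for i in range(n_rows)
        if not any(is_missing(col[i]) for col in columns.values())
    ]
    filtered = {name: [col[i] for i in kept] for name, col in columns.items()}
    return filtered, kept


# Usage: "y" and "x" play the role of aligned outcome/predictor columns.
data = {"y": [1.0, 2.0, float("nan"), 4.0], "x": [10, None, 30, 40]}
clean, kept = drop_missing_rows(data)
# clean == {"y": [1.0, 4.0], "x": [10, 40]}, kept == [0, 3]
```

Returning the kept positions plays the same role as the pandas index mentioned in the comment: it records which rows survived.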
I don't plan to add support for this into patsy, but it's worth noting that numpy NAs are still not really a thing: https://github.com/numpy/numpy/issues/15858 . In any case, this is supported in Formulaic, where `na_action` can be one of: "drop", "raise" or "ignore".
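The three `na_action` values can be sketched on a single numeric column in plain Python. `apply_na_action` is a hypothetical helper that only mimics the described semantics ("drop" removes missing rows, "raise" errors out, "ignore" leaves them in place as NaN); it is not Formulaic's own code, and converting `None` to NaN under "ignore" is a choice made for this sketch.

```python
import math
from typing import List, Optional, Sequence

NA_ACTIONS = ("drop", "raise", "ignore")


def _missing(v: Optional[float]) -> bool:
    # None and float NaN both count as missing in this sketch.
    return v is None or (isinstance(v, float) and math.isnan(v))


def apply_na_action(values: Sequence[Optional[float]], na_action: str = "drop") -> List[float]:
    """Apply a hypothetical na_action policy to one numeric column."""
    if na_action not in NA_ACTIONS:
        raise ValueError(f"na_action must be one of {NA_ACTIONS}, got {na_action!r}")
    if na_action == "raise":
        bad = [i for i, v in enumerate(values) if _missing(v)]
        if bad:
            raise ValueError(f"missing values at rows {bad}")
        return [float(v) for v in values]
    if na_action == "drop":
        return [float(v) for v in values if not _missing(v)]
    # "ignore": keep every row; None becomes NaN so the output is all floats.
    return [math.nan if v is None else float(v) for v in values]


column = [1.0, None, 3.0, float("nan")]
dropped = apply_na_action(column, "drop")          # [1.0, 3.0]
passed_through = apply_na_action(column, "ignore")  # [1.0, nan, 3.0, nan]
```

"ignore" corresponds to the behavior requested in this issue: rows are neither dropped nor rejected, so downstream code becomes responsible for handling the NaNs.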
For my application it would be useful for NaNs to simply remain in the data, rather than dropping them or raising an error.
I vaguely remember an explanation in the docs for why a "pass" option isn't (can't be?) implemented, but now I can't find it.