Look at this dataset: https://www.kaggle.com/zhijinzhai/loandata
It has a column paid_off_time which is of type Date. There are many null values, but it's natural for this dataset and mindsdb should support having it as the output column without dropping nulls.
Currently mindsdb drops all rows where paid_off_time is null during the DataCleaner phase. As a result, it trains a model to predict paid_off_time given "loan_status" == "PAIDOFF", because the rows where "loan_status" != "PAIDOFF" are dropped.
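A minimal sketch of why this biases the training set, using plain Python rather than mindsdb itself. The rows are invented toy data, "OTHER" is a placeholder for any non-PAIDOFF status, and drop_null_targets is a hypothetical helper that only mimics the dropping behavior described above.

```python
from datetime import datetime

# Invented toy rows; "OTHER" stands in for any loan_status other than "PAIDOFF".
rows = [
    {"loan_status": "PAIDOFF", "paid_off_time": datetime(2016, 9, 14)},
    {"loan_status": "PAIDOFF", "paid_off_time": datetime(2016, 10, 7)},
    {"loan_status": "OTHER", "paid_off_time": None},
]

def drop_null_targets(rows, target):
    """Hypothetical helper: discard every row whose target value is None."""
    return [r for r in rows if r[target] is not None]

kept = drop_null_targets(rows, "paid_off_time")
statuses = {r["loan_status"] for r in kept}
print(len(kept), statuses)  # 2 {'PAIDOFF'}
```

Only PAIDOFF rows survive, so any model trained on the result never sees a loan that was not paid off.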
This might make sense for other data types too.
This can be made optional (a flag in advanced_args).
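One possible shape for such an opt-in flag, sketched under assumptions: clean_rows and the allow_null_target key are hypothetical names, not existing mindsdb functions or advanced_args options, and the toy rows are invented. When the flag is set, rows are kept and an explicit indicator field records whether the target is missing.

```python
from datetime import datetime

# Invented toy rows; "OTHER" stands in for any non-PAIDOFF status.
rows = [
    {"loan_status": "PAIDOFF", "paid_off_time": datetime(2016, 9, 14)},
    {"loan_status": "PAIDOFF", "paid_off_time": datetime(2016, 10, 7)},
    {"loan_status": "OTHER", "paid_off_time": None},
]

def clean_rows(rows, target, advanced_args=None):
    """Hypothetical cleaning step with an opt-in flag for null targets."""
    advanced_args = advanced_args or {}
    # "allow_null_target" is an illustrative flag name, not a real mindsdb option.
    if not advanced_args.get("allow_null_target", False):
        # Default (current) behavior: drop rows whose target is missing.
        return [r for r in rows if r[target] is not None]
    # Opt-in behavior: keep every row and record whether the target is missing,
    # so "no value" becomes something a model could learn to predict.
    return [{**r, target + "_is_null": r[target] is None} for r in rows]

default = clean_rows(rows, "paid_off_time")
opt_in = clean_rows(rows, "paid_off_time", {"allow_null_target": True})
print(len(default), len(opt_in))  # 2 3
```

Defaulting the flag to off keeps the existing behavior for users who do not ask for nullable outputs.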