mindsdb / mindsdb_native

Machine Learning in one line of code
http://mindsdb.com
GNU General Public License v3.0

Support null output for dates #95

Closed. maximlopin closed this issue 3 years ago.

maximlopin commented 4 years ago

Take this dataset: https://www.kaggle.com/zhijinzhai/loandata. It has a column, paid_off_time, of type Date. Many of its values are null, but that is natural for this dataset, and mindsdb should support using it as the output column without dropping the rows where it is null.

Currently, mindsdb drops every row where paid_off_time is null during the DataCleaner phase. Since this also drops every row where "loan_status" != "PAIDOFF", the resulting model is effectively trained to predict paid_off_time given "loan_status" == "PAIDOFF".
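To make the bias concrete, here is a small sketch using made-up rows (illustrative only, not taken from the Kaggle dataset). The helper `drop_null_targets` is hypothetical and only mimics the effect of dropping null targets; it is not mindsdb's DataCleaner code.

```python
from typing import Dict, List, Optional

# Made-up toy rows for illustration only; not copied from the Kaggle loan dataset.
rows: List[Dict[str, Optional[str]]] = [
    {"loan_status": "PAIDOFF", "paid_off_time": "2016-09-14"},
    {"loan_status": "COLLECTION", "paid_off_time": None},
    {"loan_status": "PAIDOFF", "paid_off_time": "2016-10-07"},
    {"loan_status": "COLLECTION", "paid_off_time": None},
]

def drop_null_targets(rows, target):
    """Hypothetical helper: keep only rows whose target value is present."""
    return [r for r in rows if r.get(target) is not None]

kept = drop_null_targets(rows, "paid_off_time")
# Only one loan_status value survives, so the model never sees the other cases.
statuses = {r["loan_status"] for r in kept}
print(len(kept), statuses)  # 2 {'PAIDOFF'}
```

After cleaning, every remaining row has the same loan_status, which is why the trained model implicitly conditions on "loan_status" == "PAIDOFF".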

Supporting null outputs might make sense for other data types too.

This could be made optional via a flag in advanced_args.

George3d6 commented 3 years ago

Added the remove_columns_with_missing_targets flag to advanced args (defaults to True).
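One way to picture the opt-out is the sketch below. It assumes a hypothetical cleaning function whose `remove_missing_targets` parameter (an illustrative name, loosely mirroring the flag above) controls whether null-target rows are dropped. It also assumes one possible encoding, `encode_date_target` (also hypothetical), that splits a nullable date into an is-null indicator and a numeric day value. None of this is mindsdb's actual implementation.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]

def clean_missing_targets(rows: List[Row], target: str,
                          remove_missing_targets: bool = True) -> List[Row]:
    """Hypothetical cleaning step: drop rows with a null target only when asked to."""
    if remove_missing_targets:
        return [r for r in rows if r.get(target) is not None]
    # Keep every row; a null target becomes a legitimate value to predict.
    return list(rows)

def encode_date_target(value: Optional[date]) -> Tuple[int, float]:
    """Hypothetical encoding: (is_null flag, ordinal day) for a nullable date."""
    if value is None:
        return (1, 0.0)
    return (0, float(value.toordinal()))

rows: List[Row] = [
    {"loan_status": "PAIDOFF", "paid_off_time": date(2016, 9, 14)},
    {"loan_status": "COLLECTION", "paid_off_time": None},
]
default_rows = clean_missing_targets(rows, "paid_off_time")
all_rows = clean_missing_targets(rows, "paid_off_time", remove_missing_targets=False)
encoded = [encode_date_target(r["paid_off_time"]) for r in all_rows]
print(len(default_rows), len(all_rows), encoded[1])  # 1 2 (1, 0.0)
```

Encoding the null case as its own indicator is only one design choice; it lets a model predict "no payoff date" directly instead of never seeing those rows.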