mindsdb / mindsdb_native

Machine Learning in one line of code
http://mindsdb.com
GNU General Public License v3.0

Existence of a column named 'id' in the dataset produces a bad predictor #102

Closed: StpMax closed this issue 4 years ago

StpMax commented 4 years ago

For example, take the 'home_rentals' dataset and train a predictor to predict 'rental_price'. I tested three cases:

  1. Train on the data 'as is', without any changes. The results were correct.
  2. Train on the data with a serial column whose name is not 'id'. The results were good in this case too.
  3. Train on the data with a serial column named 'id'. In this case the results were very bad: the predicted 'rental_price' was negative in most cases, and the min/max values looked random.

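
To make the three setups concrete, here is a minimal plain-Python sketch that builds each training variant. The toy rows, the 'sqft' column, the helper add_serial_column and the name 'row_num' used for case 2 are hypothetical stand-ins, not the actual home_rentals data or any mindsdb_native API; training itself is left out.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_serial_column(rows: List[Row], name: str) -> List[Row]:
    """Return copies of rows with an auto-incrementing column `name` (1, 2, 3, ...) placed first."""
    return [{name: i, **row} for i, row in enumerate(rows, start=1)]


# Hypothetical toy rows standing in for the 'home_rentals' data.
rows = [
    {"sqft": 900, "rental_price": 2100},
    {"sqft": 450, "rental_price": 1200},
    {"sqft": 1300, "rental_price": 3400},
]

case_1 = rows                                # case 1: data 'as is'
case_2 = add_serial_column(rows, "row_num")  # case 2: serial column, name other than 'id' (hypothetical name)
case_3 = add_serial_column(rows, "id")       # case 3: serial column named 'id'

print(case_3[0])  # {'id': 1, 'sqft': 900, 'rental_price': 2100}
```

The only difference between cases 2 and 3 is the column name, which is why the report points at how the name 'id' is handled rather than at the serial values themselves.
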
George3d6 commented 4 years ago

This is likely happening because the 'id' column is either not correctly identified as a foreign_key or not properly removed by the DataCleaner.
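
The fix described here amounts to recognizing an identifier column and excluding it from the features before training. Below is a minimal plain-Python sketch of that idea. The helper names looks_like_identifier and drop_identifier_columns, the name rule ('id' or an '_id' suffix) and the consecutive-integer rule are hypothetical heuristics chosen for illustration, not mindsdb_native's actual foreign_key detection or DataCleaner logic.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def looks_like_identifier(name: str, values: List[Any]) -> bool:
    """Hypothetical heuristic: flag a column as an identifier by name or by serial values."""
    lowered = name.lower()
    # Name rule: 'id' itself or an '_id' suffix such as 'user_id'.
    if lowered == "id" or lowered.endswith("_id"):
        return True
    # Value rule: at least two plain integers (bools excluded) ...
    if len(values) < 2 or not all(type(v) is int for v in values):
        return False
    # ... each exactly 1 greater than the previous, like a serial counter.
    return all(b - a == 1 for a, b in zip(values, values[1:]))


def drop_identifier_columns(rows: List[Row], target: str) -> List[Row]:
    """Return rows without identifier-like columns; the target column is never dropped."""
    if not rows:
        return rows
    to_drop: Set[str] = {
        col
        for col in rows[0]
        if col != target and looks_like_identifier(col, [r.get(col) for r in rows])
    }
    return [{k: v for k, v in r.items() if k not in to_drop} for r in rows]


# Hypothetical rows matching case 3: a serial column named 'id'.
rows = [
    {"id": 1, "sqft": 900, "rental_price": 2100},
    {"id": 2, "sqft": 450, "rental_price": 1200},
    {"id": 3, "sqft": 1300, "rental_price": 3400},
]
cleaned = drop_identifier_columns(rows, target="rental_price")
print(cleaned[0])  # {'sqft': 900, 'rental_price': 2100}
```

A row counter normally carries no information about 'rental_price', so excluding it regardless of its name keeps the model from fitting on it; checking the values as well as the name also catches serial columns that are not called 'id'.
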

George3d6 commented 4 years ago

This should be fixed by #103.