'columns_to_ignore' is unused?

mindsdb / mindsdb_native

Machine Learning in one line of code

http://mindsdb.com

GNU General Public License v3.0

'columns_to_ignore' is unused? #391

Closed StpMax closed 3 years ago

StpMax commented 3 years ago

I trained a predictor on the 'concrete_strength' dataset with the 'id' column ignored. Then, in the predictor's data_analysis_v2, I get:

{
    'columns':['water', 'coarseAggregate', 'cement', 'flyAsh', 'fineAggregate', 'superPlasticizer', 'id', 'slag', 'age', 'concrete_strength'],
    # analysis entries for all columns except 'id' (omitted here)
    'columns_to_ignore':{},
    'train_std_dev':{'concrete_strength': 16.257130204762287}
}

I haven't seen any predictors where 'columns_to_ignore' was filled. What about removing that key from data_analysis?

George3d6 commented 3 years ago

Hmm, columns_to_ignore is a key in the lmd (light model data), but I'm not sure if and/or why it would be a key in data_analysis_v2. Also, train_std_dev and columns aren't columns in data_analysis_v2 either.

Are you getting the value above from native or from the API? It looks weird, certainly not like data_analysis_v2. If it's coming from the distribution_2 branch, please ignore it :)) , there was something weird I did there that might cause this.

Could you provide the steps to replicate?

StpMax commented 3 years ago

That's on the staging branch. Here is how to replicate:

from mindsdb_native import F, FileDS, Predictor

# Train a predictor named 'xxx' on the concrete_strength dataset, ignoring the 'id' column
p = Predictor(name='xxx')
ds = FileDS('/home/maxs/dev/mdb/venv_new/sources/private-benchmarks/benchmarks/datasets/concrete_strength/data.csv')
p.learn(from_data=ds, to_predict=['concrete_strength'], ignore_columns=['id'])

# Inspect the analysis stored for the trained model
F.get_model_data('xxx')['data_analysis_v2']
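
To make the unexpected keys easier to spot in the output of the steps above, here is a minimal sketch in plain Python. The helper name `non_column_keys` and the cut-down `sample` dict are hypothetical and not part of mindsdb_native. The sketch assumes the analysis dict carries a 'columns' list, as in the output quoted earlier in this thread.

```python
def non_column_keys(analysis):
    """Return, sorted, the keys of the analysis dict that are not column names.

    Assumes analysis['columns'] lists the dataset's column names, as in the
    output quoted in this issue.
    """
    columns = set(analysis.get('columns', []))
    return sorted(key for key in analysis if key not in columns)


# Hypothetical, cut-down stand-in for data_analysis_v2; the per-column
# entries are empty placeholders, not real analysis results.
sample = {
    'water': {},
    'cement': {},
    'concrete_strength': {},
    'columns': ['water', 'cement', 'id', 'concrete_strength'],
    'columns_to_ignore': {},
    'train_std_dev': {'concrete_strength': 16.257130204762287},
}

print(non_column_keys(sample))
```

Running the same helper on the real `F.get_model_data('xxx')['data_analysis_v2']` result should list whatever non-column keys are present on a given branch.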

George3d6 commented 3 years ago

Removed. I also removed all non-col-name keys from stats_v2, and these are no longer used in mindsdb proper.
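
As an illustration only, dropping non-column-name keys from a stats dict could look like the sketch below. This is not the actual change made in mindsdb_native. The function name `keep_column_keys` and the example data are hypothetical.

```python
def keep_column_keys(stats, column_names):
    """Return a copy of stats that keeps only keys naming a column."""
    allowed = set(column_names)
    return {key: value for key, value in stats.items() if key in allowed}


# Hypothetical example data: two per-column entries plus two stray keys.
stats = {
    'water': {},
    'cement': {},
    'columns_to_ignore': {},
    'train_std_dev': {'concrete_strength': 16.257130204762287},
}

cleaned = keep_column_keys(stats, ['water', 'cement', 'concrete_strength'])
print(sorted(cleaned))
```

Returning a new dict rather than deleting keys in place leaves the original stats untouched, which makes the before and after easy to compare.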