microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License

Need some kind of ignorance #315

Closed: citron closed this issue 2 years ago

citron commented 2 years ago

Hello, in LightGBM we can pass a dataframe with some features marked as ignored by using the "ignore_column" option. Is there a way to do the same in FLAML? This is very useful when one needs to keep track of records. Or did I miss something? Have a wonderful coding day!
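Independent of any FLAML option, one common workaround is to split the bookkeeping column off before training and keep it on the side. The sketch below uses only the Python standard library; the column names (`passage`, `age`, `heart_rate`, `label`) and the helper `split_ignored` are hypothetical, chosen for illustration only.

```python
import csv
import io

# Hypothetical CSV: "passage" identifies a record, "label" is the target.
RAW = """passage,age,heart_rate,label
P001,54,88,0
P002,71,112,1
P003,38,95,0
"""

def split_ignored(rows, ignored, target):
    """Separate ignored columns (kept for bookkeeping) from features and target."""
    kept, features, labels = [], [], []
    for row in rows:
        kept.append({c: row[c] for c in ignored})
        features.append({c: v for c, v in row.items() if c not in ignored and c != target})
        labels.append(row[target])
    return kept, features, labels

rows = list(csv.DictReader(io.StringIO(RAW)))
ids, X, y = split_ignored(rows, ignored={"passage"}, target="label")
print(ids[0], X[0], y[0])
```

Only `X` and `y` would then be handed to the learner (for FLAML, typically after converting them to a DataFrame), while `ids` stays aligned with them row by row.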

sonichi commented 2 years ago

From the LightGBM docs: "Note: works only in case of loading data directly from text file." I'm not sure what use case you have in mind. Could you elaborate?

citron commented 2 years ago

I use FLAML in the hospital emergency service. Each CSV row is indexed by a unique "passage" number, which lets us relate measurements to patients. As the CSV is used in various pipelines and flows, it would be easier not to alter it too much. I would prefer to keep the "passage" numbers even though they should not be treated as a feature.
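A minimal sketch of this workflow, assuming the records are already parsed into Python dicts: the "passage" value is held aside, only the remaining fields reach the model, and each prediction is paired back with its passage number. `predict_with_ids` and the toy `predict` rule are hypothetical stand-ins; with FLAML the model call would instead be the `predict` method of a fitted `AutoML` object.

```python
# Hypothetical records keyed by "passage"; the model never sees that column.
records = [
    {"passage": "P001", "age": 54, "heart_rate": 88},
    {"passage": "P002", "age": 71, "heart_rate": 112},
]

def predict(features):
    """Stand-in for a trained model; this threshold rule is purely illustrative."""
    return [int(f["heart_rate"] > 100) for f in features]

def predict_with_ids(records, id_col, model_predict):
    """Predict on all non-ID fields, then re-attach each prediction to its ID."""
    ids = [r[id_col] for r in records]
    features = [{k: v for k, v in r.items() if k != id_col} for r in records]
    preds = model_predict(features)
    # A list of pairs (not a dict) so that repeated IDs are never merged silently.
    return list(zip(ids, preds))

result = predict_with_ids(records, "passage", predict)
print(result)
```

Returning pairs rather than a dictionary is a deliberate choice: if two rows ever share a passage number, both predictions survive instead of one overwriting the other.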

sonichi commented 2 years ago

@citron Columns like this, with a unique value in every row, are dropped automatically by FLAML during training and prediction, so you can keep them in your data. Please let me know if it doesn't work.
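The behavior described in this reply can be approximated with a simple check. This is a hypothetical heuristic, not FLAML's actual code: it flags columns whose values are all strings and all distinct. Restricting it to string values is an assumption made here so that continuous measurements such as heart rate, which are often distinct too, are not flagged. The column names are illustrative.

```python
def id_like_columns(rows):
    """Hypothetical heuristic: string-valued columns with a distinct value in every row."""
    if not rows:
        return []
    n = len(rows)
    flagged = []
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, str) for v in values) and len(set(values)) == n:
            flagged.append(col)
    return flagged

rows = [
    {"passage": "P001", "triage": "red", "heart_rate": 88},
    {"passage": "P002", "triage": "green", "heart_rate": 112},
    {"passage": "P003", "triage": "red", "heart_rate": 95},
]
print(id_like_columns(rows))  # ['passage']
```

Under a rule like this, a single repeated value is enough for the check to fail, and the column is then kept as an ordinary categorical feature.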

citron commented 2 years ago

@sonichi I noticed that among my 350,000 lines, 7 had duplicated "passage" numbers. That's why FLAML treated this column as a category. Thanks for the help!
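A quick way to catch such collisions before training is to count the identifiers. The sketch below uses `collections.Counter` from the Python standard library; `duplicated_ids` is a hypothetical helper and the passage values are made up.

```python
from collections import Counter

def duplicated_ids(ids):
    """Return each identifier that occurs more than once, with its count."""
    counts = Counter(ids)
    return {value: c for value, c in counts.items() if c > 1}

passages = ["P001", "P002", "P003", "P002", "P004", "P003", "P002"]
print(duplicated_ids(passages))  # {'P002': 3, 'P003': 2}
```

Once the duplicates are fixed (or the column is removed explicitly before fitting), the identifier no longer looks like a low-cardinality category.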