tvdboom / ATOM

Automated Tool for Optimized Modelling
https://tvdboom.github.io/ATOM/
MIT License

Running into issues with binary classification example #4

Closed ragrawal closed 3 months ago

ragrawal commented 3 years ago

Hi,

I'm pretty excited about the library, but I ran into several issues with the binary classification example. I tried rerunning the example and got the error ValueError: Columns to be encoded can not contain new values while running atom.encode(strategy="CatBoost", max_onehot=10, frac_to_other=0.04).

Then I tried the library on my own data, which had no categorical columns. However, on calling atom.run, all my models failed because they were not serializable (PickleError).

tvdboom commented 3 years ago

Hi,

About the first issue with the encoding: I reran the example and it worked fine for me. Did you use the same dataset as the example? To clarify the exception: ValueError: Columns to be encoded can not contain new values is raised not by atom but by the category_encoders package. It is raised when the column to be transformed contains classes that were not encountered during fitting (by classes I mean the possible values in a column, e.g. 0 and 1 are the two classes in the target column of a binary classifier). This is intended behavior, since the transformer wouldn't know what to do with the new classes. In atom's case, it means the test set contains classes that were not present in the training set. This usually happens if the training set is very small or if the column has many classes with few occurrences. To fix this, either make sure that the training set contains all the classes in the column or increase the frac_to_other parameter.
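To check whether this is what happened, a rough sketch like the following can help. It uses only the standard library and does not call atom or category_encoders. The helper names unseen_classes and lump_rare_classes are hypothetical, and the lumping rule (classes with fewer than frac_to_other * n_rows occurrences are replaced by "other") is an assumption about what the parameter does, not atom's actual code.

```python
from collections import Counter


def unseen_classes(train_values, test_values):
    """Return the classes that occur in the test column but not in the training column."""
    return sorted(set(test_values) - set(train_values))


def lump_rare_classes(values, frac_to_other, other_label="other"):
    """Replace classes rarer than frac_to_other * len(values) with one shared label.

    Assumption: this only mimics the idea behind frac_to_other.
    """
    threshold = frac_to_other * len(values)
    counts = Counter(values)
    return [v if counts[v] >= threshold else other_label for v in values]


# Toy column: class "d" occurs only in the test split.
train = ["a", "a", "a", "b", "b", "c"]
test = ["a", "b", "d"]
print(unseen_classes(train, test))

# Lump rare classes over the whole column, then split again at the same index.
lumped = lump_rare_classes(train + test, frac_to_other=0.2)
lumped_train, lumped_test = lumped[: len(train)], lumped[len(train):]
print(unseen_classes(lumped_train, lumped_test))
```

In this toy data the rare classes "c" and "d" both fall below the threshold, so after lumping the test split no longer holds a class the training split has not seen.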

About the second issue: I need a bit more information to understand why the error occurred. Can you share the full traceback? Did you use atom's predefined models (if so, which ones?) or custom ones?
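While gathering that information, a quick standalone check of whether an object can be pickled may narrow things down. This is only a sketch using Python's pickle module. The helper pickling_error and the example dictionaries are hypothetical, and a lambda is just one common reason an object cannot be pickled, not necessarily the cause here.

```python
import pickle


def pickling_error(obj):
    """Return None if obj can be pickled, otherwise a short description of the failure."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


# Hypothetical examples: plain values pickle fine, a lambda does not.
plain_params = {"n_estimators": 100, "learning_rate": 0.1}
params_with_lambda = {"scorer": lambda y_true, y_pred: 0.0}

print(pickling_error(plain_params))
print(pickling_error(params_with_lambda))
```

Running the same check on each custom model or parameter object shows which one fails to serialize.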

github-actions[bot] commented 3 months ago

Stale issue message: This issue will be automatically closed by GitHub Actions in 1 week if there is no further activity.