sberbank-ai-lab / LightAutoML

LAMA - automatic model creation framework
Apache License 2.0

Feature names that inherit from str cause an exception #45

Closed: resivalex closed this issue 3 years ago

resivalex commented 3 years ago

I loaded a DataFrame with pandas.read_sql_table. Every column name in it has the type sqlalchemy.sql.elements.quoted_name. Many checks in the source code test for exactly str, so fit_predict fails.

Should the library notify developers about unsupported column name types, convert them to strings or allow column names to have any type?
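As a rough illustration of the "convert them to strings" option, here is a minimal sketch. QuotedName is a hypothetical stand-in for a str subclass such as quoted_name, and to_plain_str_names is an illustrative helper, not part of LightAutoML or pandas.

```python
class QuotedName(str):
    """Hypothetical stand-in for a str subclass such as quoted_name."""


def to_plain_str_names(names):
    """Return the given column names as exact built-in str objects.

    Calling str() on an instance of a str subclass yields a plain str,
    so checks of the form `type(name) is str` pass afterwards.
    Note that non-string names (e.g. ints) are stringified as well.
    """
    return [str(name) for name in names]


# Column names as they might arrive from a SQL-backed loader (assumed example data).
names = [QuotedName("age"), QuotedName("income")]
plain = to_plain_str_names(names)
```

On the caller's side, something like df.columns = to_plain_str_names(df.columns) before fit_predict should have the same effect. Whether that covers every check in the library has not been verified here.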

alexmryzhkov commented 3 years ago

Hi @resivalex,

Could you explain why sqlalchemy.sql.elements.quoted_name doesn't work as a str type? If you can share a test example with us, we will try to reproduce the problem on our side and fix it.

Alex

resivalex commented 3 years ago

lightautoml/dataset/np_pd_dataset.py:501

        if type(columns) is str:
            idx = self.data.columns.get_loc(columns)

        else:
            idx = self.data.columns.get_indexer(columns)

Here columns has the type sqlalchemy.sql.elements.quoted_name, which is a subclass of str but not exactly str, so the "type(columns) is str" check is false and execution goes to the "else" branch.
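The difference between the exact type check quoted above and an isinstance check can be reproduced without SQLAlchemy. This is a minimal sketch: QuotedName is a hypothetical stand-in for quoted_name, and both helper functions are illustrative names, not library code.

```python
class QuotedName(str):
    """Hypothetical stand-in for sqlalchemy.sql.elements.quoted_name."""


def is_single_column_exact(columns):
    # Mirrors the quoted check: true only for the built-in str type itself.
    return type(columns) is str


def is_single_column_isinstance(columns):
    # Alternative check: also accepts subclasses of str.
    return isinstance(columns, str)


name = QuotedName("age")
print(is_single_column_exact("age"))      # True
print(is_single_column_exact(name))       # False
print(is_single_column_isinstance(name))  # True
```

Switching the check to isinstance is one possible fix. It is shown here only as an assumption about what would resolve this case, not as the change the maintainers made.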

Test example: https://www.kaggle.com/resivalex/non-string-feature-names-case/