iterative / mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
https://mlem.ai
Apache License 2.0

importing excel files incorrectly sets the format to csv and writes data in csv #617

Closed aliahari closed 1 year ago

aliahari commented 1 year ago

If you import an Excel file with mlem.api.import_object() and set the type to "pandas[excel]", MLEM incorrectly sets the format to csv. In addition, if copy_data is set to True, it writes the data in csv format instead of Excel.

I have tracked down the issue: the problem lies in two classes, "DataFrameType" and "SeriesType", in the contrib.pandas package. The get_writer() method of these classes compares the file extension, which in this case is either "xlsx" or "xls", against the extensions listed in PANDAS_FORMATS and PANDAS_SERIES_FORMATS. Since neither extension matches "excel", the method falls back to csv, which is the default format. One solution would be to add an extra elif such as

if ext in PANDAS_SERIES_FORMATS:
    fmt = ext
elif ext in ["xls", "xlsx"]:
    fmt = "excel"

aguschin commented 1 year ago

Thanks for reporting! Your solution should work. Do you mind contributing a PR?

aliahari commented 1 year ago

I have created a PR

aguschin commented 1 year ago

Ah sorry! Silly typo 😂 I meant contributing tests for this)
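
For reference, a test for this fix could look roughly like the sketch below. It does not import MLEM. Instead, resolve_format() is a hypothetical stand-in that mirrors the extension check described in the issue, and the contents of PANDAS_FORMATS, DEFAULT_FORMAT, and EXCEL_EXTENSIONS here are assumptions made for illustration, not the actual values in contrib.pandas. A real test would presumably call get_writer() on "DataFrameType" and "SeriesType" and check the format of the returned writer.

```python
import posixpath

# Assumed stand-ins for the constants in MLEM's pandas integration;
# the real sets may contain different entries.
PANDAS_FORMATS = {"csv", "json", "html", "excel", "parquet", "feather", "pickle"}
DEFAULT_FORMAT = "csv"
EXCEL_EXTENSIONS = {"xls", "xlsx"}


def resolve_format(filename, default=DEFAULT_FORMAT):
    """Hypothetical helper: pick a pandas format name from a file name."""
    # Take the extension without its leading dot, e.g. "data.xlsx" -> "xlsx".
    ext = posixpath.splitext(filename)[1].lstrip(".")
    if ext in PANDAS_FORMATS:
        return ext
    # The proposed extra branch: Excel extensions map to the "excel" format.
    if ext in EXCEL_EXTENSIONS:
        return "excel"
    return default


def test_excel_extensions_resolve_to_excel():
    assert resolve_format("data.xlsx") == "excel"
    assert resolve_format("data.xls") == "excel"


def test_other_extensions_are_unchanged():
    # Extensions that are already format names are used as-is.
    assert resolve_format("data.csv") == "csv"
    assert resolve_format("data.parquet") == "parquet"
    # Unknown or missing extensions still fall back to the default.
    assert resolve_format("data.unknown") == "csv"
    assert resolve_format("data") == "csv"
```

The two test functions follow pytest naming, so saving the block as a test module and running pytest on it should collect both.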