microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License

`benchmark` dependencies are outdated #1060

Closed. PGijsbers closed this issue 1 year ago

PGijsbers commented 1 year ago

The benchmark optional dependencies are outdated, and xgboost==1.3.3 doesn't play nice with newer versions of pandas, which are installed by default. Running flaml after a flaml[benchmark] installation results in an error:

...
  File "/Users/pietergijsbers/repositories/automlbenchmark/frameworks/flaml/venv/lib/python3.9/site-packages/xgboost/data.py", line 242, in _from_pandas_df
    data, feature_names, feature_types = _transform_pandas_df(
  File "/Users/pietergijsbers/repositories/automlbenchmark/frameworks/flaml/venv/lib/python3.9/site-packages/xgboost/data.py", line 192, in _transform_pandas_df
    from pandas import MultiIndex, Int64Index
ImportError: cannot import name 'Int64Index' from 'pandas' (/Users/pietergijsbers/repositories/automlbenchmark/frameworks/flaml/venv/lib/python3.9/site-packages/pandas/__init__.py)

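
For context, `Int64Index` was removed from the top-level pandas namespace in pandas 2.0, which is why the import inside xgboost 1.3.3 fails against a recent pandas. A minimal sketch of that rule follows. The helper name `pandas_exposes_int64index` is hypothetical (it is not part of flaml, xgboost, or pandas), and it only inspects a version string rather than the installed package.

```python
def pandas_exposes_int64index(pandas_version: str) -> bool:
    """Return True if this pandas release still exports `Int64Index`.

    Assumption: the name was removed in pandas 2.0, so only the major
    version component of the string is inspected.
    """
    major = int(pandas_version.split(".")[0])
    return major < 2


# A 1.x release still has the name; a 2.x release does not.
print(pandas_exposes_int64index("1.1.4"))
print(pandas_exposes_int64index("2.0.0"))
```
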
qingyun-wu commented 1 year ago

Hi @PGijsbers, thank you for raising this issue!

  1. I will make a PR to resolve the conflicting versions of pandas and xgboost.
  2. You mentioned "optional dependencies are outdated". Could you please clarify what the specific problems are (in addition to the version problem mentioned in 1)?

About the timeline: flaml will be releasing a new version soon (in approximately a week or two), which includes a major change to how the library is installed and requires corresponding changes in automlbenchmark. I plan to make a PR to make sure the new version of flaml works properly here, and to address problems 1 and 2 mentioned above in that PR.

Please let me know if you have any comments or suggestions! Thank you!

PGijsbers commented 1 year ago

> Could you please clarify

Sorry about this miscommunication, it shouldn't have been an "and" there. The only issue I identified is that the outdated xgboost does not work with the more modern pandas.

As for the benchmark, we will run experiments soon. I will try to hold off with running flaml experiments to see if we can have it run with the fixed new version instead, but I can't make promises there as we are on a tight schedule.

qingyun-wu commented 1 year ago

Thanks for the information. One easy workaround I can think of at this point is to resolve the dependency issue in flaml's setup.sh file, i.e., by adding the following command below [line 13](https://github.com/openml/automlbenchmark/blob/master/frameworks/flaml/setup.sh#L13) (assuming that you will be running the stable version in your experiments):

PIP install pandas==1.1.4

I have tested it and it works well. Please let me know how I could further help. Thank you!
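
One way to confirm that such a pin actually took effect inside the benchmark's virtual environment is to compare the installed version against the pinned one. The following is a minimal sketch using only the standard library. The function name `installed_matches_pin` and its `get_version` parameter are hypothetical and exist only for illustration; they are not part of flaml or automlbenchmark.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_matches_pin(
    package: str,
    pinned: str,
    get_version: Callable[[str], str] = metadata.version,
) -> Optional[bool]:
    """Compare the installed version of `package` with `pinned`.

    Returns True or False for an exact string match, or None when the
    package is not installed in the current environment.
    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    try:
        return get_version(package) == pinned
    except metadata.PackageNotFoundError:
        return None
```

Run from the environment that setup.sh created, `installed_matches_pin("pandas", "1.1.4")` reports whether the pinned pandas is the one that will be imported.
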

PGijsbers commented 1 year ago

Is there a specific reason that you wish to use the outdated xgboost, then? I had set up a PR using the catboost dependency, which as far as I could tell was (within the context of amlb) the same as benchmark. Please see https://github.com/openml/automlbenchmark/pull/528#issuecomment-1572714751

qingyun-wu commented 1 year ago

@PGijsbers, thank you for your questions and response. "Is there a specific reason that you wish to use the outdated xgboost?" This is mainly because we haven't evaluated the performance of flaml with newer versions of xgboost. As a result, we've added this requirement as a more conservative measure. However, it's possible that newer versions might function just as well. Therefore, I'm comfortable with disregarding this requirement in your PR.

I have reviewed your PR and left some other comments. Thank you!

PGijsbers commented 1 year ago

Thanks! I am sorry, but I do not see the comments you are referring to, could you add a link please?

qingyun-wu commented 1 year ago

Sorry, I forgot to publish the comment. This is the link: https://github.com/openml/automlbenchmark/pull/528#pullrequestreview-1472248424