microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License
15.4k stars 2.63k forks

Cannot Run qrun with lightgbm workflow config #798

Open Waterkin opened 2 years ago

Waterkin commented 2 years ago

🐛 Bug Description

(qlib) 👍 examples main qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
[33800:MainThread](2022-01-03 15:22:20,670) INFO - qlib.Initialization - [config.py:391] - default_conf: client.
[33800:MainThread](2022-01-03 15:22:20,671) WARNING - qlib.Initialization - [config.py:416] - redis connection failed(host=127.0.0.1 port=6379), DiskExpressionCache and DiskDatasetCache will not be used!
[33800:MainThread](2022-01-03 15:22:20,672) INFO - qlib.Initialization - [__init__.py:68] - qlib successfully initialized based on client settings.
[33800:MainThread](2022-01-03 15:22:20,672) INFO - qlib.Initialization - [__init__.py:70] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/waterking/.qlib/qlib_data/cn_data')}
[33800:MainThread](2022-01-03 15:22:20,673) INFO - qlib.workflow - [expm.py:282] - No tracking URI is provided. Use the default tracking URI.
[33800:MainThread](2022-01-03 15:22:20,673) INFO - qlib.workflow - [expm.py:318] - <mlflow.tracking.client.MlflowClient object at 0x7fa97d0b8ac0>
[33800:MainThread](2022-01-03 15:22:20,693) INFO - qlib.workflow - [exp.py:249] - Experiment 1 starts running ...
[33800:MainThread](2022-01-03 15:22:20,796) INFO - qlib.workflow - [recorder.py:290] - Recorder a3ea360a53a84f0dbb7dae7b7e683dcc starts running under Experiment 1 ...
/Users/waterking/opt/anaconda3/envs/qlib/lib/python3.8/site-packages/pyqlib-0.8.0.99-py3.8-macosx-10.9-x86_64.egg/qlib/utils/__init__.py:808: FutureWarning: MultiIndex.is_lexsorted is deprecated as a public function, users should use MultiIndex.is_monotonic_increasing instead.
  if idx.is_monotonic_increasing and not (isinstance(idx, pd.MultiIndex) and not idx.is_lexsorted()):
[33800:MainThread](2022-01-03 15:23:21,552) INFO - qlib.timer - [log.py:113] - Time cost: 60.085s | Loading data Done
[33800:MainThread](2022-01-03 15:23:22,967) INFO - qlib.timer - [log.py:113] - Time cost: 0.411s | DropnaLabel Done
/Users/waterking/opt/anaconda3/envs/qlib/lib/python3.8/site-packages/pandas-1.3.5-py3.8-macosx-10.9-x86_64.egg/pandas/core/frame.py:3641: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[k1] = value[k2]
[33800:MainThread](2022-01-03 15:23:27,797) INFO - qlib.timer - [log.py:113] - Time cost: 4.829s | CSZScoreNorm Done
[33800:MainThread](2022-01-03 15:23:27,835) INFO - qlib.timer - [log.py:113] - Time cost: 6.282s | fit & process data Done
[33800:MainThread](2022-01-03 15:23:27,836) INFO - qlib.timer - [log.py:113] - Time cost: 66.369s | Init data Done
[1] 33800 segmentation fault qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
/Users/waterking/opt/anaconda3/envs/qlib/lib/python3.8/site-packages/joblib-1.1.0-py3.8.egg/joblib/externals/loky/backend/resource_tracker.py:318: UserWarning: resource_tracker: There appear to be 2 leaked folder objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/Users/waterking/opt/anaconda3/envs/qlib/lib/python3.8/site-packages/joblib-1.1.0-py3.8.egg/joblib/externals/loky/backend/resource_tracker.py:333: UserWarning: resource_tracker: /var/folders/gr/17hy_zfj49l4p998g6xq2d340000gn/T/joblib_memmapping_folder_33800_95596af9eff64ff4b7116daedbbcbc53_a5376a7190bc40818f999d3d14e3d716: FileNotFoundError(2, 'No such file or directory')
  warnings.warn('resource_tracker: %s: %r' % (name, e))
/Users/waterking/opt/anaconda3/envs/qlib/lib/python3.8/site-packages/joblib-1.1.0-py3.8.egg/joblib/externals/loky/backend/resource_tracker.py:333: UserWarning: resource_tracker: /var/folders/gr/17hy_zfj49l4p998g6xq2d340000gn/T/joblib_memmapping_folder_33800_ca9c83dbe40b49aeacc099c618d44dfa_9a4c5e1eb21743a884e50325b7cd17fa: FileNotFoundError(2, 'No such file or directory')
  warnings.warn('resource_tracker: %s: %r' % (name, e))

To Reproduce

Steps to reproduce the behavior:

1. cd examples  # Avoid running the program from a directory that contains qlib
2. qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
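
To tell a segmentation fault apart from a process killed by the operating system, the reproduction command can be wrapped in a small launcher. The following is a minimal sketch and not part of Qlib: describe_exit and run_and_report are hypothetical helper names, and it assumes a POSIX system, where subprocess reports death by signal as a negative return code. Setting PYTHONFAULTHANDLER=1 asks the Python interpreter to dump a traceback on fatal signals such as SIGSEGV. SIGKILL cannot be intercepted, so a plain "Killed" exit will still show no traceback.

```python
import os
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative code -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd with faulthandler enabled in child Python processes."""
    # Child processes inherit the environment, so a Python entry point
    # started by cmd will enable faulthandler at startup.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(cmd, env=env)
    return describe_exit(proc.returncode)
```

For the steps above, a call such as run_and_report(["qrun", "benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml"]) from the examples directory would report whether the run ended through SIGSEGV, SIGKILL, or an ordinary non-zero exit status.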

Environment

Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.

Additional Notes

Please help, thanks a lot!~

Wangwuyi123 commented 2 years ago

I tried to reproduce your problem but could not. Please try again.

you-n-g commented 2 years ago

@Waterkin Qlib uses the same approach in its automated tests, and the test succeeds. https://github.com/microsoft/qlib/blob/main/.github/workflows/test.yml#L65 Could you please provide more details to reproduce this error?

For example, provide more details by running cd scripts && python collect_info.py all
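
If the script is not at hand, roughly similar information can be gathered by hand. The following is a minimal sketch, not the collect_info.py implementation: collect_versions and format_report are hypothetical helper names, and the package list is only an assumption based on names that appear in this thread (pyqlib is the distribution name seen in the installation path above).

```python
import platform
import sys
from importlib import metadata

# Assumed package list, taken from names mentioned in this thread.
PACKAGES = ["pyqlib", "numpy", "pandas", "lightgbm", "joblib", "mlflow"]


def collect_versions(packages):
    """Return {package: installed version or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def format_report(packages=PACKAGES):
    """Build a plain-text report of platform, Python, and package versions."""
    python_version = sys.version.replace("\n", " ")
    lines = [
        platform.system(),
        platform.machine(),
        platform.platform(),
        f"Python version: {python_version}",
    ]
    lines += [f"{name}=={ver}" for name, ver in collect_versions(packages).items()]
    return "\n".join(lines)


print(format_report())
```

The output can be pasted into the issue as plain text alongside the full console log.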

RoloVoid commented 2 years ago

I hit the same problem when trying to qrun the LightGBM config. The output and environment information are as follows:

(qlib) root@fe7b424c885e:~/qlib/examples# qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
[7589:MainThread](2022-03-12 10:58:41,418) INFO - qlib.Initialization - [config.py:402] - default_conf: client.
[7589:MainThread](2022-03-12 10:58:41,424) INFO - qlib.Initialization - [__init__.py:73] - qlib successfully initialized based on client settings.
[7589:MainThread](2022-03-12 10:58:41,424) INFO - qlib.Initialization - [__init__.py:75] - data_path={'__DEFAULT_FREQ': PosixPath('/root/.qlib/qlib_data/cn_data')}
[7589:MainThread](2022-03-12 10:58:41,425) INFO - qlib.workflow - [expm.py:318] - <mlflow.tracking.client.MlflowClient object at 0x7f243f1692e0>
[7589:MainThread](2022-03-12 10:58:41,459) INFO - qlib.workflow - [exp.py:257] - Experiment 1 starts running ...
[7589:MainThread](2022-03-12 10:58:41,578) INFO - qlib.workflow - [recorder.py:293] - Recorder 9054b86afffe4ae29f2e992eb531c061 starts running under Experiment 1 ...
/root/anaconda3/envs/qlib/lib/python3.8/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/root/anaconda3/envs/qlib/lib/python3.8/site-packages/qlib/utils/__init__.py:792: FutureWarning: MultiIndex.is_lexsorted is deprecated as a public function, users should use MultiIndex.is_monotonic_increasing instead.
  if idx.is_monotonic_increasing and not (isinstance(idx, pd.MultiIndex) and not idx.is_lexsorted()):
[7589:MainThread](2022-03-12 11:00:51,940) INFO - qlib.timer - [log.py:113] - Time cost: 128.654s | Loading data Done
[7589:MainThread](2022-03-12 11:01:00,391) INFO - qlib.timer - [log.py:113] - Time cost: 0.572s | DropnaLabel Done
/root/anaconda3/envs/qlib/lib/python3.8/site-packages/qlib/data/dataset/processor.py:310: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[cols] = df[cols].groupby("datetime").apply(self.zscore_func)
[7589:MainThread](2022-03-12 11:01:13,439) INFO - qlib.timer - [log.py:113] - Time cost: 13.047s | CSZScoreNorm Done
[7589:MainThread](2022-03-12 11:01:13,440) INFO - qlib.timer - [log.py:113] - Time cost: 21.497s | fit & process data Done
[7589:MainThread](2022-03-12 11:01:13,441) INFO - qlib.timer - [log.py:113] - Time cost: 150.155s | Init data Done
Killed
(qlib) root@fe7b424c885e:~/qlib/examples# /root/anaconda3/envs/qlib/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:318: UserWarning: resource_tracker: There appear to be 2 leaked folder objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/anaconda3/envs/qlib/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:333: UserWarning: resource_tracker: /tmp/joblib_memmapping_folder_7589_f6d3f263998248d0b69f07daaa34f180_cc0445d7b4ae408cb69956a0c0168cf3: FileNotFoundError(2, 'No such file or directory')
  warnings.warn('resource_tracker: %s: %r' % (name, e))
/root/anaconda3/envs/qlib/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:333: UserWarning: resource_tracker: /tmp/joblib_memmapping_folder_7589_4994d8a48e124178a8d918fdd1a8d497_5b46fd15f6814361ade5610b909fbbb2: FileNotFoundError(2, 'No such file or directory')
  warnings.warn('resource_tracker: %s: %r' % (name, e))
Linux
x86_64
Linux-5.10.60.1-microsoft-standard-WSL2-x86_64-with-glibc2.10
#1 SMP Wed Aug 25 23:20:18 UTC 2021

Python version: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:53:36)  [GCC 9.4.0]

Qlib version: 0.8.4.99
numpy==1.22.3
pandas==1.4.1
scipy==1.8.0
requests==2.27.1
sacred==0.8.2
python-socketio==5.5.2
redis==4.1.4
python-redis-lock==3.7.0
schedule==1.1.0
cvxpy==1.2.0
hyperopt==0.1.2
fire==0.4.0
statsmodels==0.13.2
xlrd==2.0.1
plotly==5.6.0
matplotlib==3.5.1
tables==3.7.0
pyyaml==6.0
mlflow==1.24.0
tqdm==4.63.0
loguru==0.6.0
lightgbm==3.3.2
tornado==6.1
joblib==1.1.0
fire==0.4.0
ruamel.yaml==0.17.21

ethanygao commented 2 years ago

I have a similar error, reported here: https://github.com/microsoft/qlib/issues/1099

I cannot get pytest to pass either.