microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License

DDG-DA Assertion Error #1760

Closed l0ngc closed 3 months ago

l0ngc commented 3 months ago

❓ Questions and Help

Hi, thanks for the help! I ran into a problem when trying to run DDG-DA.

(qlib) [longc@arch DDG-DA]$ pwd
/home/longc/projects/qlib/examples/benchmarks_dynamic/DDG-DA
(qlib) [longc@arch DDG-DA]$ python workflow.py --conf_path=../baseline/workflow_config_lightgbm_Alpha158.yaml run
2024-03-14 23:45:35.137 | WARNING  | qlib.tests.data:qlib_data:175 - Data already exists: ~/.qlib/qlib_data/cn_data, the data download will be skipped
        If downloading is required: `exists_skip=False` or `change target_dir`
[761674:MainThread](2024-03-14 23:45:35,137) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[761674:MainThread](2024-03-14 23:45:35,138) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[761674:MainThread](2024-03-14 23:45:35,139) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/home/longc/.qlib/qlib_data/cn_data')}
[761674:MainThread](2024-03-14 23:45:35,144) INFO - qlib.Rolling - [base.py:162] - The prediction horizon is overrided
[761674:MainThread](2024-03-14 23:45:35,144) INFO - qlib.Rolling - [base.py:173] - {'model': {'class': 'LGBModel', 'module_path': 'qlib.contrib.model.gbdt', 'kwargs': {'loss': 'mse', 'colsample_bytree': 0.8879, 'learning_rate': 0.2, 'subsample': 0.8789, 'lambda_l1': 205.6999, 'lambda_l2': 580.9768, 'max_depth': 8, 'num_leaves': 210, 'num_threads': 20}}, 'dataset': {'class': 'DatasetH', 'module_path': 'qlib.data.dataset', 'kwargs': {'handler': {'class': 'Alpha158', 'module_path': 'qlib.contrib.data.handler', 'kwargs': {'start_time': datetime.date(2008, 1, 1), 'end_time': datetime.date(2020, 8, 1), 'fit_start_time': datetime.date(2008, 1, 1), 'fit_end_time': datetime.date(2014, 12, 31), 'instruments': 'csi300', 'label': ['Ref($close, -21) / Ref($close, -1) - 1']}}, 'segments': {'train': [datetime.date(2008, 1, 1), datetime.date(2014, 12, 31)], 'valid': [datetime.date(2015, 1, 1), datetime.date(2016, 12, 31)], 'test': [datetime.date(2017, 1, 1), datetime.date(2020, 8, 1)]}}}, 'record': [{'class': 'SignalRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'model': '<MODEL>', 'dataset': '<DATASET>'}}, {'class': 'SigAnaRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'ana_long_short': False, 'ann_scaler': 252}}, {'class': 'PortAnaRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'config': {'strategy': {'class': 'TopkDropoutStrategy', 'module_path': 'qlib.contrib.strategy', 'kwargs': {'signal': '<PRED>', 'topk': 50, 'n_drop': 5}}, 'backtest': {'start_time': datetime.date(2017, 1, 1), 'end_time': datetime.date(2020, 8, 1), 'account': 100000000, 'benchmark': 'SH000300', 'exchange_kwargs': {'limit_threshold': 0.095, 'deal_price': 'close', 'open_cost': 0.0005, 'close_cost': 0.0015, 'min_cost': 5}}}}}]}
[761674:MainThread](2024-03-14 23:45:35,145) INFO - qlib.workflow - [exp.py:258] - Experiment 1 starts running ...
[761674:MainThread](2024-03-14 23:45:35,185) INFO - qlib.workflow - [recorder.py:341] - Recorder 26a7facbe4b742e5ae1c4c29c90336f3 starts running under Experiment 1 ...
ModuleNotFoundError. CatBoostModel are skipped. (optional: maybe installing CatBoostModel can fix it.)
ModuleNotFoundError. XGBModel is skipped(optional: maybe installing xgboost can fix it).
Training until validation scores don't improve for 50 rounds
[20]    train's l2: 0.959367    valid's l2: 0.992761
[40]    train's l2: 0.941031    valid's l2: 0.996238
[60]    train's l2: 0.92202     valid's l2: 0.999542
Early stopping, best iteration is:
[12]    train's l2: 0.96859     valid's l2: 0.992723
[761674:MainThread](2024-03-14 23:45:40,958) INFO - qlib.timer - [log.py:127] - Time cost: 0.017s | waiting `async_log` Done
[761674:MainThread](2024-03-14 23:45:41,116) INFO - qlib.Rolling - [base.py:162] - The prediction horizon is overrided
[761674:MainThread](2024-03-14 23:45:41,116) INFO - qlib.Rolling - [base.py:173] - {'model': {'class': 'LGBModel', 'module_path': 'qlib.contrib.model.gbdt', 'kwargs': {'loss': 'mse', 'colsample_bytree': 0.8879, 'learning_rate': 0.2, 'subsample': 0.8789, 'lambda_l1': 205.6999, 'lambda_l2': 580.9768, 'max_depth': 8, 'num_leaves': 210, 'num_threads': 20}}, 'dataset': {'class': 'DatasetH', 'module_path': 'qlib.data.dataset', 'kwargs': {'handler': {'class': 'Alpha158', 'module_path': 'qlib.contrib.data.handler', 'kwargs': {'start_time': datetime.date(2008, 1, 1), 'end_time': datetime.date(2020, 8, 1), 'fit_start_time': datetime.date(2008, 1, 1), 'fit_end_time': datetime.date(2014, 12, 31), 'instruments': 'csi300', 'label': ['Ref($close, -21) / Ref($close, -1) - 1']}}, 'segments': {'train': [datetime.date(2008, 1, 1), datetime.date(2014, 12, 31)], 'valid': [datetime.date(2015, 1, 1), datetime.date(2016, 12, 31)], 'test': [datetime.date(2017, 1, 1), datetime.date(2020, 8, 1)]}}}, 'record': [{'class': 'SignalRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'model': '<MODEL>', 'dataset': '<DATASET>'}}, {'class': 'SigAnaRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'ana_long_short': False, 'ann_scaler': 252}}, {'class': 'PortAnaRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'config': {'strategy': {'class': 'TopkDropoutStrategy', 'module_path': 'qlib.contrib.strategy', 'kwargs': {'signal': '<PRED>', 'topk': 50, 'n_drop': 5}}, 'backtest': {'start_time': datetime.date(2017, 1, 1), 'end_time': datetime.date(2020, 8, 1), 'account': 100000000, 'benchmark': 'SH000300', 'exchange_kwargs': {'limit_threshold': 0.095, 'deal_price': 'close', 'open_cost': 0.0005, 'close_cost': 0.0015, 'min_cost': 5}}}}}]}
[761674:MainThread](2024-03-14 23:45:44,056) INFO - qlib.timer - [log.py:127] - Time cost: 0.021s | Loading data Done
[761674:MainThread](2024-03-14 23:45:44,056) INFO - qlib.timer - [log.py:127] - Time cost: 0.000s | fit & process data Done
[761674:MainThread](2024-03-14 23:45:44,056) INFO - qlib.timer - [log.py:127] - Time cost: 0.021s | Init data Done
[761674:MainThread](2024-03-14 23:45:44,144) INFO - qlib.Rolling - [base.py:162] - The prediction horizon is overrided
[761674:MainThread](2024-03-14 23:45:44,144) INFO - qlib.Rolling - [base.py:173] - {'model': {'class': 'LGBModel', 'module_path': 'qlib.contrib.model.gbdt', 'kwargs': {'loss': 'mse', 'colsample_bytree': 0.8879, 'learning_rate': 0.2, 'subsample': 0.8789, 'lambda_l1': 205.6999, 'lambda_l2': 580.9768, 'max_depth': 8, 'num_leaves': 210, 'num_threads': 20}}, 'dataset': {'class': 'DatasetH', 'module_path': 'qlib.data.dataset', 'kwargs': {'handler': {'class': 'Alpha158', 'module_path': 'qlib.contrib.data.handler', 'kwargs': {'start_time': datetime.date(2008, 1, 1), 'end_time': datetime.date(2020, 8, 1), 'fit_start_time': datetime.date(2008, 1, 1), 'fit_end_time': datetime.date(2014, 12, 31), 'instruments': 'csi300', 'label': ['Ref($close, -21) / Ref($close, -1) - 1']}}, 'segments': {'train': [datetime.date(2008, 1, 1), datetime.date(2014, 12, 31)], 'valid': [datetime.date(2015, 1, 1), datetime.date(2016, 12, 31)], 'test': [datetime.date(2017, 1, 1), datetime.date(2020, 8, 1)]}}}, 'record': [{'class': 'SignalRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'model': '<MODEL>', 'dataset': '<DATASET>'}}, {'class': 'SigAnaRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'ana_long_short': False, 'ann_scaler': 252}}, {'class': 'PortAnaRecord', 'module_path': 'qlib.workflow.record_temp', 'kwargs': {'config': {'strategy': {'class': 'TopkDropoutStrategy', 'module_path': 'qlib.contrib.strategy', 'kwargs': {'signal': '<PRED>', 'topk': 50, 'n_drop': 5}}, 'backtest': {'start_time': datetime.date(2017, 1, 1), 'end_time': datetime.date(2020, 8, 1), 'account': 100000000, 'benchmark': 'SH000300', 'exchange_kwargs': {'limit_threshold': 0.095, 'deal_price': 'close', 'open_cost': 0.0005, 'close_cost': 0.0015, 'min_cost': 5}}}}}]}
[761674:MainThread](2024-03-14 23:45:44,343) WARNING - qlib.data - [data.py:666] - load calendar error: freq=day, future=True; return current calendar!
[761674:MainThread](2024-03-14 23:45:44,343) WARNING - qlib.data - [data.py:669] - You can get future calendar by referring to the following document: https://github.com/microsoft/qlib/blob/main/scripts/data_collector/contrib/README.md
[761674:MainThread](2024-03-14 23:45:44,366) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[AssertionError: An empty experiment is required for setup `InternalData`].
  File "workflow.py", line 40, in <module>
    fire.Fire(DDGDABench)
  File "/home/longc/anaconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/longc/anaconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/longc/anaconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/longc/anaconda3/envs/qlib/lib/python3.8/site-packages/qlib/contrib/rolling/ddgda.py", line 335, in run
    self._dump_meta_ipt()
  File "/home/longc/anaconda3/envs/qlib/lib/python3.8/site-packages/qlib/contrib/rolling/ddgda.py", line 213, in _dump_meta_ipt
    internal_data.setup(trainer=TrainerR)
  File "/home/longc/anaconda3/envs/qlib/lib/python3.8/site-packages/qlib/contrib/meta/data_selection/dataset.py", line 84, in setup
    assert 0 == len(recorders), "An empty experiment is required for setup `InternalData`"
AssertionError: An empty experiment is required for setup `InternalData`

The run fails with this `AssertionError` from the recorders check, which requires the experiment to be empty. I have tried reading through the code but have not found the cause.

Below are the versions of the relevant packages:

(qlib) [longc@arch DDG-DA]$ python3
Python 3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:21:28) 
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import qlib
>>> import pandas as pd
>>> import numpy as np
>>> import torch

>>> print("Qlib version:", qlib.__version__)
Qlib version: 0.9.3
>>> print("Pandas version:", pd.__version__)
Pandas version: 1.5.3
>>> print("NumPy version:", np.__version__)
NumPy version: 1.23.5
>>> print("PyTorch version:", torch.__version__)
PyTorch version: 1.11.0+cu113

I would really appreciate any help. Thanks!

ZhongHaoAustin commented 3 months ago

Try removing the `mlruns` directory by running `rm -rf mlruns` in examples/benchmarks_dynamic/DDG-DA.

l0ngc commented 3 months ago

> Try removing the `mlruns` directory by running `rm -rf mlruns` in examples/benchmarks_dynamic/DDG-DA.

Problem solved! Thank you very much!
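
For anyone hitting the same assertion: the traceback shows `InternalData.setup` checking `0 == len(recorders)`, so records left in the local experiment-tracking directory by an earlier run are presumably what trips it. Instead of deleting that directory outright, a more cautious option is to move it aside so the old results can still be inspected. The snippet below is only a minimal sketch under assumptions: the helper name `archive_tracking_dir` is hypothetical (it is not part of Qlib), and the candidate directory names are guesses based on this thread and on MLflow's usual default, so adjust them to whatever actually exists in your working directory.

```python
import shutil
import time
from pathlib import Path
from typing import Optional, Sequence


def archive_tracking_dir(
    workdir: Path,
    names: Sequence[str] = ("mlruns", "mlrun"),  # assumed candidate names
) -> Optional[Path]:
    """Rename the first existing tracking directory to a timestamped backup.

    Returns the backup path, or None if no candidate directory exists.
    """
    for name in names:
        src = workdir / name
        if src.is_dir():
            stamp = time.strftime("%Y%m%d-%H%M%S")
            dst = workdir / f"{name}.bak-{stamp}"
            if dst.exists():
                # Refuse to merge into an existing backup.
                raise FileExistsError(f"backup already exists: {dst}")
            shutil.move(str(src), str(dst))
            return dst
    return None


# Example usage, run from examples/benchmarks_dynamic/DDG-DA:
# backup = archive_tracking_dir(Path.cwd())
# print(backup if backup else "no tracking directory found")
```

After the directory has been moved, rerunning `python workflow.py --conf_path=../baseline/workflow_config_lightgbm_Alpha158.yaml run` should start from an empty experiment, which is the same effect the `rm -rf` suggestion relies on.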