microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License

PortAnaRecord backtest fails for high-frequency data #1122

Open 2young-2simple-sometimes-naive opened 2 years ago

2young-2simple-sometimes-naive commented 2 years ago

🐛 Bug Description

I am backtesting data at a 30-minute interval. The PortAnaRecord module raises the error below.

To Reproduce

Steps to reproduce the behavior:

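from qlib.workflow import R
from qlib.workflow.record_temp import PortAnaRecord, SigAnaRecord, SignalRecord

# model, dataset, and port_analysis_config are defined earlier in the script
with R.start(experiment_name="train_model"):
    recorder = R.get_recorder()
    rid = recorder.id
    print("RID: " + rid)
    model.fit(dataset)
    R.save_objects(trained_model=model)
    # prediction
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()
    # signal analysis
    sig = SigAnaRecord(recorder, ana_long_short=True, ann_scaler=252, skip_existing=False)
    sig.generate()
    # backtest (par.generate() is the call that raises the ValueError)
    par = PortAnaRecord(recorder, port_analysis_config, risk_analysis_freq="day", indicator_analysis_freq="day")
    par.generate()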

Expected Behavior

The backtest runs to completion without raising an error.

Screenshot

[33234:MainThread](2022-06-10 12:09:26,179) INFO - qlib.timer - [log.py:113] - Time cost: 22.111s | fit & process data Done
[33234:MainThread](2022-06-10 12:09:26,180) INFO - qlib.timer - [log.py:113] - Time cost: 104.391s | Init data Done
[33234:MainThread](2022-06-10 12:09:37,809) INFO - qlib.workflow - [expm.py:315] - <mlflow.tracking.client.MlflowClient object at 0x1554dcd826d0>
[33234:MainThread](2022-06-10 12:09:37,875) INFO - qlib.workflow - [exp.py:257] - Experiment 1 starts running ...
[33234:MainThread](2022-06-10 12:09:38,267) INFO - qlib.workflow - [recorder.py:293] - Recorder f9b0c9f7bdbf49c8b65eb37a57ba1ab0 starts running under Experiment 1 ...
/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/contrib/model/highfreq_gdbt_model.py:93: FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.median(level=1) should use df.groupby(level=1).median().
  df_train["label"][l_name] = df_train["label"][l_name] - df_train["label"][l_name].mean(level=0)
/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/contrib/model/highfreq_gdbt_model.py:93: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_train["label"][l_name] = df_train["label"][l_name] - df_train["label"][l_name].mean(level=0)
/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/contrib/model/highfreq_gdbt_model.py:94: FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.median(level=1) should use df.groupby(level=1).median().
  df_valid["label"][l_name] = df_valid["label"][l_name] - df_valid["label"][l_name].mean(level=0)
/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/contrib/model/highfreq_gdbt_model.py:94: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_valid["label"][l_name] = df_valid["label"][l_name] - df_valid["label"][l_name].mean(level=0)
[33234:MainThread](2022-06-10 12:41:35,649) INFO - qlib.workflow - [record_temp.py:194] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 1
[33234:MainThread](2022-06-10 12:41:55,350) INFO - qlib.timer - [log.py:113] - Time cost: 0.000s | waiting `async_log` Done
[33234:MainThread](2022-06-10 12:41:55,351) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: can't find a freq from [Freq(30min)] that can resample to 1min!].
  File "./wfhf.py", line 218, in <module>
    par.generate()
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/workflow/record_temp.py", line 232, in generate
    return self._generate(*args, **kwargs)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/workflow/record_temp.py", line 468, in _generate
    portfolio_metric_dict, indicator_dict = normal_backtest(
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/__init__.py", line 245, in backtest
    trade_strategy, trade_executor = get_strategy_executor(
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/__init__.py", line 178, in get_strategy_executor
    trade_account = create_account_instance(
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/__init__.py", line 158, in create_account_instance
    return Account(**kwargs)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/account.py", line 103, in __init__
    self.init_vars(init_cash, position_dict, freq, benchmark_config)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/account.py", line 124, in init_vars
    self.reset(freq=freq, benchmark_config=benchmark_config)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/account.py", line 166, in reset
    self.reset_report(self.freq, self.benchmark_config)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/account.py", line 137, in reset_report
    self.portfolio_metrics = PortfolioMetrics(freq, benchmark_config)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/report.py", line 70, in __init__
    self.init_bench(freq=freq, benchmark_config=benchmark_config)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/report.py", line 88, in init_bench
    self.bench = self._cal_benchmark(self.benchmark_config, self.freq)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/backtest/report.py", line 107, in _cal_benchmark
    _temp_result, _ = get_higher_eq_freq_feature(_codes, fields, start_time, end_time, freq=freq)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/utils/resam.py", line 92, in get_higher_eq_freq_feature
    _result = D.features(instruments, fields, start_time, end_time, freq="1min", disk_cache=disk_cache)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 1189, in features
    return DatasetD.dataset(instruments, fields, start_time, end_time, freq, inst_processors=inst_processors)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 915, in dataset
    cal = Cal.calendar(start_time, end_time, freq)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 91, in calendar
    _calendar, _calendar_index = self._get_calendar(freq, future)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 172, in _get_calendar
    _calendar = np.array(self.load_calendar(freq, future))
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/data.py", line 662, in load_calendar
    backend_obj = self.backend_obj(freq=freq, future=future).data
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/storage/file_storage.py", line 124, in data
    self.check()
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/storage/file_storage.py", line 72, in check
    if not self.uri.exists():
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/storage/file_storage.py", line 120, in uri
    return self.dpm.get_data_uri(self._freq_file).joinpath(f"{self.storage_name}s", self.file_name)
  File "/dataxxxxxxxxxx/.venv/lib/python3.8/site-packages/pyqlib-0.8.5.99-py3.8-linux-x86_64.egg/qlib/data/storage/file_storage.py", line 101, in _freq_file
    raise ValueError(f"can't find a freq from {self.support_freq} that can resample to {self.freq}!")
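
Reading the traceback, the benchmark calculation in report.py goes through get_higher_eq_freq_feature, which ends up requesting freq="1min", while the data provider only reports Freq(30min) as available. Coarser bars cannot be split back into finer ones, so the lookup fails. The sketch below illustrates that rule in isolation. It is not Qlib's implementation: parse_freq and find_source_freq are hypothetical helpers, and treating "day" as 1440 minutes is a simplifying assumption.

```python
import re
from typing import List

# Minutes per unit. Treating a day as 1440 minutes is an assumption made
# purely for this illustration.
_UNIT_MINUTES = {"min": 1, "day": 1440}


def parse_freq(freq: str) -> int:
    """Convert a string such as '30min', '1min' or 'day' into minutes."""
    match = re.fullmatch(r"(\d*)(min|day)", freq)
    if match is None:
        raise ValueError(f"unsupported freq: {freq!r}")
    count = int(match.group(1) or 1)
    if count <= 0:
        raise ValueError(f"freq count must be positive: {freq!r}")
    return count * _UNIT_MINUTES[match.group(2)]


def find_source_freq(requested: str, supported: List[str]) -> str:
    """Pick a stored frequency that can be resampled to the requested one.

    A stored frequency qualifies only if it is at least as fine as the
    requested frequency and divides it evenly. Among the qualifying
    frequencies, the coarsest one is returned.
    """
    target = parse_freq(requested)
    candidates = [
        f for f in supported
        if parse_freq(f) <= target and target % parse_freq(f) == 0
    ]
    if not candidates:
        raise ValueError(
            f"can't find a freq from {supported} that can resample to {requested}!"
        )
    return max(candidates, key=parse_freq)
```

With these helpers, find_source_freq("30min", ["1min"]) returns "1min", whereas find_source_freq("1min", ["30min"]) raises a ValueError analogous to the one in the log above.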

Environment

Linux
x86_64
Linux-4.18.0-147.el8.x86_64-x86_64-with-glibc2.2.5
#1 SMP Wed Dec 4 21:51:45 UTC 2019

Python version: 3.8.6 (default, Oct 22 2020, 17:03:03)  [GCC 9.3.0]

Qlib version: 0.8.5.99
numpy==1.22.3
pandas==1.3.5
scipy==1.8.1
requests==2.27.1
sacred==0.8.2
python-socketio==5.6.0
redis==4.3.1
python-redis-lock==3.7.0
schedule==1.1.0
cvxpy==1.2.1
hyperopt==0.1.2
fire==0.4.0
statsmodels==0.13.2
xlrd==2.0.1
plotly==5.8.0
matplotlib==3.5.2
tables==3.7.0
pyyaml==6.0
mlflow==1.26.0
tqdm==4.64.0
loguru==0.6.0
lightgbm==3.3.2
tornado==6.1
joblib==1.1.0
ruamel.yaml==0.17.21

Additional Notes

albertcity commented 1 year ago

@you-n-g This bug is still present.