PortAnaRecord - calendar not exists for freq day

ChengYen-Tang commented 3 years ago

🐛 Bug Description

This is similar to https://github.com/microsoft/qlib/issues/231. Using 30m data raises the error `calendar not exists for freq day`.

To Reproduce

Steps to reproduce the behavior:

1. Load the data


import time
import numpy as np
import pandas as pd

import qlib
from qlib.config import REG_US
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
from qlib.contrib.evaluate import (
    backtest as normal_backtest,
    risk_analysis
)
from qlib.utils import exists_qlib_data, init_instance_by_config
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
from qlib.utils import flatten_dict
from qlib.data import D

qlib.init(provider_uri='~/.qlib/qlib_data/my_data/')
instruments = D.instruments(market='all')

[18021:MainThread](2021-01-31 23:42:01,889) INFO - qlib.Initialization - [config.py:277] - default_conf: client.
[18021:MainThread](2021-01-31 23:42:01,896) WARNING - qlib.Initialization - [config.py:292] - redis connection failed(host=127.0.0.1 port=6379), cache will not be used!
[18021:MainThread](2021-01-31 23:42:01,900) INFO - qlib.Initialization - [__init__.py:46] - qlib successfully initialized based on client settings.
[18021:MainThread](2021-01-31 23:42:01,901) INFO - qlib.Initialization - [__init__.py:47] - data_path=/home/kenneth/.qlib/qlib_data/my_data

2. Define the training parameters

data_handler_config = {
    'start_time': '2017-07-15',
    'end_time': '2021-01-15',
    'fit_start_time': '2017-07-15',
    'fit_end_time': '2020-06-30',
    'instruments': instruments,
    'freq': '30m'
}

task = {
    'model': {
        'class': 'LGBModel',
        'module_path': 'qlib.contrib.model.gbdt',
        'kwargs': {
            'loss': 'mse',
            'colsample_bytree': 0.8879,
            'learning_rate': 0.0421,
            'subsample': 0.8789,
            'lambda_l1': 205.6999,
            'lambda_l2': 580.9768,
            'max_depth': 8,
            'num_leaves': 210,
            'num_threads': 20
        }
    },
    'dataset': {
        'class': 'DatasetH',
        'module_path': 'qlib.data.dataset',
        'kwargs': {
            'handler': {
                'class': 'Alpha158',
                'module_path': 'qlib.contrib.data.handler',
                'kwargs': data_handler_config
            },
            'segments': {
                'train': ('2017-07-15', '2020-01-01'),
                'valid': ('2020-01-02', '2020-06-30'),
                'test': ('2020-07-07', '2021-01-15'),
            }
        }
    }
}
model = init_instance_by_config(task['model'])
dataset = init_instance_by_config(task['dataset'])

[18021:MainThread](2021-02-01 00:16:14,111) INFO - qlib.timer - [log.py:81] - Time cost: 46.901s | Loading data Done
[18021:MainThread](2021-02-01 00:16:15,333) INFO - qlib.timer - [log.py:81] - Time cost: 1.017s | DropnaLabel Done
[18021:MainThread](2021-02-01 00:21:03,812) INFO - qlib.timer - [log.py:81] - Time cost: 288.477s | CSZScoreNorm Done
[18021:MainThread](2021-02-01 00:21:03,815) INFO - qlib.timer - [log.py:81] - Time cost: 289.700s | fit & process data Done
[18021:MainThread](2021-02-01 00:21:03,816) INFO - qlib.timer - [log.py:81] - Time cost: 336.607s | Init data Done

3. Train the model

t_start = time.time()

with R.start(experiment_name='train_model'):
    R.log_params(**flatten_dict(task))
    model.fit(dataset)
    R.save_objects(trained_model=model)
    rid = R.get_recorder().id

t_end = time.time()
print('train model - Time count: %.3fs'%(t_end - t_start))

[18021:MainThread](2021-02-01 00:24:55,000) INFO - qlib.workflow - [expm.py:245] - No tracking URI is provided. Use the default tracking URI.
[18021:MainThread](2021-02-01 00:24:55,014) INFO - qlib.workflow - [expm.py:168] - No valid experiment found. Create a new experiment with name train_model.
[18021:MainThread](2021-02-01 00:24:55,022) INFO - qlib.workflow - [exp.py:181] - Experiment 1 starts running ...
[18021:MainThread](2021-02-01 00:24:55,213) INFO - qlib.workflow - [recorder.py:233] - Recorder 03e37d24ab8b4c809b619bdfecad8c78 starts running under Experiment 1 ...
Training until validation scores don't improve for 50 rounds
[20] train's l2: 0.891066 valid's l2: 0.94816
[40] train's l2: 0.889417 valid's l2: 0.948044
[60] train's l2: 0.888093 valid's l2: 0.948017
[80] train's l2: 0.886899 valid's l2: 0.948024
[100] train's l2: 0.885763 valid's l2: 0.948017
[120] train's l2: 0.884643 valid's l2: 0.948036
Early stopping, best iteration is:
[87] train's l2: 0.886497 valid's l2: 0.947999
train model - Time count: 34.624s

4. Backtest

port_analysis_config = {
    'strategy': {
        'class': 'TopkDropoutStrategy',
        'module_path': 'qlib.contrib.strategy.strategy',
        'kwargs': {
            'topk': 50,
            'n_drop': 5
        }
    },
    'backtest': {
        'verbose': False,
        'limit_threshold': np.inf,
        'account': 100000000,
        'benchmark': 'btcusdt-futuresusdt',
        'deal_price': 'close',
        'open_cost': 0.1,
        'close_cost': 0.1,
        'min_cost': 1,
    }
}

t_start = time.time()

with R.start(experiment_name='backtest_analysis'):
    recorder = R.get_recorder(rid, experiment_name='train_model')
    model = recorder.load_object('trained_model')

    # Predict
    recorder = R.get_recorder()
    ba_rid = recorder.id
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()

    # Backtest and analysis
    par = PortAnaRecord(recorder, port_analysis_config)
    par.generate()

t_end = time.time()
print('backtest and analysis - Time count: %.3fs'%(t_end - t_start))

Error message:

[18021:MainThread](2021-02-01 01:10:40,088) INFO - qlib.workflow - [expm.py:245] - No tracking URI is provided. Use the default tracking URI.
[18021:MainThread](2021-02-01 01:10:40,097) INFO - qlib.workflow - [exp.py:181] - Experiment 2 starts running ...
[18021:MainThread](2021-02-01 01:10:40,127) INFO - qlib.workflow - [recorder.py:233] - Recorder 346169cbdcca4617abb3efd3e38d82c6 starts running under Experiment 2 ...
[18021:MainThread](2021-02-01 01:10:41,141) INFO - qlib.workflow - [record_temp.py:125] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 2
[18021:MainThread](2021-02-01 01:10:41,240) INFO - qlib.backtest caller - [__init__.py:148] - Create new exchange
'The following are prediction results of the LGBModel model.'
                                   score
datetime   instrument
2020-07-07 ADABTC-SPOT         -0.012265
           ADAUSDT-FUTURESUSDT -0.011834
           ADAUSDT-SPOT        -0.034770
           BCHBTC-SPOT          0.000807
           BCHUSDT-FUTURESUSDT -0.023978

ValueError                                Traceback (most recent call last)
in
     36     # Backtest and analysis
     37     par = PortAnaRecord(recorder, port_analysis_config)
---> 38     par.generate()
     39
     40 t_end = time.time()

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/workflow/record_temp.py in generate(self, **kwargs)
    241         # custom strategy and get backtest
    242         pred_score = super().load()
--> 243         report_dict = normal_backtest(pred_score, strategy=self.strategy, **self.backtest_config)
    244         report_normal = report_dict.get("report_df")
    245         positions_normal = report_dict.get("positions")

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/contrib/backtest/__init__.py in backtest(pred, account, shift, benchmark, verbose, return_order, **kwargs)
    301     spec = inspect.getfullargspec(get_exchange)
    302     ex_args = {k: v for k, v in kwargs.items() if k in spec.args}
--> 303     trade_exchange = get_exchange(pred, **ex_args)
    304
    305     # init executor:

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/contrib/backtest/__init__.py in get_exchange(pred, exchange, subscribe_fields, open_cost, close_cost, min_cost, trade_unit, limit_threshold, deal_price, extract_codes, shift)
    156
    157         dates = sorted(pred.index.get_level_values("datetime").unique())
--> 158         dates = np.append(dates, get_date_range(dates[-1], left_shift=1, right_shift=shift))
    159
    160         exchange = Exchange(

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/utils/__init__.py in get_date_range(trading_date, left_shift, right_shift, future)
    488     from ..data import D
    489
--> 490     start = get_date_by_shift(trading_date, left_shift, future=future)
    491     end = get_date_by_shift(trading_date, right_shift, future=future)
    492

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/utils/__init__.py in get_date_by_shift(trading_date, shift, future, clip_shift)
    508     from qlib.data import D
    509
--> 510     cal = D.calendar(future=future)
    511     if pd.to_datetime(trading_date) not in list(cal):
    512         raise ValueError("{} is not trading day!".format(str(trading_date)))

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in calendar(self, start_time, end_time, freq, future)
    929
    930     def calendar(self, start_time=None, end_time=None, freq="day", future=False):
--> 931         return Cal.calendar(start_time, end_time, freq, future=future)
    932
    933     def instruments(self, market="all", filter_pipe=None, start_time=None, end_time=None):

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in calendar(self, start_time, end_time, freq, future)
    532
    533     def calendar(self, start_time=None, end_time=None, freq="day", future=False):
--> 534         _calendar, _calendar_index = self._get_calendar(freq, future)
    535         if start_time == "None":
    536             start_time = None

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in _get_calendar(self, freq, future)
    118             _calendar, _calendar_index = H["c"][flag]
    119         else:
--> 120             _calendar = np.array(self.load_calendar(freq, future))
    121             _calendar_index = {x: i for i, x in enumerate(_calendar)}  # for fast search
    122             H["c"][flag] = _calendar, _calendar_index

~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in load_calendar(self, freq, future)
    527         fname = self._uri_cal.format(freq)
    528         if not os.path.exists(fname):
--> 529             raise ValueError("calendar not exists for freq " + freq)
    530         with open(fname) as f:
    531             return [pd.Timestamp(x.strip()) for x in f]

ValueError: calendar not exists for freq day

## Environment

**Note**: User could run `cd scripts && python collect_info.py all` under project directory to get system information and paste them here directly.

- Qlib version:
- Python version: 3.8.7
- OS (`Windows`, `Linux`, `MacOS`): Linux
- Commit number (optional, please provide it if you are using the dev version): https://github.com/microsoft/qlib/commit/c0e7cbc9830c1149c7ef0823553f3b15a0936df1
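The final frame of the traceback shows `load_calendar` building a file name from `freq` and raising when that file does not exist. A quick way to see which calendars a data directory actually provides might look like the sketch below. It assumes the calendars live in `<provider_dir>/calendars/<freq>.txt`; that layout and the helper name `available_calendar_freqs` are assumptions for illustration, not taken from this report.

```python
from pathlib import Path


def available_calendar_freqs(provider_dir):
    """List the calendar names found under provider_dir.

    Assumption: calendars are stored as <provider_dir>/calendars/<freq>.txt.
    Every *.txt file in that folder is reported by its stem, so any extra
    text files there would show up in the result as well.
    """
    cal_dir = Path(provider_dir).expanduser() / "calendars"
    if not cal_dir.is_dir():
        # No calendars folder at all: nothing can be loaded for any freq.
        return []
    return sorted(p.stem for p in cal_dir.glob("*.txt"))
```

Calling `available_calendar_freqs('~/.qlib/qlib_data/my_data/')` on a directory that only holds intraday data would be expected to return a list without `day`, which matches the error above.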

you-n-g commented 3 years ago

@ChengYen-Tang This bug is fixed in https://github.com/microsoft/qlib/pull/234

In the future, the frequency parameter will only be used in the data loader.
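The traceback in the report shows why the data handler's `'freq': '30m'` never reaches the backtest: `get_date_by_shift` calls `D.calendar(future=future)` without a `freq` argument, so the default `freq="day"` applies. The toy model below reproduces that mechanism without qlib. All names in it (`CALENDARS`, `shift_without_freq`, `shift_with_freq`) are hypothetical stand-ins, and the `30min` key is only an example.

```python
# Toy stand-in for a data directory that only has an intraday calendar.
CALENDARS = {"30min": ["2020-07-07 00:00", "2020-07-07 00:30"]}


def load_calendar(freq):
    # Same failure mode as the last frame of the traceback.
    if freq not in CALENDARS:
        raise ValueError("calendar not exists for freq " + freq)
    return CALENDARS[freq]


def calendar(freq="day"):
    # The default only matters when the caller does not pass freq.
    return load_calendar(freq)


def shift_without_freq():
    # Mirrors the call in the traceback: freq is not forwarded.
    return calendar()


def shift_with_freq(freq):
    # What a frequency-aware caller would have to do instead.
    return calendar(freq=freq)
```

Here `shift_without_freq()` raises `ValueError: calendar not exists for freq day`, while `shift_with_freq("30min")` returns the intraday calendar.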

ChengYen-Tang commented 3 years ago

@you-n-g So PortAnaRecord does not support the frequency parameter?

you-n-g commented 3 years ago

@ChengYen-Tang No, PortAnaRecord does not support frequency parameters so far. We are focusing on developing this feature.

Thanks
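For readers wondering why the backtest path depends on the calendar at all: in the traceback from the report, `get_date_by_shift` first checks that the date is in the calendar (raising "is not trading day!" otherwise) and only then shifts along it. The sketch below is a simplified, standard-library illustration of that idea, not qlib's implementation. The function name `shift_on_calendar` and the clipping at the calendar's ends are assumptions.

```python
from datetime import datetime


def shift_on_calendar(calendar, stamp, shift):
    """Return the calendar entry `shift` positions away from `stamp`.

    Simplified illustration: the result is clipped to the first or last
    entry instead of running off the end of the calendar.
    """
    if stamp not in calendar:
        # A timestamp at a different granularity than the calendar ends here.
        raise ValueError("{} is not on the calendar".format(stamp))
    idx = calendar.index(stamp) + shift
    idx = max(0, min(idx, len(calendar) - 1))
    return calendar[idx]


daily = [datetime(2020, 7, 7), datetime(2020, 7, 8), datetime(2020, 7, 9)]
next_day = shift_on_calendar(daily, datetime(2020, 7, 7), 1)  # datetime(2020, 7, 8)
```

Under this reading, a 30-minute timestamp such as `datetime(2020, 7, 7, 0, 30)` would be rejected by a daily calendar, so the shift has to run on a calendar of the same frequency as the predictions.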

microsoft / qlib

PortAnaRecord - calendar not exists for freq day #237
