microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License

PortAnaRecord - calendar not exists for freq day #237

Open ChengYen-Tang opened 3 years ago

ChengYen-Tang commented 3 years ago

🐛 Bug Description

The problem is similar to https://github.com/microsoft/qlib/issues/231: when using 30m data, the error "calendar not exists for freq day" is raised.

To Reproduce

Steps to reproduce the behavior:

  1. Load the data
    
    import time
    import numpy as np
    import pandas as pd

    import qlib
    from qlib.config import REG_US
    from qlib.contrib.model.gbdt import LGBModel
    from qlib.contrib.data.handler import Alpha158
    from qlib.contrib.strategy.strategy import TopkDropoutStrategy
    from qlib.contrib.evaluate import (
        backtest as normal_backtest,
        risk_analysis
    )
    from qlib.utils import exists_qlib_data, init_instance_by_config
    from qlib.workflow import R
    from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
    from qlib.utils import flatten_dict
    from qlib.data import D

    qlib.init(provider_uri='~/.qlib/qlib_data/my_data/')
    instruments = D.instruments(market='all')

    [18021:MainThread](2021-01-31 23:42:01,889) INFO - qlib.Initialization - [config.py:277] - default_conf: client.
    [18021:MainThread](2021-01-31 23:42:01,896) WARNING - qlib.Initialization - [config.py:292] - redis connection failed(host=127.0.0.1 port=6379), cache will not be used!
    [18021:MainThread](2021-01-31 23:42:01,900) INFO - qlib.Initialization - [init.py:46] - qlib successfully initialized based on client settings.
    [18021:MainThread](2021-01-31 23:42:01,901) INFO - qlib.Initialization - [init.py:47] - data_path=/home/kenneth/.qlib/qlib_data/my_data

2. Define the training parameters

    data_handler_config = {
        'start_time': '2017-07-15',
        'end_time': '2021-01-15',
        'fit_start_time': '2017-07-15',
        'fit_end_time': '2020-06-30',
        'instruments': instruments,
        'freq': '30m'
    }

    task = {
        'model': {
            'class': 'LGBModel',
            'module_path': 'qlib.contrib.model.gbdt',
            'kwargs': {
                'loss': 'mse',
                'colsample_bytree': 0.8879,
                'learning_rate': 0.0421,
                'subsample': 0.8789,
                'lambda_l1': 205.6999,
                'lambda_l2': 580.9768,
                'max_depth': 8,
                'num_leaves': 210,
                'num_threads': 20
            }
        },
        'dataset': {
            'class': 'DatasetH',
            'module_path': 'qlib.data.dataset',
            'kwargs': {
                'handler': {
                    'class': 'Alpha158',
                    'module_path': 'qlib.contrib.data.handler',
                    'kwargs': data_handler_config
                },
                'segments': {
                    'train': ('2017-07-15', '2020-01-01'),
                    'valid': ('2020-01-02', '2020-06-30'),
                    'test': ('2020-07-07', '2021-01-15'),
                }
            }
        }
    }
    model = init_instance_by_config(task['model'])
    dataset = init_instance_by_config(task['dataset'])

    [18021:MainThread](2021-02-01 00:16:14,111) INFO - qlib.timer - [log.py:81] - Time cost: 46.901s | Loading data Done
    [18021:MainThread](2021-02-01 00:16:15,333) INFO - qlib.timer - [log.py:81] - Time cost: 1.017s | DropnaLabel Done
    [18021:MainThread](2021-02-01 00:21:03,812) INFO - qlib.timer - [log.py:81] - Time cost: 288.477s | CSZScoreNorm Done
    [18021:MainThread](2021-02-01 00:21:03,815) INFO - qlib.timer - [log.py:81] - Time cost: 289.700s | fit & process data Done
    [18021:MainThread](2021-02-01 00:21:03,816) INFO - qlib.timer - [log.py:81] - Time cost: 336.607s | Init data Done

3. Train the model

    t_start = time.time()

    with R.start(experiment_name='train_model'):
        R.log_params(**flatten_dict(task))
        model.fit(dataset)
        R.save_objects(trained_model=model)
        rid = R.get_recorder().id

    t_end = time.time()
    print('train model - Time count: %.3fs'%(t_end - t_start))

    [18021:MainThread](2021-02-01 00:24:55,000) INFO - qlib.workflow - [expm.py:245] - No tracking URI is provided. Use the default tracking URI.
    [18021:MainThread](2021-02-01 00:24:55,014) INFO - qlib.workflow - [expm.py:168] - No valid experiment found. Create a new experiment with name train_model.
    [18021:MainThread](2021-02-01 00:24:55,022) INFO - qlib.workflow - [exp.py:181] - Experiment 1 starts running ...
    [18021:MainThread](2021-02-01 00:24:55,213) INFO - qlib.workflow - [recorder.py:233] - Recorder 03e37d24ab8b4c809b619bdfecad8c78 starts running under Experiment 1 ...
    Training until validation scores don't improve for 50 rounds
    [20] train's l2: 0.891066 valid's l2: 0.94816
    [40] train's l2: 0.889417 valid's l2: 0.948044
    [60] train's l2: 0.888093 valid's l2: 0.948017
    [80] train's l2: 0.886899 valid's l2: 0.948024
    [100] train's l2: 0.885763 valid's l2: 0.948017
    [120] train's l2: 0.884643 valid's l2: 0.948036
    Early stopping, best iteration is:
    [87] train's l2: 0.886497 valid's l2: 0.947999
    train model - Time count: 34.624s

4. Backtest

    port_analysis_config = {
        'strategy': {
            'class': 'TopkDropoutStrategy',
            'module_path': 'qlib.contrib.strategy.strategy',
            'kwargs': {
                'topk': 50,
                'n_drop': 5
            }
        },
        'backtest': {
            'verbose': False,
            'limit_threshold': np.inf,
            'account': 100000000,
            'benchmark': 'btcusdt-futuresusdt',
            'deal_price': 'close',
            'open_cost': 0.1,
            'close_cost': 0.1,
            'min_cost': 1,
        }
    }

    t_start = time.time()

    with R.start(experiment_name='backtest_analysis'):
        recorder = R.get_recorder(rid, experiment_name='train_model')
        model = recorder.load_object('trained_model')

        # Predict
        recorder = R.get_recorder()
        ba_rid = recorder.id
        sr = SignalRecord(model, dataset, recorder)
        sr.generate()

        # Backtest and analysis
        par = PortAnaRecord(recorder, port_analysis_config)
        par.generate()

    t_end = time.time()
    print('backtest and analysis - Time count: %.3fs'%(t_end - t_start))

Error message:

    [18021:MainThread](2021-02-01 01:10:40,088) INFO - qlib.workflow - [expm.py:245] - No tracking URI is provided. Use the default tracking URI.
    [18021:MainThread](2021-02-01 01:10:40,097) INFO - qlib.workflow - [exp.py:181] - Experiment 2 starts running ...
    [18021:MainThread](2021-02-01 01:10:40,127) INFO - qlib.workflow - [recorder.py:233] - Recorder 346169cbdcca4617abb3efd3e38d82c6 starts running under Experiment 2 ...
    [18021:MainThread](2021-02-01 01:10:41,141) INFO - qlib.workflow - [record_temp.py:125] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 2
    [18021:MainThread](2021-02-01 01:10:41,240) INFO - qlib.backtest caller - [init.py:148] - Create new exchange
    'The following are prediction results of the LGBModel model.'
                                       score
    datetime   instrument
    2020-07-07 ADABTC-SPOT         -0.012265
               ADAUSDT-FUTURESUSDT -0.011834
               ADAUSDT-SPOT        -0.034770
               BCHBTC-SPOT          0.000807
               BCHUSDT-FUTURESUSDT -0.023978

    ValueError                                Traceback (most recent call last)
    in
         36     # Backtest and analysis
         37     par = PortAnaRecord(recorder, port_analysis_config)
    ---> 38     par.generate()
         39
         40 t_end = time.time()

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/workflow/record_temp.py in generate(self, **kwargs)
        241         # custom strategy and get backtest
        242         pred_score = super().load()
    --> 243         report_dict = normal_backtest(pred_score, strategy=self.strategy, **self.backtest_config)
        244         report_normal = report_dict.get("report_df")
        245         positions_normal = report_dict.get("positions")

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/contrib/backtest/__init__.py in backtest(pred, account, shift, benchmark, verbose, return_order, **kwargs)
        301     spec = inspect.getfullargspec(get_exchange)
        302     ex_args = {k: v for k, v in kwargs.items() if k in spec.args}
    --> 303     trade_exchange = get_exchange(pred, **ex_args)
        304
        305     # init executor:

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/contrib/backtest/__init__.py in get_exchange(pred, exchange, subscribe_fields, open_cost, close_cost, min_cost, trade_unit, limit_threshold, deal_price, extract_codes, shift)
        156
        157         dates = sorted(pred.index.get_level_values("datetime").unique())
    --> 158         dates = np.append(dates, get_date_range(dates[-1], left_shift=1, right_shift=shift))
        159
        160         exchange = Exchange(

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/utils/__init__.py in get_date_range(trading_date, left_shift, right_shift, future)
        488     from ..data import D
        489
    --> 490     start = get_date_by_shift(trading_date, left_shift, future=future)
        491     end = get_date_by_shift(trading_date, right_shift, future=future)
        492

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/utils/__init__.py in get_date_by_shift(trading_date, shift, future, clip_shift)
        508     from qlib.data import D
        509
    --> 510     cal = D.calendar(future=future)
        511     if pd.to_datetime(trading_date) not in list(cal):
        512         raise ValueError("{} is not trading day!".format(str(trading_date)))

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in calendar(self, start_time, end_time, freq, future)
        929
        930     def calendar(self, start_time=None, end_time=None, freq="day", future=False):
    --> 931         return Cal.calendar(start_time, end_time, freq, future=future)
        932
        933     def instruments(self, market="all", filter_pipe=None, start_time=None, end_time=None):

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in calendar(self, start_time, end_time, freq, future)
        532
        533     def calendar(self, start_time=None, end_time=None, freq="day", future=False):
    --> 534         _calendar, _calendar_index = self._get_calendar(freq, future)
        535         if start_time == "None":
        536             start_time = None

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in _get_calendar(self, freq, future)
        118             _calendar, _calendar_index = H["c"][flag]
        119         else:
    --> 120             _calendar = np.array(self.load_calendar(freq, future))
        121             _calendar_index = {x: i for i, x in enumerate(_calendar)}  # for fast search
        122             H["c"][flag] = _calendar, _calendar_index

    ~/.local/lib/python3.8/site-packages/pyqlib-0.6.1.99-py3.8-linux-x86_64.egg/qlib/data/data.py in load_calendar(self, freq, future)
        527         fname = self._uri_cal.format(freq)
        528         if not os.path.exists(fname):
    --> 529             raise ValueError("calendar not exists for freq " + freq)
        530         with open(fname) as f:
        531             return [pd.Timestamp(x.strip()) for x in f]

    ValueError: calendar not exists for freq day

Environment

Note: Users can run `cd scripts && python collect_info.py all` under the project directory to get system information and paste it here directly.

- Qlib version:
- Python version: 3.8.7
- OS (`Windows`, `Linux`, `MacOS`): Linux
- Commit number (optional, please provide it if you are using the dev version): https://github.com/microsoft/qlib/commit/c0e7cbc9830c1149c7ef0823553f3b15a0936df1

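Reading the traceback from the bottom up suggests why the handler's 'freq': '30m' setting has no effect on the backtest: get_date_by_shift calls D.calendar(future=future) without passing a freq, so the default freq="day" reaches load_calendar, which raises because no day calendar exists. The toy below imitates that call chain in plain Python. The function bodies and the `calendars` dict are simplified, hypothetical stand-ins for illustration, not Qlib's actual code.

```python
# Toy stand-ins for the call chain in the traceback; not Qlib's real code.


def load_calendar(calendars, freq):
    # Mirrors the failing check: no calendar is stored for this frequency.
    if freq not in calendars:
        raise ValueError("calendar not exists for freq " + freq)
    return calendars[freq]


def calendar(calendars, freq="day"):
    # Same default value as the calendar() signature shown in the traceback.
    return load_calendar(calendars, freq)


def get_date_by_shift(calendars):
    # The caller forwards no frequency, so "day" is always requested.
    return calendar(calendars)


# Hypothetical data set that only ships a 30-minute calendar.
calendars = {"30m": ["2020-07-07 00:00:00", "2020-07-07 00:30:00"]}

try:
    get_date_by_shift(calendars)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Requesting the frequency explicitly, as in `calendar(calendars, freq="30m")`, succeeds in this toy, which is consistent with the error coming from the missing freq argument rather than from the data itself.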
you-n-g commented 3 years ago

@ChengYen-Tang This bug is fixed in https://github.com/microsoft/qlib/pull/234

The frequency parameter will only be used in the dataloader in the future.

ChengYen-Tang commented 3 years ago

@you-n-g So PortAnaRecord does not support the frequency parameter?

you-n-g commented 3 years ago

@ChengYen-Tang No, PortAnaRecord does not support frequency parameters so far. We are focusing on developing this feature.

Thanks
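
For anyone hitting the same error, a quick way to see which frequencies a data directory can serve is to list its calendar files. This is a minimal sketch using only the standard library. It assumes the layout `<provider_uri>/calendars/<freq>.txt`, which is inferred from the `self._uri_cal.format(freq)` line in the traceback and is not confirmed in this thread; `available_calendar_freqs` and the `30m.txt` file name are hypothetical.

```python
import os
import tempfile


def available_calendar_freqs(provider_uri):
    """Return the frequencies that have a calendar file under provider_uri.

    Assumes <provider_uri>/calendars/<freq>.txt (an assumed layout).
    """
    cal_dir = os.path.join(os.path.expanduser(provider_uri), "calendars")
    if not os.path.isdir(cal_dir):
        return []
    return sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(cal_dir)
        if name.endswith(".txt")
    )


# Demo on a throwaway directory that only contains a 30-minute calendar.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "calendars"))
    with open(os.path.join(root, "calendars", "30m.txt"), "w") as f:
        f.write("2020-07-07 00:00:00\n")
    freqs = available_calendar_freqs(root)

print(freqs)
print("day" in freqs)
```

On a real setup, the call would be something like `available_calendar_freqs('~/.qlib/qlib_data/my_data/')`. If "day" is missing from the result, any code path that falls back to the default daily frequency would be expected to fail in the way shown above.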