microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License
15.53k stars, 2.64k forks

How should the benchmark be set when using US data? #720

Closed: boundles closed this issue 2 years ago

boundles commented 2 years ago

As in the title. Thanks.

zhupr commented 2 years ago

@boundles Hi, SP500 or NASDAQ100. You can also get more: https://github.com/microsoft/qlib/tree/main/scripts/data_collector/us_index

boundles commented 2 years ago

```
File "/Users/didi/opt/miniconda3/lib/python3.8/site-packages/pyqlib-0.7.2.99-py3.8-macosx-10.9-x86_64.egg/qlib/backtest/report.py", line 73, in __init__
    self.init_bench(freq=freq, benchmark_config=benchmark_config)
File "/Users/didi/opt/miniconda3/lib/python3.8/site-packages/pyqlib-0.7.2.99-py3.8-macosx-10.9-x86_64.egg/qlib/backtest/report.py", line 91, in init_bench
    self.bench = self._cal_benchmark(self.benchmark_config, self.freq)
File "/Users/didi/opt/miniconda3/lib/python3.8/site-packages/pyqlib-0.7.2.99-py3.8-macosx-10.9-x86_64.egg/qlib/backtest/report.py", line 112, in _cal_benchmark
    raise ValueError(f"The benchmark {_codes} does not exist. Please provide the right benchmark")
ValueError: The benchmark ['NASDAQ-100'] does not exist. Please provide the right benchmark
```

After changing it to NASDAQ-100, I got the error above. The backtest config is as follows:

```python
"backtest": {
    "start_time": "2017-01-01",
    "end_time": "2020-08-01",
    "account": 100000000,
    "benchmark": "NASDAQ-100",
    "exchange_kwargs": {
        "freq": "day",
        "limit_threshold": 0.095,
        "deal_price": "close",
        "open_cost": 0.0005,
        "close_cost": 0.0015,
        "min_cost": 5,
    },
},
```

zhupr commented 2 years ago

Sorry, the benchmark should be written as ^gspc, ^ndx, or ^dji. The correspondence between benchmark and market is:

  • market=SP500, benchmark=^gspc
  • market=NASDAQ100, benchmark=^ndx

boundles commented 2 years ago

Thanks, it runs now. One more question: how are open_cost, close_cost, and min_cost usually set?

boundles commented 2 years ago

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

import qlib
from qlib.data import D
from qlib.config import REG_CN, REG_US
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
from qlib.utils import init_instance_by_config
from qlib.contrib.data.handler import DataHandlerLP
from qlib.contrib.model import DEnsembleModel
from qlib.tests.data import GetData
from qlib.tests.config import CSI300_BENCH, CSI300_GBDT_TASK

NASDAQ100_MARKET = "nasdaq100"


def get_data_handler_config(
    start_time="2008-01-01",
    end_time="2020-08-01",
    fit_start_time="2008-01-01",
    fit_end_time="2014-12-31",
    instruments=NASDAQ100_MARKET,
):
    return {
        "start_time": start_time,
        "end_time": end_time,
        "fit_start_time": fit_start_time,
        "fit_end_time": fit_end_time,
        "instruments": instruments,
    }


def get_dataset_config(
    dataset_class="Alpha158",
    train=("2008-01-01", "2014-12-31"),
    valid=("2015-01-01", "2016-12-31"),
    test=("2017-01-01", "2020-08-01"),
    handler_kwargs={"instruments": NASDAQ100_MARKET},
):
    return {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": dataset_class,
                "module_path": "qlib.contrib.data.handler",
                "kwargs": get_data_handler_config(**handler_kwargs),
            },
            "segments": {
                "train": train,
                "valid": valid,
                "test": test,
            },
        },
    }


def get_gbdt_dataset(dataset_kwargs={}, handler_kwargs={"instruments": NASDAQ100_MARKET}):
    return get_dataset_config(**dataset_kwargs, handler_kwargs=handler_kwargs)


if __name__ == "__main__":
    # use us data
    provider_uri = "~/.qlib/qlib_data/us_data"  # target_dir
    # GetData().qlib_data(target_dir=provider_uri, region=REG_US, exists_skip=True)
    qlib.init(provider_uri=provider_uri, region=REG_US)

    # define dataset
    dataset = init_instance_by_config(get_gbdt_dataset())
    df_train, df_valid = dataset.prepare(
        ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
    )
    x_train, y_train = df_train["feature"], df_train["label"]

    # define model
    model = DEnsembleModel(
        base_model="gbm",
        loss="mse",
        num_models=6,
        enable_sr=True,
        enable_fs=True,
        alpha1=1.0,
        alpha2=1.0,
        bins_sr=10,
        bins_fs=5,
        decay=0.5,
        sample_ratios=[0.8, 0.7, 0.6, 0.5, 0.4],
        sub_weights=[1, 0.2, 0.2, 0.2, 0.2, 0.2],
        epochs=28,
        colsample_bytree=0.8879,
        learning_rate=0.2,
        subsample=0.8789,
        lambda_l1=205.6999,
        lambda_l2=580.9768,
        max_depth=8,
        num_leaves=210,
        num_threads=20,
    )

    port_analysis_config = {
        "executor": {
            "class": "SimulatorExecutor",
            "module_path": "qlib.backtest.executor",
            "kwargs": {
                "time_per_step": "day",
                "generate_portfolio_metrics": True,
            },
        },
        "strategy": {
            "class": "TopkDropoutStrategy",
            "module_path": "qlib.contrib.strategy.signal_strategy",
            "kwargs": {
                "signal": (model, dataset),
                "topk": 50,
                "n_drop": 5,
            },
        },
        "backtest": {
            "start_time": "2017-01-01",
            "end_time": "2020-08-01",
            "account": 100000000,
            "benchmark": "^ndx",
            "exchange_kwargs": {
                "freq": "day",
                "limit_threshold": 0.095,
                "deal_price": "close",
                "open_cost": 0.0005,
                "close_cost": 0.0015,
                "min_cost": 5,
            },
        },
    }

    # start exp
    with R.start(experiment_name="workflow"):
        model.fit(dataset)
        R.save_objects(**{"params.pkl": model})

        # prediction
        recorder = R.get_recorder()
        sr = SignalRecord(model, dataset, recorder)
        sr.generate()

        # backtest. If users want to use backtest based on their own prediction,
        # please refer to https://qlib.readthedocs.io/en/latest/component/recorder.html#record-template.
        par = PortAnaRecord(recorder, port_analysis_config, "day")
        par.generate()
```
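The exchange_kwargs in the script set three cost parameters. The sketch below shows how they could translate into a fee per filled order, assuming (as I understand qlib's `Exchange`) that each order is charged `max(trade_value * rate, min_cost)`, with `open_cost` as the rate for buys and `close_cost` for sells. Impact cost is ignored, and `order_fee` is a hypothetical helper, not a qlib API. The values copied from the A-share example (0.05% to buy, 0.15% to sell, at least 5 per order) likely reflect Chinese commission plus sell-side stamp duty, so they probably need to be re-set for US equities.

```python
def order_fee(trade_value, direction, open_cost=0.0005, close_cost=0.0015, min_cost=5.0):
    """Hypothetical helper: fee charged for one filled order (impact cost ignored)."""
    if direction not in ("buy", "sell"):
        raise ValueError(f"unknown direction: {direction!r}")
    if trade_value <= 0:
        return 0.0  # nothing traded, nothing charged
    # Buys use open_cost, sells use close_cost
    rate = open_cost if direction == "buy" else close_cost
    # Proportional fee, floored at the minimum fee per order
    return max(trade_value * rate, min_cost)


if __name__ == "__main__":
    print(order_fee(1_000_000, "buy"))   # proportional fee dominates
    print(order_fee(2_000, "buy"))       # minimum fee dominates
    print(order_fee(1_000_000, "sell"))  # higher sell-side rate
```

With the default 100,000,000 account and topk=50, each position is about 2,000,000, so the proportional part of the fee dominates and min_cost rarely matters.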

Result:

```
'The following are analysis results of benchmark return(1day).'
                       risk
mean               0.000987
std                0.014722
annualized_return  0.234893
information_ratio  1.034224
max_drawdown      -0.297210
'The following are analysis results of the excess return without cost(1day).'
                       risk
mean              -0.000064
std                0.004405
annualized_return -0.015222
information_ratio -0.224014
max_drawdown      -0.121643
'The following are analysis results of the excess return with cost(1day).'
                       risk
mean              -0.000262
std                0.004405
annualized_return -0.062459
information_ratio -0.919101
max_drawdown      -0.263510
'The following are analysis results of indicators(1day).'
     value
ffr    1.0
pa     0.0
pos    0.0
[89836:MainThread](2021-11-30 19:38:46,403) INFO - qlib.timer - [log.py:113] - Time cost: 0.014s | waiting async_log Done
```
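The three result blocks are related. As far as I can tell from qlib's `PortAnaRecord`, the excess return without cost is the strategy's daily return minus the benchmark's daily return, and the excess return with cost additionally subtracts the daily transaction cost (as a fraction of account value). Because the mean is linear, the gap between the two posted means (-0.000064 vs -0.000262) implies an average cost drag of roughly 0.000198 per day. A minimal pure-Python sketch of that decomposition, with `excess_returns` as a hypothetical helper and toy data:

```python
from statistics import mean


def excess_returns(strategy_ret, bench_ret, cost):
    """Hypothetical helper: element-wise excess returns without and with cost."""
    without_cost = [r - b for r, b in zip(strategy_ret, bench_ret)]
    with_cost = [r - b - c for r, b, c in zip(strategy_ret, bench_ret, cost)]
    return without_cost, with_cost


if __name__ == "__main__":
    # Toy daily series, not taken from the thread
    r = [0.010, -0.005, 0.002]
    b = [0.008, -0.004, 0.001]
    c = [0.0002, 0.0002, 0.0002]
    wo, w = excess_returns(r, b, c)
    # mean(with_cost) == mean(without_cost) - mean(cost)
    print(mean(wo), mean(w), mean(wo) - mean(w))
```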

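The performance doesn't look good; it differs quite a lot from the benchmark. Is something wrong with my code? Or do you have an example on US data? Thanks.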

zhupr commented 2 years ago

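@boundles Hi, the configurations in the benchmark examples are intended for China A-shares. For example, A-shares have daily price limits (limit_threshold=0.095), whereas US stocks have no daily price limits (limit_threshold should be set to None). The model parameters are also tuned for A-shares; if you use them on US stocks, all of them need to be re-tuned. Each parameter is explained here: https://qlib.readthedocs.io/en/latest/component/workflow.html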
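Following this advice, the backtest section for US data might look like the sketch below. The market-to-benchmark mapping comes from the earlier reply in this thread (^gspc for SP500, ^ndx for NASDAQ100). The instrument name `sp500`, the helper `make_us_backtest_config`, and the cost values (copied unchanged from the A-share example as placeholders) are assumptions to adjust.

```python
# Mapping taken from the maintainer's reply; the "sp500" key is an assumed instrument name
MARKET_TO_BENCHMARK = {"sp500": "^gspc", "nasdaq100": "^ndx"}


def make_us_backtest_config(market, start_time, end_time, account=100_000_000):
    """Hypothetical helper building the "backtest" section for US data."""
    return {
        "start_time": start_time,
        "end_time": end_time,
        "account": account,
        "benchmark": MARKET_TO_BENCHMARK[market],  # KeyError for unknown markets
        "exchange_kwargs": {
            "freq": "day",
            "limit_threshold": None,  # US stocks have no daily price limit
            "deal_price": "close",
            "open_cost": 0.0005,  # placeholder copied from the A-share example
            "close_cost": 0.0015,  # placeholder copied from the A-share example
            "min_cost": 5,  # placeholder copied from the A-share example
        },
    }


if __name__ == "__main__":
    print(make_us_backtest_config("nasdaq100", "2017-01-01", "2020-08-01"))
```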

boundles commented 2 years ago

OK, thanks.

boundles commented 2 years ago

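Hi, do you have a complete example on US stock data? I tried several model parameter settings and none performed well. I'm not sure whether I configured something wrong or whether the model simply performs this way.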

zhupr commented 2 years ago

Sorry, we don't have a US stock example yet.

boundles commented 2 years ago

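I adjusted the training data and the results look somewhat more normal now. One more question: what is the calculation logic behind the benchmark results below?

```
'The following are analysis results of benchmark return(1day).'
                       risk
mean               0.002877
std                0.014513
annualized_return  0.684689
information_ratio  3.058170
max_drawdown      -0.050106
```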

zhupr commented 2 years ago

The benchmark calculation logic is here: https://github.com/microsoft/qlib/blob/main/qlib/contrib/evaluate.py#L24
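The linked function is `risk_analysis`, and the benchmark itself is not a strategy. As far as I can tell from `qlib/backtest/report.py`, the benchmark series is the daily close-to-close return of the benchmark instrument (`$close/Ref($close,1)-1`), averaged across codes if several are given, and that series is then summarized by `risk_analysis`. The sketch below re-implements the default `mode="sum"` branch in plain Python as I understand it; the helper names are hypothetical and the fill value for the first day is an assumption. Note the annualization factor: the posted numbers are consistent with N = 238 trading days (0.002877 × 238 ≈ 0.6847 and 0.002877 / 0.014513 × √238 ≈ 3.058), a value matching the A-share calendar rather than the roughly 252 US trading days.

```python
import math
from statistics import mean, stdev


def benchmark_daily_returns(closes):
    """Close-to-close returns; the first day has no previous close (assumed filled with 0)."""
    rets = [0.0]
    for prev, cur in zip(closes, closes[1:]):
        rets.append(cur / prev - 1)
    return rets


def risk_analysis_sum(returns, n=238):
    """Plain-Python sketch of the 'sum' mode summary statistics."""
    m = mean(returns)
    s = stdev(returns)  # sample standard deviation (ddof=1), like pandas' default
    cum, peak, max_dd = 0.0, -math.inf, 0.0
    for x in returns:
        cum += x  # cumulative return is summed, not compounded
        peak = max(peak, cum)
        max_dd = min(max_dd, cum - peak)
    return {
        "mean": m,
        "std": s,
        "annualized_return": m * n,
        "information_ratio": m / s * math.sqrt(n),
        "max_drawdown": max_dd,
    }


if __name__ == "__main__":
    closes = [100.0, 101.0, 99.0, 102.0]  # toy prices, not from the thread
    print(risk_analysis_sum(benchmark_daily_returns(closes)))
```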

boundles commented 2 years ago

I don't see the logic of the benchmark strategy there. What is it?

zhupr commented 2 years ago

Do you mean this strategy? https://github.com/microsoft/qlib/blob/main/qlib/contrib/strategy/signal_strategy.py#L20
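The `TopkDropoutStrategy` used in the script (topk=50, n_drop=5) rebalances each day from the model's scores. The sketch below is a simplified, pure-Python reading of one rebalancing step as I understand the default behavior: rank the current holdings together with the best-scored candidates not yet held, sell held stocks that fall in the bottom `n_drop` of that combined ranking, and buy enough top candidates to return to `topk` names. Tradability checks, holding-period thresholds, and order sizing are ignored, and `topk_dropout_step` is a hypothetical name, not qlib's API.

```python
def topk_dropout_step(scores, holdings, topk, n_drop):
    """Hypothetical, simplified TopkDropout rebalancing step.

    scores: dict mapping stock code -> model score (higher is better)
    holdings: list of stock codes currently held (all assumed to have scores)
    Returns (sell, buy, new_holdings).
    """
    def by_score(codes):
        return sorted(codes, key=lambda c: scores[c], reverse=True)

    held = set(holdings)
    last = by_score(holdings)
    # Best-scored candidates that are not currently held
    n_candidates = max(n_drop + topk - len(last), 0)
    today = by_score([c for c in scores if c not in held])[:n_candidates]
    # Rank holdings and candidates together; the bottom n_drop are drop targets
    comb = by_score(last + today)
    bottom = set(comb[-n_drop:]) if n_drop > 0 else set()
    sell = [c for c in last if c in bottom]
    # Buy enough top candidates to get back to topk names
    n_buy = max(len(sell) + topk - len(last), 0)
    buy = today[:n_buy]
    new_holdings = [c for c in last if c not in bottom] + buy
    return sell, buy, new_holdings


if __name__ == "__main__":
    scores = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 6}
    print(topk_dropout_step(scores, ["A", "C", "E"], topk=3, n_drop=1))
```

In this toy case the weakest holding E is swapped for the best unheld stock F, so at most n_drop names turn over per step, which keeps turnover (and therefore cost) bounded.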

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for three months with no activity. Remove the stale label or comment on the issue, otherwise this will be closed in 5 days.

tAnGTaNgT commented 1 year ago

NASDAQ-

May I ask why it is written this way?

quant2008 commented 1 year ago

How did you tune it? My US stock performance is also very poor.

l0ngc commented 7 months ago

Hey, my US stock performance is also poor. Did you manage to tune it? Would you mind sharing how?