Closed — quant2008 closed this 11 months ago
I see that they are in ops.py. But when I use alpha101, I get errors:
(qlib230510) G:\qlibtutor>E:/anaconda3/envs/qlib230510/python.exe g:/qlibtutor/advance/benchmarks_dynamic/baseline/my_rolling_benchmark.py
[23644:MainThread](2023-09-04 19:00:33,996) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[23644:MainThread](2023-09-04 19:00:33,998) INFO - qlib.Initialization - [init.py:74] - qlib successfully initialized based on client settings.
[23644:MainThread](2023-09-04 19:00:33,999) INFO - qlib.Initialization - [init.py:76] - data_path={'DEFAULT_FREQ': WindowsPath('G:/qlibtutor/qlib_data/rq_cn_data')}
my_conf_path G:\qlibtutor\advance\benchmarks_dynamic\baseline\my_workflow_config_linear_Alpha158.yaml
[23644:MainThread](2023-09-04 19:00:34,009) INFO - qlib.Rolling - [base.py:164] - The prediction horizon is overrided
[23644:MainThread](2023-09-04 19:00:37,473) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[KeyError: 'Unknown memcache unit'].
File "g:/qlibtutor/advance/benchmarks_dynamic/baseline/my_rolling_benchmark.py", line 47, in
I haven't used those cross-sectional (cs) operators for a while. Below are some hints in case you wish to debug:
MemCache.__getitem__ in qlib/data/cache.py:
def __getitem__(self, key):
    if key == "c":
        return self.__calendar_mem_cache
    elif key == "i":
        return self.__instrument_mem_cache
    elif key == "f":
        return self.__feature_mem_cache
    elif key == "fs":
        return self._feature_share_mem_cache
    else:
        raise KeyError(f"Unknown memcache unit {key}")
I'd suggest you put a breakpoint there and see what key causes this error. Typically, if you didn't mess up the code, it shouldn't ask for a non-existing cache type. Even if the shared memory cache fs is not properly initialised (in qlib/data/data.py), it should produce some "'NoneType' is not subscriptable" error rather than a KeyError. So I'm curious what key you'll see at the breakpoint.
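The two failure modes can be told apart with a tiny stand-in (MiniMemCache below is a hypothetical sketch, not qlib's class): a cache unit that was never registered raises the KeyError above, while a registered-but-uninitialised fs unit surfaces later as a NoneType error.

```python
# Hypothetical stand-in for MemCache, only to contrast the two errors
# discussed above; not qlib's actual implementation.
class MiniMemCache:
    def __init__(self, units):
        self._units = units  # e.g. {"c": ..., "i": ..., "f": ...}

    def __getitem__(self, key):
        try:
            return self._units[key]
        except KeyError:
            raise KeyError(f"Unknown memcache unit {key}") from None

# Case 1: the "fs" unit does not exist at all -> KeyError.
H = MiniMemCache({"c": {}, "i": {}, "f": {}})
try:
    H["fs"]
except KeyError as e:
    print(e.args[0])  # Unknown memcache unit fs

# Case 2: "fs" exists but was never initialised -> TypeError on lookup.
H2 = MiniMemCache({"c": {}, "i": {}, "f": {}, "fs": None})
try:
    "some_key" in H2["fs"]
except TypeError as e:
    print(e)  # argument of type 'NoneType' is not iterable
```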
Hello, the above is my code; could you check whether my call to CSRank is written correctly? After the error occurred, I printed the key and its value is fs. I then found that the new version of qlib no longer has the fs branch; as shown below, the new MemCache differs from your older version, and I don't know what to change so I can use your CSRank:
By the way, I was running with the latest Microsoft qlib, with your CSRank copied over, which may not be appropriate. Perhaps I need to run with your qlib directly; I'll try that later.
This time I installed your qlib and ran the code above, and got the following error. Can it be resolved?
First, confirm that you're using the latest main branch of my fork. If it still fails, send me your config. Are you using the official Yahoo data?
fs has always been there in my fork; it was added precisely for CrossSection, and the official version never had it.
Yes, I'm using your main. The code I ran is as follows:
import qlib
from qlib.data.dataset.loader import QlibDataLoader

if __name__ == "__main__":
    qlib.init(provider_uri=r"G:\qlibrolling\qlib_data\cn_data", region="cn")
    fields = ['CSRank($close)', 'Abs($close)']
    names = ['CSRank', 'close']
    labels = ['Ref($close, -2)/Ref($close, -1) - 1']  # label
    label_names = ['LABEL']
    data_loader_config = {
        "feature": (fields, names),
        "label": (labels, label_names),
    }
    data_loader = QlibDataLoader(config=data_loader_config)
    df = data_loader.load(instruments='csi300', start_time='2017-01-01', end_time='2017-12-31')
    print(df)
The data is qlib's bundled data. If the CSRank field is removed from fields, the code above outputs correctly.
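For context on what the failing field computes: a CSRank-style operator is a cross-sectional rank, i.e. on each date it ranks a value across all instruments, whereas qlib's built-in operators run per instrument over time (which is why the per-stock parallelism gets in the way). A minimal pandas sketch with toy data, not qlib's implementation:

```python
import pandas as pd

# Toy panel shaped like a qlib loader result: MultiIndex (datetime, instrument).
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-01-03", "2017-01-04"]),
     ["SH600000", "SH600016", "SZ000001"]],
    names=["datetime", "instrument"],
)
close = pd.Series([10.0, 12.0, 11.0, 9.0, 13.0, 8.0], index=idx, name="close")

# Cross-sectional percentile rank: group by date, rank across instruments.
cs_rank = close.groupby(level="datetime").rank(pct=True)
print(cs_rank)
```

On 2017-01-03, for example, the closes 10, 12, 11 get percentile ranks 1/3, 1.0, 2/3.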
I took a look; this should be an OS issue. I develop on Linux, where Python class variables can be passed across processes, but on Windows they can't.
Because qlib's native computation is parallelised per stock, computing cross-sectional factors requires locks to synchronise the data of the individual stocks. My design gives each cross-sectional factor its own lock, so different cross-sectional factors don't interfere with each other. But that requires passing the locks to the worker processes in a dict, which Windows simply cannot do, whether as function arguments or as class variables. I'll have to keep looking for a solution.
If you really want to try it on Windows, you can change the lock to a global lock, i.e. all cross-sectional factors share one lock, but it will definitely be slower. If you want to make that change, you mainly need to look at these two places: 1. https://github.com/qianyun210603/qlib/blob/741c3f78f6f42592ed3cd4a6feebfeb205a62d53/qlib/data/cache.py#L148-L158 2. https://github.com/qianyun210603/qlib/blob/741c3f78f6f42592ed3cd4a6feebfeb205a62d53/qlib/data/ops.py#L2044-L2089
Change locks from a dict of RLocks to a single RLock, and remove the indexing.
However, even after this change you'll hit a new error complaining that the key SH600074 is missing; that's purely because this stock's data is missing from the raw data.
I see. Thank you. It looks like cross-sectional factors in qlib are indeed problematic.
@qianyun210603 Hello, I ran into the same problem adding CSRank to qlib: KeyError: 'Unknown memcache unit', so I added it.
The current error is shown below. I tried alpha factor #101 without CSRank and it runs, but the version with CSRank fails. My environment is Linux.
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
Traceback (most recent call last):
File "/home/hyx/code/qlib/qlib/data/data.py", line 1186, in features
return DatasetD.dataset(
TypeError: dataset() got multiple values for argument 'inst_processors'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/_utils.py", line 72, in __call__
return self.func(**kwargs)
File "/home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py", line 598, in __call__
return [func(*args, **kwargs)
File "/home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py", line 598, in <listcomp>
return [func(*args, **kwargs)
File "/home/hyx/code/qlib/qlib/data/data.py", line 615, in inst_calculator
obj[field] = ExpressionD.expression(inst, field, start_time, end_time, freq)
File "/home/hyx/code/qlib/qlib/data/data.py", line 859, in expression
series = expression.load(instrument, query_start, query_end, freq)
File "/home/hyx/code/qlib/qlib/data/base.py", line 193, in load
series = self._load_internal(instrument, start_index, end_index, *args)
File "/home/hyx/code/qlib/qlib/data/ops.py", line 306, in _load_internal
series_left = self.feature_left.load(instrument, start_index, end_index, *args)
File "/home/hyx/code/qlib/qlib/data/base.py", line 193, in load
series = self._load_internal(instrument, start_index, end_index, *args)
File "/home/hyx/code/qlib/qlib/data/ops.py", line 1542, in _load_internal
if cache_key not in H["fs"]:
TypeError: argument of type 'NoneType' is not iterable
"""
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
Cell In[1], line 215
210 data_loader_config = {
211 "feature": (fields, names),
212 "label": (labels, label_names)
213 }
214 data_loader = QlibDataLoader(config=data_loader_config)
--> 215 df_feature = data_loader.load(instruments=market, start_time=start_time, end_time=end_time)
218 # processor config
219 _DEFAULT_LEARN_PROCESSORS_riskfree = [
220 {"class": "CSZScoreNorm", "kwargs": {"fields_group": "feature"}},
221 {"class": "CSZScoreNorm", "kwargs": {"fields_group": "label"}},
(...)
224 {"class": "DropnaProcessor", "kwargs": {"fields_group": "feature"}},
225 ]
File ~/code/qlib/qlib/data/dataset/loader.py:141, in DLWParser.load(self, instruments, start_time, end_time)
138 def load(self, instruments=None, start_time=None, end_time=None) -> pd.DataFrame:
139 if self.is_group:
140 df = pd.concat(
--> 141 {
142 grp: self.load_group_df(instruments, exprs, names, start_time, end_time, grp)
143 for grp, (exprs, names) in self.fields.items()
144 },
145 axis=1,
146 )
147 else:
148 exprs, names = self.fields
File ~/code/qlib/qlib/data/dataset/loader.py:142, in <dictcomp>(.0)
138 def load(self, instruments=None, start_time=None, end_time=None) -> pd.DataFrame:
139 if self.is_group:
140 df = pd.concat(
141 {
--> 142 grp: self.load_group_df(instruments, exprs, names, start_time, end_time, grp)
143 for grp, (exprs, names) in self.fields.items()
144 },
145 axis=1,
146 )
147 else:
148 exprs, names = self.fields
File ~/code/qlib/qlib/data/dataset/loader.py:223, in QlibDataLoader.load_group_df(self, instruments, exprs, names, start_time, end_time, gp_name)
219 freq = self.freq[gp_name] if isinstance(self.freq, dict) else self.freq
220 inst_processors = (
221 self.inst_processors if isinstance(self.inst_processors, list) else self.inst_processors.get(gp_name, [])
222 )
--> 223 df = D.features(instruments, exprs, start_time, end_time, freq=freq, inst_processors=inst_processors)
224 df.columns = names
225 if self.swap_level:
File ~/code/qlib/qlib/data/data.py:1190, in BaseProvider.features(self, instruments, fields, start_time, end_time, freq, disk_cache, inst_processors)
1186 return DatasetD.dataset(
1187 instruments, fields, start_time, end_time, freq, disk_cache, inst_processors=inst_processors
1188 )
1189 except TypeError:
-> 1190 return DatasetD.dataset(instruments, fields, start_time, end_time, freq, inst_processors=inst_processors)
File ~/code/qlib/qlib/data/data.py:923, in LocalDatasetProvider.dataset(self, instruments, fields, start_time, end_time, freq, inst_processors)
921 start_time = cal[0]
922 end_time = cal[-1]
--> 923 data = self.dataset_processor(
924 instruments_d, column_names, start_time, end_time, freq, inst_processors=inst_processors
925 )
927 return data
File ~/code/qlib/qlib/data/data.py:577, in DatasetProvider.dataset_processor(instruments_d, column_names, start_time, end_time, freq, inst_processors)
567 inst_l.append(inst)
568 task_l.append(
569 delayed(DatasetProvider.inst_calculator)(
570 inst, start_time, end_time, freq, normalize_column_names, spans, C, inst_processors
571 )
572 )
574 data = dict(
575 zip(
576 inst_l,
--> 577 ParallelExt(n_jobs=workers, backend=C.joblib_backend, maxtasksperchild=C.maxtasksperchild)(task_l),
578 )
579 )
581 new_data = dict()
582 for inst in sorted(data.keys()):
File /home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py:2007, in Parallel.__call__(self, iterable)
2001 # The first item from the output is blank, but it makes the interpreter
2002 # progress until it enters the Try/Except block of the generator and
2003 # reaches the first `yield` statement. This starts the asynchronous
2004 # dispatch of the tasks to the workers.
2005 next(output)
-> 2007 return output if self.return_generator else list(output)
File /home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py:1650, in Parallel._get_outputs(self, iterator, pre_dispatch)
1647 yield
1649 with self._backend.retrieval_context():
-> 1650 yield from self._retrieve()
1652 except GeneratorExit:
1653 # The generator has been garbage collected before being fully
1654 # consumed. This aborts the remaining tasks if possible and warn
1655 # the user if necessary.
1656 self._exception = True
File /home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py:1754, in Parallel._retrieve(self)
1747 while self._wait_retrieval():
1748
1749 # If the callback thread of a worker has signaled that its task
1750 # triggered an exception, or if the retrieval loop has raised an
1751 # exception (e.g. `GeneratorExit`), exit the loop and surface the
1752 # worker traceback.
1753 if self._aborting:
-> 1754 self._raise_error_fast()
1755 break
1757 # If the next job is not ready for retrieval yet, we just wait for
1758 # async callbacks to progress.
File /home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py:1789, in Parallel._raise_error_fast(self)
1785 # If this error job exists, immediately raise the error by
1786 # calling get_result. This job might not exists if abort has been
1787 # called directly or if the generator is gc'ed.
1788 if error_job is not None:
-> 1789 error_job.get_result(self.timeout)
File /home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py:745, in BatchCompletionCallBack.get_result(self, timeout)
739 backend = self.parallel._backend
741 if backend.supports_retrieve_callback:
742 # We assume that the result has already been retrieved by the
743 # callback thread, and is stored internally. It's just waiting to
744 # be returned.
--> 745 return self._return_or_raise()
747 # For other backends, the main thread needs to run the retrieval step.
748 try:
File /home/hyx/bash/envs/qlib3/lib/python3.8/site-packages/joblib/parallel.py:763, in BatchCompletionCallBack._return_or_raise(self)
761 try:
762 if self.status == TASK_ERROR:
--> 763 raise self._result
764 return self._result
765 finally:
TypeError: argument of type 'NoneType' is not iterable
My code is as follows:
import qlib
import pandas as pd
import numpy as np
from qlib.constant import REG_US
from qlib.utils import exists_qlib_data, init_instance_by_config
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord,SigAnaRecord
from qlib.utils import flatten_dict
import pylab as pl
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from qlib.data.dataset.handler import DataHandlerLP
from qlib.data.dataset.loader import QlibDataLoader
provider_uri = "/home/hyx/qlib_data/us/"
market = "all"
start_time = '2012-01-01'
end_time = '2022-12-31'
qlib.init(provider_uri=provider_uri, region=REG_US)
f_return = "($close/Ref($close, 1)-1)"
f_adv5 = "Mean($money, 5)"
f_adv10 = "Mean($money, 10)"
f_adv15 = "Mean($money, 15)"
f_adv20 = "Mean($money, 20)"
f_adv30 = "Mean($money, 30)"
f_adv40 = "Mean($money, 40)"
f_adv50 = "Mean($money, 50)"
f_adv60 = "Mean($money, 60)"
f_adv120 = "Mean($money, 120)"
f_adv180 = "Mean($money, 180)"
alpha_components = {
    "alpha001": f"CSRank(IdxMax(Power(If({f_return}<0, Std({f_return}, 20), $close), 2), 5))-0.5",
}
figurefilepath = '/home/hyx/code/qlib/output/FormulaAlpha/'
sharpe_values = {}
alpha_name = 'alpha001'  # only alpha001 is defined in alpha_components above
fields = [alpha_components[alpha_name]]
names = [alpha_name]
labels = ['Ref($close, -11)/Ref($close, -1) - 1'] # label
label_names = ['LABEL']
data_loader_config = {
    "feature": (fields, names),
    "label": (labels, label_names),
}
data_loader = QlibDataLoader(config=data_loader_config)
df_feature = data_loader.load(instruments=market, start_time=start_time, end_time=end_time)
@timerobin First, be clear about what you based your changes on: the official Qlib or the main branch of my fork. I don't work much on factors anymore. As I recall, adding cross-sectional support required changing more than one or two files, and the config files had to change as well.
@qianyun210603 Hello, I made my changes on top of the official Qlib, adding CSRank, CSScale, and XSectionOperator to qlib.data.ops.py. Which config files do you mean?
Hello, could you point out where your cross-sectional factor is located?