microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License
15.14k stars 2.59k forks

When pd.IndexSlice[idx_slc] fails, the error message tells the user to clear the redis cache #235

Open pan-cai opened 3 years ago

pan-cai commented 3 years ago

qlib\data\dataset\utils.py

fetch_df_by_index

return df.loc[ pd.IndexSlice[idx_slc], ]

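On Windows 10, when pd.IndexSlice[idx_slc] in the fetch_df_by_index function raises an error, the message tells the user to clear the redis data.

The error should have nothing to do with redis; this exception probably needs to be handled separately here.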

zhupr commented 3 years ago

@pan-cai Hi, how did you trigger this error? What steps did you take?

pan-cai commented 3 years ago

Data downloaded with qlib works fine; the error appears after converting my own csv data to the qlib format.

The csv data is daily data from 2005-01-01 to 2021-01-29, downloaded as csv with jqdatasdk. I converted it to qlib-format csv, and then converted the qlib-format csv to qlib-format data (files ending in .bin).

Running something like Alpha158.yaml with qrun.exe triggers the error.

The error is reported from the fetch_df_by_index function. The arguments at the time of the error were:

slc: slice('2017-01-01', '2020-08-01', None)
idx_slc: (slice('2017-01-01', '2020-08-01', None), slice(None, None, None))
error: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'
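The pandas error quoted above can be reproduced without qlib. The following is a minimal sketch, assuming the converted data ended up with a (datetime, instrument) MultiIndex whose rows are not in sorted order; the thread does not confirm that this is the cause, and the frame, column name, and values here are made up for illustration.

```python
import pandas as pd

# Rows deliberately out of date order (hypothetical data, not from the thread).
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2020-01-02"), "SH600000"),
        (pd.Timestamp("2017-01-03"), "SH600000"),
        (pd.Timestamp("2018-06-01"), "SH600000"),
    ],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"$close": [3.0, 1.0, 2.0]}, index=index)

# Same shape as the idx_slc reported in the issue.
idx_slc = (slice("2017-01-01", "2020-08-01"), slice(None))

try:
    df.loc[pd.IndexSlice[idx_slc], :]
    raised = False
except pd.errors.UnsortedIndexError as err:
    # pandas refuses to slice a MultiIndex that is not lexsorted.
    raised = True
    print("unsorted index:", err)

# Sorting the index first lets the same slice succeed.
sorted_df = df.sort_index()
result = sorted_df.loc[pd.IndexSlice[idx_slc], :]
print(result)
```

If this is indeed the cause, sorting the rows by date before converting the csv files (or calling sort_index() on the frame before slicing) would be one way to avoid the error.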

The following code also triggers it:

instruments = ['SH600000']
# fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
fields = ['$close', '$volume', ]
features = D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day')

The full error is below:

File "xxx/test_data.py", line 39, in <module>
    features = D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head()
  File "xxx\lib\site-packages\qlib\data\data.py", line 975, in features
    return DatasetD.dataset(instruments, fields, start_time, end_time, freq, disk_cache)
  File "xxx\lib\site-packages\qlib\data\cache.py", line 313, in dataset
    return self._dataset(instruments, fields, start_time, end_time, freq, disk_cache)
  File "xxx\lib\site-packages\qlib\data\cache.py", line 639, in _dataset
    with CacheUtils.writer_lock(self.r, "dataset-%s" % _cache_uri):
  File "xxxx\anaconda\lib\contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "xxx\lib\site-packages\qlib\data\cache.py", line 220, in writer_lock
    CacheUtils.acquire(current_cache_wlock, lock_name)
  File "xxx\lib\site-packages\qlib\data\cache.py", line 184, in acquire
    """
QlibCacheException: It sees the key(lock:xxxxx\\qlib\\qlib_data\\cn_data:dataset-cfebe46424b8064e04218c84ce30751b-wlock) of the redis lock has existed in your redis db now.
                    You can use the following command to clear your redis keys and rerun your commands:
                    $ redis-cli
                    > select 1
                    > del "lock:xxxxx\\qlib\\qlib_data\\cn_data:dataset-cfebe46424b8064e04218c84ce30751b-wlock"
                    > quit
                    If the issue is not resolved, use "keys *" to find if multiple keys exist. If so, try using "flushall" to clear all the keys.
hanson-young commented 3 years ago

I had the same errors

win10 qlib.version Out[20]: '0.6.2'

I used these commands to try to solve it, but the problem remains!

$ redis-cli
> select 1
> del "lock:xxxxx\qlib\qlib_data\cn_data:dataset-cfebe46424b8064e04218c84ce30751b-wlock"
> quit

you-n-g commented 3 years ago

@hanson-young Could you give more details about your error?

hanson-young commented 3 years ago

@you-n-g

Windows AMD64 Windows-10-10.0.18362-SP0 10.0.18362

Python version: 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]

Qlib version: 0.6.3.99
numpy==1.20.3
pandas==1.2.4
scipy==1.5.2
requests==2.24.0
sacred==0.8.2
python-socketio==3.1.2
redis==3.5.3
python-redis-lock==3.7.0
schedule==1.1.0
cvxpy==1.0.21
hyperopt==0.1.1
fire==0.4.0
statsmodels==0.12.0
xlrd==1.2.0
plotly==4.12.0
matplotlib==3.1.3
tables==3.6.1
pyyaml==5.3.1
mlflow==1.17.0
tqdm==4.50.2
loguru==0.5.3
lightgbm==3.2.1
tornado==6.0.4
joblib==0.17.0
fire==0.4.0
ruamel.yaml==0.17.9

[62152:MainThread](2021-06-12 23:54:23,909) INFO - qlib.Initialization - [init.py:48] - qlib successfully initialized based on client settings.
[62152:MainThread](2021-06-12 23:54:23,910) INFO - qlib.Initialization - [init.py:49] - data_path=D:\Finance\digital_currency_api\qlib\qlib_data\cn_data
[62152:MainThread](2021-06-12 23:54:23,912) ERROR - qlib.workflow - [utils.py:34] - An exception has been raised[QlibCacheException: It sees the key(lock:D:\Finance\digital_currency_api\qlib\qlib_data\cn_data:dataset-2502fd27db971bf723c6523d9e748f4d-wlock) of the redis lock has existed in your redis db now.
                    You can use the following command to clear your redis keys and rerun your commands:
                    $ redis-cli
                    > select 1
                    > del "lock:D:\Finance\digital_currency_api\qlib\qlib_data\cn_data:dataset-2502fd27db971bf723c6523d9e748f4d-wlock"
                    > quit
                    If the issue is not resolved, use "keys *" to find if multiple keys exist. If so, try using "flushall" to clear all the keys.
].
  File "D:/Finance/digital_currency_api/vnpy-master/hanson/test_qt_main.py", line 27, in <module>
    df = D.features(instruments, fields, start_time='2010-01-01', end_time='2013-01-31', freq='day').head()
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\data.py", line 981, in features
    return DatasetD.dataset(instruments, fields, start_time, end_time, freq, disk_cache)
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\cache.py", line 376, in dataset
    return self._dataset(instruments, fields, start_time, end_time, freq, disk_cache)
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\cache.py", line 702, in _dataset
    with CacheUtils.writer_lock(self.r, "dataset-%s" % _cache_uri):
  File "D:\Program\Anaconda\lib\contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\cache.py", line 283, in writer_lock
    CacheUtils.acquire(current_cache_wlock, lock_name)
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\cache.py", line 239, in acquire
    raise QlibCacheException(

you-n-g commented 3 years ago

Hi, @zhupr , can you give any advice on this issue? Thanks

zhupr commented 3 years ago

@hanson-young Hi, you can locate the code that raises the error by following these steps:

  1. Add the expression_cache and dataset_cache parameters to qlib.init: qlib.init(xxx, expression_cache=None, dataset_cache=None).
    • If you still encounter an error, provide more details about it.
    • If no error occurs, continue with step 2.
  2. Remove expression_cache=None and dataset_cache=None from qlib.init, and then change the acquire function in qlib/data/cache.py to:
    @staticmethod
    def acquire(lock, lock_name):
        lock.acquire()
    • If you encounter an error, provide more details about it.
hanson-young commented 3 years ago

@zhupr In step 1:

qlib.init(provider_uri="D:\Finance\digital_currency_api\qlib\qlib_data\cn_data", region=REG_CN,expression_cache=None,dataset_cache=None)

[59156:MainThread](2021-06-14 16:50:39,726) INFO - qlib.Initialization - [config.py:284] - default_conf: client.
[59156:MainThread](2021-06-14 16:50:41,377) INFO - qlib.Initialization - [init.py:48] - qlib successfully initialized based on client settings.
[59156:MainThread](2021-06-14 16:50:41,378) INFO - qlib.Initialization - [init.py:49] - data_path=D:\Finance\digital_currency_api\qlib\qlib_data\cn_data
[22540:MainThread](2021-06-14 16:50:42,584) INFO - qlib.Initialization - [config.py:284] - default_conf: client.
[22540:MainThread](2021-06-14 16:50:44,193) INFO - qlib.Initialization - [init.py:48] - qlib successfully initialized based on client settings.
[22540:MainThread](2021-06-14 16:50:44,193) INFO - qlib.Initialization - [init.py:49] - data_path=D:\Finance\digital_currency_api\qlib\qlib_data\cn_data
[22540:MainThread](2021-06-14 16:50:44,246) ERROR - qlib.workflow - [utils.py:34] - An exception has been raised[RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.].

  File "<string>", line 1, in <module>
  File "D:\Program\Anaconda\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\Program\Anaconda\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "D:\Program\Anaconda\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Program\Anaconda\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "D:\Program\Anaconda\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "D:\Program\Anaconda\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "D:\Program\Anaconda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Finance\digital_currency_api\vnpy-master\hanson\test_qt_main.py", line 29, in <module>
    df = D.features(instruments, fields, start_time='2010-01-01', end_time='2013-01-31', freq='day').head()
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\data.py", line 983, in features
    return DatasetD.dataset(instruments, fields, start_time, end_time, freq)
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\data.py", line 715, in dataset
    data = self.dataset_processor(instruments_d, column_names, start_time, end_time, freq)
  File "D:\Program\Anaconda\lib\site-packages\pyqlib-0.6.3.99-py3.8-win-amd64.egg\qlib\data\data.py", line 447, in dataset_processor
    p = Pool(processes=workers)
  File "D:\Program\Anaconda\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "D:\Program\Anaconda\lib\multiprocessing\pool.py", line 212, in __init__
    self._repopulate_pool()
  File "D:\Program\Anaconda\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "D:\Program\Anaconda\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "D:\Program\Anaconda\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\Program\Anaconda\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\Program\Anaconda\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Program\Anaconda\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "D:\Program\Anaconda\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
zhupr commented 3 years ago

On Windows, the code to get the features from qlib should be put under if __name__ == "__main__":

import qlib
from qlib.data import D

if __name__ == "__main__":
    # Raw string, so the backslashes in the Windows path are not read as escape sequences.
    qlib.init(provider_uri=r"D:\Finance\digital_currency_api\qlib\qlib_data\cn_data")
    df = D.features(D.instruments("all"), ["$close", "$close/Ref($close, 1)-1"])
hanson-young commented 3 years ago

I solved the above problem, thanks @zhupr.

you-n-g commented 3 years ago

@hanson-young

Would you mind sending a PR to improve the docs on this question and becoming one of the contributors to Qlib? Other users who run into the same confusion will be grateful to you :) Thanks

hanson-young commented 3 years ago

@you-n-g OK, I'd love to contribute a PR.

you-n-g commented 3 years ago

For example, you can improve the docs here: https://qlib.readthedocs.io/en/latest/FAQ/FAQ.html

It is also fine if you find a better place.

bbbzhai commented 2 years ago

On Windows, the code to get the features from qlib should be put under if __name__ == "__main__":

import qlib
from qlib.data import D

if __name__ == "__main__":
    qlib.init(provider_uri=r"D:\Finance\digital_currency_api\qlib\qlib_data\cn_data")
    df = D.features(D.instruments("all"), ["$close", "$close/Ref($close, 1)-1"])

On Mac as well, D.features() needs to be called inside a function. I tried:

    instruments = ['AAPL']
    fields = ['$close', '$volume', '$open', '$close', '$high', '$low']
    historical_df = D.features(instruments, fields, start_time='2020-01-01', end_time='2022-01-05', freq='day')

it works.

But if I request two instruments at the same time:

    instruments = ['AAPL', 'ATAI']
    fields = ['$close', '$volume', '$open', '$close', '$high', '$low']
    historical_df = D.features(instruments, fields, start_time='2020-01-01', end_time='2022-01-05', freq='day')

This fails and triggers QlibCacheException.

But if I put the same code under main, it works fine.

I don't know if this is a bug.

Wangwuyi123 commented 2 years ago

@bbbzhai This is a limitation of Python's multiprocessing: when it is used, the program has to be entered from main (under if __name__ == "__main__":). When multiple stocks are passed in, multiprocessing is used by default.
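Whether the main guard is required depends on how Python starts child processes on a given platform. The sketch below uses only the standard library to check that; needs_main_guard and load_features are hypothetical helpers written for illustration and are not part of qlib.

```python
import multiprocessing as mp

# Start methods under which each child process re-imports the main module.
REIMPORTING_METHODS = {"spawn", "forkserver"}


def needs_main_guard(start_method=None):
    """Return True if child processes re-import the main module under this start method."""
    method = start_method or mp.get_start_method()
    return method in REIMPORTING_METHODS


def load_features():
    # Placeholder for the real work, e.g. a D.features(...) call with several instruments.
    return "loaded"


if __name__ == "__main__":
    # Code that may start worker processes belongs under this guard, so that
    # re-importing the module in a child process does not run it again.
    print("start method:", mp.get_start_method())
    print("guard needed:", needs_main_guard())
    print(load_features())
```

With spawn (the default on Windows, and on macOS since Python 3.8), module-level code runs again in every child process, which is why an unguarded call that starts a process pool fails with the bootstrapping error shown earlier in this thread.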