microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License

Normalize with the Error:ValueError: need at most 63 handles, got a sequence of length 72 #1446

Open louyuenan opened 1 year ago

louyuenan commented 1 year ago

🐛 Bug Description

When I tried to normalize the 1min data with the following command:

(env38) C:\Users\Anani>python scripts/data_collector/yahoo/collector.py normalize_data --qlib_data_1d_dir ~/.qlib/qlib_data/cn_data --source_dir ~/.qlib/stock_data/source/cn_data_1min --normalize_dir ~/.qlib/stock_data/source/cn_1min_nor --region CN --interval 1min --max_workers 8

I got this error:

2023-02-22 21:54:27.234 | INFO | data_collector.utils:get_calendar_list:106 - end of get calendar list: ALL.
[6924:MainThread](2023-02-22 21:55:01,958) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[6924:MainThread](2023-02-22 21:55:02,858) INFO - qlib.Initialization - [init.py:74] - qlib successfully initialized based on client settings.
[6924:MainThread](2023-02-22 21:55:02,859) INFO - qlib.Initialization - [init.py:76] - data_path={'__DEFAULT_FREQ': WindowsPath('C:/Users/Anani/.qlib/qlib_data/cn_data')}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\Anani\anaconda3\envs\env38\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\Anani\anaconda3\envs\env38\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Anani\anaconda3\envs\env38\lib\multiprocessing\pool.py", line 519, in _handle_workers
    cls._wait_for_updates(current_sentinels, change_notifier)
  File "C:\Users\Anani\anaconda3\envs\env38\lib\multiprocessing\pool.py", line 499, in _wait_for_updates
    wait(sentinels, timeout=timeout)
  File "C:\Users\Anani\anaconda3\envs\env38\lib\multiprocessing\connection.py", line 879, in wait
    ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
  File "C:\Users\Anani\anaconda3\envs\env38\lib\multiprocessing\connection.py", line 811, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 72

Environment

Additional Notes

I have read that this problem may be caused by having more than 60 CPU cores. Is that true? If so, how do I limit the number of CPU cores used when running this project? Sincerely,

chuhuan88 commented 1 year ago

Hello, could you share the downloaded dataset with me? I can't access Yahoo. Thank you very much.

specialuse commented 1 year ago

I hit the same problem when I run something like "init_instance_by_config(dataset_config)".

louyuenan commented 8 months ago

This problem is caused by the old version of the multiprocessing library, which cannot handle more than 60 logical cores. I "fixed" it by disabling Intel Hyper-Threading, but now initialization is super slow. I hope this gets fixed one day, or that GPU init support is added.
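
For anyone who would rather not disable Hyper-Threading: the error message says at most 63 handles can be waited on at once on Windows, and the pool in the traceback was waiting on 72. A less drastic workaround is to cap the number of worker processes instead of the number of cores. Below is a minimal sketch of that idea, not qlib's own code. The names `safe_worker_count`, `make_pool` and `WINDOWS_MAX_POOL_WORKERS` are hypothetical, and the value 61 is borrowed from the cap that Python's own `concurrent.futures.ProcessPoolExecutor` enforces on Windows. Where qlib creates its pools, and whether that size is configurable, is not shown in this thread, so wiring the value in is left open.

```python
import os
import sys
from multiprocessing import Pool

# Assumption: 61 mirrors the Windows limit used by
# concurrent.futures.ProcessPoolExecutor; it leaves room for the extra
# handles the pool waits on besides the worker processes.
WINDOWS_MAX_POOL_WORKERS = 61


def safe_worker_count(requested=None, platform=sys.platform):
    """Return a worker count that stays under the Windows handle limit.

    requested: desired number of workers; None means "one per logical CPU".
    platform:  sys.platform string, exposed as a parameter for testing.
    """
    wanted = requested if requested is not None else (os.cpu_count() or 1)
    wanted = max(1, wanted)
    if platform == "win32":
        return min(wanted, WINDOWS_MAX_POOL_WORKERS)
    return wanted


def make_pool(requested=None):
    """Create a multiprocessing.Pool whose size is capped on Windows."""
    return Pool(processes=safe_worker_count(requested))


if __name__ == "__main__":
    # On a 72-thread Windows machine this prints 61 instead of 72.
    print(safe_worker_count())
```

Note that passing `--max_workers 8` on the command line did not prevent the error in the original report, which suggests (but does not prove) that some pool was sized from the CPU count rather than from that flag. A cap like this only helps where the pool is actually created.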