Closed liujianliuku closed 1 year ago
Hi liujianliuku, I meet the same issue, but it disappeared after I run the same script again
I assume there is some thread unsafe issue in workflow_by_code.py. I will also post my log msg here.
2022-03-20 11:57:23.194 | WARNING | qlib.tests.data:qlib_data:150 - Data already exists: ~/.qlib/qlib_data/cn_data, the data download will be skipped
If downloading is required: `exists_skip=False` or `change target_dir`
[10671:MainThread](2022-03-20 11:57:23,195) INFO - qlib.Initialization - [config.py:402] - default_conf: client.
[10671:MainThread](2022-03-20 11:57:23,201) INFO - qlib.Initialization - [__init__.py:73] - qlib successfully initialized based on client settings.
[10671:MainThread](2022-03-20 11:57:23,202) INFO - qlib.Initialization - [__init__.py:75] - data_path={'__DEFAULT_FREQ': PosixPath('/home/q/.qlib/qlib_data/cn_data')}
[10671:MainThread](2022-03-20 11:58:25,445) INFO - qlib.timer - [log.py:113] - Time cost: 59.355s | Loading data Done
[10671:MainThread](2022-03-20 11:58:33,478) INFO - qlib.timer - [log.py:113] - Time cost: 0.272s | DropnaLabel Done
/home/q/FundFusion/env/lib/python3.8/site-packages/pandas/core/frame.py:3641: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
[10671:MainThread](2022-03-20 11:58:38,979) INFO - qlib.timer - [log.py:113] - Time cost: 5.500s | CSZScoreNorm Done
[10671:MainThread](2022-03-20 11:58:38,981) INFO - qlib.timer - [log.py:113] - Time cost: 13.535s | fit & process data Done
[10671:MainThread](2022-03-20 11:58:38,981) INFO - qlib.timer - [log.py:113] - Time cost: 72.892s | Init data Done
KMID KLEN KMID2 KUP KUP2 KLOW KLOW2 KSFT KSFT2 ... VSUMN20 VSUMN30 VSUMN60 VSUMD5 VSUMD10 VSUMD20 VSUMD30 VSUMD60 LABEL0
datetime instrument ...
2008-01-02 SH600000 0.010377 0.061132 0.169754 0.028302 0.462962 0.022453 0.367285 0.004528 0.074077 ... 0.490807 0.485962 0.510337 -0.022349 -0.101278 0.018386 0.028076 -0.020675 0.042068
SH600004 0.057279 0.059666 0.960000 0.002387 0.040000 0.000000 0.000000 0.054893 0.920001 ... 0.392360 0.423984 0.514932 0.633267 0.342211 0.215280 0.152031 -0.029863 0.000000
SH600006 0.012673 0.040323 0.314285 0.008065 0.200000 0.019585 0.485715 0.024194 0.600000 ... 0.433405 0.494553 0.524930 -0.064064 -0.045677 0.133190 0.010894 -0.049861 -0.008830
SH600007 0.066977 0.084186 0.795579 0.007907 0.093923 0.009302 0.110498 0.068372 0.812153 ... 0.291541 0.352613 0.420448 0.878055 0.514779 0.416918 0.294774 0.159104 0.007496
SH600008 0.051163 0.082326 0.621469 0.027907 0.338984 0.003256 0.039547 0.026512 0.322033 ... 0.381852 0.400486 0.469882 0.482783 0.148572 0.236296 0.199027 0.060235 -0.015445
[5 rows x 159 columns]
[10671:MainThread](2022-03-20 11:58:45,679) INFO - qlib.workflow - [expm.py:318] - <mlflow.tracking.client.MlflowClient object at 0x7f9db78547f0>
[10671:MainThread](2022-03-20 11:58:45,708) WARNING - qlib.workflow - [expm.py:198] - No valid experiment found. Create a new experiment with name workflow.
[10671:MainThread](2022-03-20 11:58:45,712) INFO - qlib.workflow - [exp.py:257] - Experiment 1 starts running ...
[10671:MainThread](2022-03-20 11:58:46,068) INFO - qlib.workflow - [recorder.py:293] - Recorder 7efca793ef574b4ca1c0b25218f090d7 starts running under Experiment 1 ...
Training until validation scores don't improve for 50 rounds
[20] train's l2: 0.99032 valid's l2: 0.99407
[40] train's l2: 0.98654 valid's l2: 0.993434
[60] train's l2: 0.983952 valid's l2: 0.993192
[80] train's l2: 0.981825 valid's l2: 0.993024
[100] train's l2: 0.97992 valid's l2: 0.992874
[120] train's l2: 0.978043 valid's l2: 0.992838
[140] train's l2: 0.976317 valid's l2: 0.992838
[160] train's l2: 0.974702 valid's l2: 0.992849
[180] train's l2: 0.97313 valid's l2: 0.992885
[200] train's l2: 0.971546 valid's l2: 0.992959
Early stopping, best iteration is:
[168] train's l2: 0.974048 valid's l2: 0.992791
[10671:MainThread](2022-03-20 11:59:55,049) INFO - qlib.workflow - [record_temp.py:194] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 1
'The following are prediction results of the LGBModel model.'
score
datetime instrument
2017-01-03 SH600000 -0.036180
SH600008 0.012532
SH600009 0.036990
SH600010 -0.020218
SH600015 -0.129913
{'IC': 0.05148803800355824,
'ICIR': 0.42031572365231823,
'Rank IC': 0.05356542355384916,
'Rank ICIR': 0.4463195692082111}
[10671:MainThread](2022-03-20 11:59:56,221) INFO - qlib.backtest caller - [__init__.py:82] - Create new exchange
[10671:MainThread](2022-03-20 12:00:07,081) WARNING - qlib.BaseExecutor - [executor.py:111] - `common_infra` is not set for <qlib.backtest.executor.SimulatorExecutor object at 0x7f9e4dcb6d90>
backtest loop: 0%| | 0/871 [00:00<?, ?it/s]/home/q/FundFusion/env/lib/python3.8/site-packages/qlib/utils/index_data.py:480: RuntimeWarning: Mean of empty slice
return np.nanmean(self.data)
backtest loop: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 871/871 [00:17<00:00, 49.72it/s]
[10671:MainThread](2022-03-20 12:00:25,565) INFO - qlib.workflow - [record_temp.py:499] - Portfolio analysis record 'port_analysis_1day.pkl' has been saved as the artifact of the Experiment 1
'The following are analysis results of benchmark return(1day).'
risk
mean 0.000477
std 0.012295
annualized_return 0.113561
information_ratio 0.598699
max_drawdown -0.370479
'The following are analysis results of the excess return without cost(1day).'
risk
mean 0.000687
std 0.005483
annualized_return 0.163574
information_ratio 1.933691
max_drawdown -0.097084
'The following are analysis results of the excess return with cost(1day).'
risk
mean 0.000492
std 0.005482
annualized_return 0.117084
information_ratio 1.384504
max_drawdown -0.105894
[10671:MainThread](2022-03-20 12:00:25,584) INFO - qlib.workflow - [record_temp.py:524] - Indicator analysis record 'indicator_analysis_1day.pkl' has been saved as the artifact of the Experiment 1
'The following are analysis results of indicators(1day).'
value
ffr 1.0
pa 0.0
pos 0.0
Exception in thread Thread-7:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/q/FundFusion/env/lib/python3.8/site-packages/qlib/utils/paral.py", line 88, in run
data()
File "/home/q/FundFusion/env/lib/python3.8/site-packages/qlib/workflow/recorder.py", line 373, in log_metrics
self.client.log_metric(self.id, name, data, step=step)
File "/home/q/FundFusion/env/lib/python3.8/site-packages/mlflow/tracking/client.py", line 690, in log_metric
self._tracking_client.log_metric(run_id, key, value, timestamp, step)
File "/home/q/FundFusion/env/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 233, in log_metric
self.store.log_metric(run_id, metric)
File "/home/q/FundFusion/env/lib/python3.8/site-packages/mlflow/store/tracking/file_store.py", line 776, in log_metric
run_info = self._get_run_info(run_id)
File "/home/q/FundFusion/env/lib/python3.8/site-packages/mlflow/store/tracking/file_store.py", line 554, in _get_run_info
run_info = self._get_run_info_from_dir(run_dir)
File "/home/q/FundFusion/env/lib/python3.8/site-packages/mlflow/store/tracking/file_store.py", line 563, in _get_run_info_from_dir
run_info = _read_persisted_run_info_dict(meta)
File "/home/q/FundFusion/env/lib/python3.8/site-packages/mlflow/store/tracking/file_store.py", line 104, in _read_persisted_run_info_dict
dict_copy = run_info_dict.copy()
AttributeError: 'NoneType' object has no attribute 'copy'
[10671:MainThread](2022-03-20 12:00:25,620) INFO - qlib.timer - [log.py:113] - Time cost: 0.005s | waiting `async_log` Done
And
Linux
x86_64
Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-glibc2.29
#1 SMP Wed Oct 28 23:40:43 UTC 2020
Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
Qlib version: 0.8.4.99
numpy==1.21.4
pandas==1.3.4
scipy==1.7.2
requests==2.27.1
sacred==0.8.2
python-socketio==5.5.2
redis==4.1.4
python-redis-lock==3.7.0
schedule==1.1.0
cvxpy==1.2.0
hyperopt==0.1.2
fire==0.4.0
statsmodels==0.13.2
xlrd==2.0.1
plotly==5.6.0
matplotlib==3.5.0
tables==3.7.0
pyyaml==6.0
mlflow==1.24.0
tqdm==4.62.3
loguru==0.6.0
lightgbm==3.3.2
tornado==6.1
joblib==1.1.0
fire==0.4.0
ruamel.yaml==0.17.21
Right, I can replay these issues when I run two processes of task_manager_rolling simultaneously, after hours running, it will fail like this.
I'm trying to replay this issue and I find that some times it does occur and some times it doesn't. Is there a stable way to replay this issue?
I meet the same issue. According to the exception, I think it is related to async call by MLflow, which causes thread unsafe issue. But I don't know how to deal with this issue.
This issue is stale because it has been open for three months with no activity. Remove the stale label or comment on the issue otherwise this will be closed in 5 days
same issue,by debug, when this error,meta.yaml
is empty,that cause this error.
but after continue,meta.yaml will has value
This error keeps poping up and is quite annoying.
This issue is stale because it has been open for three months with no activity. Remove the stale label or comment on the issue otherwise this will be closed in 5 days
Hi, I tried to run examples/model_rolling/task_manager_rolling.py and encountered following issues, which is same with the link above.
[4490:MainThread](2022-03-16 23:41:19,948) INFO - qlib.timer - [log.py:113] - Time cost: 2.219s | CSZScoreNorm Done [4490:MainThread](2022-03-16 23:41:19,948) INFO - qlib.timer - [log.py:113] - Time cost: 2.359s | fit & process data Done [4490:MainThread](2022-03-16 23:41:19,948) INFO - qlib.timer - [log.py:113] - Time cost: 6.673s | Init data Done [0] train-rmse:1.05221 valid-rmse:1.05333 [20] train-rmse:0.96462 valid-rmse:0.99454 [40] train-rmse:0.94751 valid-rmse:0.99711 [60] train-rmse:0.93133 valid-rmse:0.99901 [61] train-rmse:0.93027 valid-rmse:0.99892 [4490:MainThread](2022-03-16 23:41:37,447) INFO - qlib.workflow - [record_temp.py:194] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 1 'The following are prediction results of the XGBModel model.' score datetime instrument
2019-04-09 SH600000 0.031448 SH600009 -0.114521 SH600010 0.133871 SH600011 0.022843 SH600015 -0.006089 {'IC': 0.019224824145083887, 'ICIR': 0.12174991104957546, 'Rank IC': 0.027944959160536458, 'Rank ICIR': 0.19297428591470864} Exception in thread Thread-8: Traceback (most recent call last): File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/qlib/utils/paral.py", line 88, in run data() File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/qlib/workflow/recorder.py", line 371, in log_metrics self.client.log_metric(self.id, name, data, step=step) File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/mlflow/tracking/client.py", line 690, in log_metric self._tracking_client.log_metric(run_id, key, value, timestamp, step) File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 233, in log_metric self.store.log_metric(run_id, metric) File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 776, in log_metric run_info = self._get_run_info(run_id) File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 554, in _get_run_info run_info = self._get_run_info_from_dir(run_dir) File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 563, in _get_run_info_from_dir run_info = _read_persisted_run_info_dict(meta) File "/home/liujia/anaconda3/envs/qlib/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 104, in _read_persisted_run_info_dict dict_copy = run_info_dict.copy() AttributeError: 'NoneType' object has no attribute 'copy'
My collector info:
Linux x86_64 Linux-5.13.0-35-generic-x86_64-with-debian-bullseye-sid
40~20.04.1-Ubuntu SMP Mon Mar 7 09:18:32 UTC 2022
Python version: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
Qlib version: 0.8.4.99 numpy==1.21.5 pandas==1.3.5 scipy==1.7.3 requests==2.27.1 sacred==0.8.2 python-socketio==5.5.2 redis==4.1.4 python-redis-lock==3.7.0 schedule==1.1.0 cvxpy==1.2.0 hyperopt==0.1.2 fire==0.4.0 statsmodels==0.13.2 xlrd==2.0.1 plotly==5.6.0 matplotlib==3.5.1 tables==3.7.0 pyyaml==6.0 mlflow==1.24.0 tqdm==4.63.0 loguru==0.6.0 lightgbm==3.3.2 tornado==6.1 joblib==1.1.0 fire==0.4.0 ruamel.yaml==0.17.21