microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License

exists_skip and delete_old ignored #1121

Open magick93 opened 2 years ago

magick93 commented 2 years ago

🐛 Bug Description

When I run the following command (note the last two arguments):

python data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir ~/.qlib/qlib_data/us_data --region us --interval 1d --version v2 --trading_date 2022-01-01 --end_date 2022-12-31 --exists_skip true --delete_old false

I get the following terminal output and prompt:

python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir ~/.qlib/qlib_data/us_data --region us --interval 1d --version v2 --trading_date 2022-01-01 --end_date 2022-12-31 --exists_skip true --delete_old false
2022-06-09 23:17:19.757 | WARNING  | qlib.tests.data:_download_data:56 - The data for the example is collected from Yahoo Finance. Please be aware that the quality of the data might not be perfect. (You can refer to the original data source: https://finance.yahoo.com/lookup.)
2022-06-09 23:17:19.757 | INFO     | qlib.tests.data:_download_data:59 - qlib_data_us_1d_latest.zip downloading......
450095104it [02:32, 2956744.87it/s]                                                                             
2022-06-09 23:19:51.988 | WARNING  | qlib.tests.data:_unzip:81 - will delete the old qlib data directory(features, instruments, calendars, features_cache, dataset_cache): /root/.qlib/qlib_data/us_data
Will be deleted: 
        ['/root/.qlib/qlib_data/us_data/features', '/root/.qlib/qlib_data/us_data/calendars', '/root/.qlib/qlib_data/us_data/instruments']
If you do not need to delete /root/.qlib/qlib_data/us_data, please change the <--target_dir>
Are you sure you want to delete, yes(Y/y), no (N/n):

To Reproduce

Steps to reproduce the behavior:

  1. Run the command shown above.

Expected Behavior

Screenshot

Environment

Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.

python scripts/collect_info.py all
Linux
x86_64
Linux-5.15.0-35-generic-x86_64-with-glibc2.2.5
#36-Ubuntu SMP Sat May 21 02:24:07 UTC 2022

Python version: 3.8.0 (default, Nov 23 2019, 05:36:56)  [GCC 8.3.0]

Qlib version: 0.8.5
numpy==1.22.4
pandas==1.4.2
scipy==1.8.1
requests==2.28.0
sacred==0.8.2
python-socketio==5.6.0
redis==4.3.3
python-redis-lock==3.7.0
schedule==1.1.0
cvxpy==1.2.1
hyperopt==0.1.2
fire==0.4.0
statsmodels==0.13.2
xlrd==2.0.1
plotly==5.8.1
matplotlib==3.5.2
tables==3.7.0
pyyaml==6.0
mlflow==1.26.1
tqdm==4.64.0
loguru==0.6.0
lightgbm==3.3.2
tornado==6.1
joblib==1.1.0
fire==0.4.0
ruamel.yaml==0.17.21

SunsetWolf commented 2 years ago

I think I have found the cause of this problem: if the delete_old parameter is passed when calling the qlib_data method in this line of code, the problem is solved. Would you like to contribute to the community by opening a pull request to fix this issue?
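
For anyone following along, here is a minimal sketch of the kind of change described above. It does not reproduce the actual Qlib source: the two functions below are simplified, hypothetical stand-ins that reuse the names update_data_to_bin and qlib_data from this thread, and the confirm parameter is an assumption added only so the prompt can be exercised without a terminal. The point it illustrates is that the wrapper has to forward exists_skip and delete_old to the download helper; otherwise the helper falls back to its defaults and asks for confirmation before deleting.

```python
from pathlib import Path


def qlib_data(target_dir, delete_old=True, exists_skip=False, confirm=input):
    """Hypothetical stand-in for the download helper.

    Nothing is downloaded or deleted here; the function only reports
    which action it would have taken.
    """
    target = Path(target_dir).expanduser()
    has_data = (target / "features").exists()

    # exists_skip: leave existing data untouched and do not download again.
    if exists_skip and has_data:
        return "skipped"

    # delete_old: ask before removing the old data directories.
    if delete_old and has_data:
        answer = confirm("Are you sure you want to delete, yes(Y/y), no (N/n):")
        if answer.strip().lower() != "y":
            return "aborted"
        return "deleted-and-downloaded"

    return "downloaded"


def update_data_to_bin(qlib_data_1d_dir, exists_skip=False, delete_old=True, confirm=input):
    """Hypothetical stand-in for the collector entry point."""
    # The suggested fix: forward both flags instead of letting qlib_data
    # use its own defaults.
    return qlib_data(
        qlib_data_1d_dir,
        delete_old=delete_old,
        exists_skip=exists_skip,
        confirm=confirm,
    )
```

With the flags forwarded, a call such as update_data_to_bin(path, exists_skip=True, delete_old=False) returns "skipped" for a directory that already contains features, and the confirmation prompt is never reached.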