Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
Fix the bug where the string "NA" is read as NaN by exists_qlib_data in /qlib/utils/__init__.py.
## Description
Nano Labs Ltd is a newly Nasdaq-listed company that has traded under the ticker NA since August 1, 2022. The default na_values list of pd.read_csv includes "NA", so this ticker is read as NaN instead of as a string. This PR changes the default reading behavior of pd.read_csv in exists_qlib_data by adding keep_default_na=False, and removes two values ("NA" and "NULL") from the default NA list when reading the first column of "all.txt", which normally contains only strings.
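To make the behavior described above concrete, here is a minimal sketch. It is not the actual patch: the sample rows, the `read_symbols` helper, and the hand-written `NA_VALUES_WITHOUT_TICKERS` list are assumptions made for illustration, with the NA markers copied from the pandas documentation minus "NA" and "NULL".

```python
import io

import pandas as pd

# Hypothetical stand-in for instruments/all.txt: tab-separated, no header.
# The dates are made up for illustration only.
SAMPLE = "AAPL\t2000-01-03\t2023-12-29\nNA\t2022-08-01\t2023-12-29\n"

# NA markers from the pandas defaults with "NA" and "NULL" left out
# (written out by hand here; treat the exact list as an assumption).
NA_VALUES_WITHOUT_TICKERS = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NaN", "None", "n/a", "nan", "null",
]


def read_symbols(text, **kwargs):
    """Parse the sample text and return the first column (the tickers)."""
    df = pd.read_csv(io.StringIO(text), sep="\t", header=None, **kwargs)
    return df[0].tolist()


# Default behavior: the ticker "NA" is converted to a missing value.
default_symbols = read_symbols(SAMPLE)

# Disable the default NA list and apply the reduced list to column 0 only,
# so the ticker "NA" survives as a plain string.
fixed_symbols = read_symbols(
    SAMPLE,
    keep_default_na=False,
    na_values={0: NA_VALUES_WITHOUT_TICKERS},
)

print(default_symbols)
print(fixed_symbols)
```

Scoping the reduced list to column 0 keeps the change narrow: only the ticker column stops treating "NA" and "NULL" as missing.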
## Motivation and Context
To fix the bug in #1720.
## How Has This Been Tested?
- [X] Pass the test by running `pytest qlib/tests/test_all_pipeline.py` from the parent directory of `qlib`.
- [X] If you are adding a new feature, test on your own test scripts.
## Screenshots of Test Results (if appropriate):
Pipeline test: ![image](https://github.com/microsoft/qlib/assets/28951750/da850c40-db7d-4ea9-82af-e2bddcb39dfe)
Your own tests:
Place the attached file all.txt under \qlib_data\us_data_made\instruments and test with the following code:
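```python
import qlib
import pandas as pd
import sys, site
from pathlib import Path
from qlib.utils import exists_qlib_data
from qlib.constant import REG_US

# Directory that holds the helper scripts, including get_data.py
scripts_dir = Path.cwd().parent.joinpath("scripts")
provider_uri = "./qlib_data/us_data_made"  # target_dir

# exists_qlib_data reads instruments/all.txt, where the "NA" ticker was misread
if not exists_qlib_data(provider_uri):
    print(f"Qlib data is not found in {provider_uri}")
    # Make the scripts directory importable before importing GetData
    sys.path.append(str(scripts_dir))
    from get_data import GetData

qlib.init(provider_uri=provider_uri, region=REG_US)
```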
## Types of changes
- [X] Fix bugs
- [ ] Add new feature
- [ ] Update documentation