microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License
14.55k stars 2.53k forks

Fix the bug of reading string NA as NaN in the function exists_qlib_data. #1736

Closed OzzyXu closed 1 month ago

OzzyXu commented 5 months ago

Fix the bug of reading the string NA as NaN in exists_qlib_data in /qlib/utils/__init__.py.

Description

Nano Labs Ltd is a newly Nasdaq-listed company that has traded under the ticker NA since August 1, 2022. The default na_values list of pd.read_csv includes "NA", so this ticker is read as NaN. This PR changes the reading behavior of pd.read_csv in exists_qlib_data by adding keep_default_na=False, and removes two values ("NA" and "NULL") from the NA list used when reading the first column of "all.txt", which normally contains only strings.
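The behavior can be reproduced without Qlib. The following is a minimal sketch, not the code from this PR: the sample rows, the column names (`inst`, `start`, `end`), the helper `read_instruments`, and the shortened `ILLUSTRATIVE_NA` list are all assumptions made for illustration. Only `keep_default_na` and `na_values` are actual `pd.read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical instrument rows in the tab-separated layout of all.txt.
SAMPLE = (
    "AAPL\t1999-12-31\t2023-01-01\n"
    "NA\t2022-08-01\t2023-01-01\n"
    "NULL\t2010-01-01\t2023-01-01\n"
)
COLUMNS = ["inst", "start", "end"]  # assumed column names

# Illustrative subset of NA markers, deliberately without "NA" and "NULL".
ILLUSTRATIVE_NA = ["", "NaN", "nan", "N/A"]


def read_instruments(text, **kwargs):
    """Parse tab-separated instrument text with optional read_csv overrides."""
    return pd.read_csv(io.StringIO(text), sep="\t", names=COLUMNS, **kwargs)


# Default behavior: the tickers "NA" and "NULL" are converted to NaN.
default_df = read_instruments(SAMPLE)

# Disable the default NA list and apply a custom one to the ticker column only.
fixed_df = read_instruments(
    SAMPLE,
    keep_default_na=False,
    na_values={"inst": ILLUSTRATIVE_NA},
)

print(default_df["inst"].isna().sum())  # number of tickers lost to NaN
print(fixed_df["inst"].tolist())        # tickers preserved as strings
```

Passing `na_values` as a dict limits the custom NA list to the ticker column, so the other columns are not affected by it.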

Motivation and Context

To fix the bug in #1720.

How Has This Been Tested?

Screenshots of Test Results (if appropriate):

  1. Pipeline test: image

  2. Your own tests: image (1) Place the attached file all.txt under \qlib_data\us_data_made\instruments and test with the following code:

    import qlib
    import pandas as pd
    import sys, site
    from pathlib import Path
    from qlib.utils import exists_qlib_data
    from qlib.constant import REG_US

    scripts_dir = Path.cwd().parent.joinpath("scripts")
    provider_uri = "./qlib_data/us_data_made"  # target_dir
    if not exists_qlib_data(provider_uri):
        print(f"Qlib data is not found in {provider_uri}")
        sys.path.append(str(scripts_dir))
        from get_data import GetData

        GetData().qlib_data(target_dir=provider_uri, region=REG_US)

    qlib.init(provider_uri=provider_uri, region=REG_US)



Types of changes

- [X] Fix bugs
- [ ] Add new feature
- [ ] Update documentation

OzzyXu commented 5 months ago

@microsoft-github-policy-service agree

OzzyXu commented 4 months ago

@SunsetWolf Hey, can I ask why all tests from sources other than slow failed? Do I need to take care of this? Thank you.