microsoft / qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
https://qlib.readthedocs.io/en/latest/
MIT License
14.54k stars 2.53k forks

numpy.datetime64 precision cause dict indexing failure in index_data.py #1806

Open GeorgeGuo1202 opened 3 weeks ago

GeorgeGuo1202 commented 3 weeks ago

🐛 Bug Description

A numpy.datetime64 value with ns precision compares equal to the same instant at second precision. However, when the second-precision value is used as a key to look up a dict whose keys have ns precision, the lookup fails.

numpy.datetime64('2017-01-04T00:00:00.000000000') == numpy.datetime64('2017-01-04T00:00:00')
Out[24]: True
self.index_map[numpy.datetime64('2017-01-04T00:00:00.000000000')]
Out[25]: 1
self.index_map[numpy.datetime64('2017-01-04T00:00:00')]
Traceback (most recent call last):
  File "F:\work\env\py311\Lib\site-packages\IPython\core\interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    self.index_map[numpy.datetime64('2017-01-04T00:00:00')]
KeyError: numpy.datetime64('2017-01-04T00:00:00')

It occurs in qlib/qlib/utils/index_data.py, line 157:
        try:
            return self.index_map[self._convert_type(item)]
        except IndexError as index_e:
            raise KeyError(f"{item} can't be found in {self}") from index_e
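
A side note on the snippet above, as a minimal sketch: assuming `index_map` is a plain dict (the raw `KeyError` in the traceback is consistent with that), a missing key raises `KeyError`, not `IndexError`, so the `except` clause would not rewrap it. `TinyIndex` is a hypothetical stand-in, not qlib's class.

```python
class TinyIndex:
    """Hypothetical stand-in for an index wrapper; not qlib's implementation."""

    def __init__(self, keys):
        # Map each key to its position, as a plain dict.
        self.index_map = {k: i for i, k in enumerate(keys)}

    def index(self, item):
        try:
            return self.index_map[item]
        except IndexError as index_e:
            # A dict miss raises KeyError, so this branch is not reached.
            raise KeyError(f"{item} can't be found") from index_e


idx = TinyIndex(["a", "b"])
caught = None
try:
    idx.index("c")
except KeyError as e:
    caught = e

# The raw dict KeyError propagates: no custom message, no chained cause.
print(repr(caught), caught.__cause__)
```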

Maybe I did something wrong. Any help would be appreciated.

SunsetWolf commented 2 weeks ago

Can you describe in detail how to reproduce this? That will help us solve the problem.

akazeakari commented 2 weeks ago

Here is a code reproduction based on the issue description. Although the two numpy.datetime64 values are equal, the difference in precision causes a dict indexing failure.

The following code successfully reproduces the issue on Python 3.9.19, pyqlib 0.9.5.99, and numpy 1.23.5.

import numpy as np
from qlib.utils.index_data import Index, SingleData

index = Index([np.datetime64('2017-01-04T00:00:00.000000000'),
               np.datetime64('2017-01-05T00:00:00.000000000'),
               np.datetime64('2017-01-06T00:00:00.000000000')])

data = SingleData([1, 2, 3], index=index)

# print: True
print(np.datetime64('2017-01-04T00:00:00.000000000') == np.datetime64('2017-01-04T00:00:00'))

# print: 0
print(data.index.index_map[np.datetime64('2017-01-04T00:00:00.000000000')])

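# raises KeyError on the versions listed above: the second-precision key is not found
try:
    print(data.index.index_map[np.datetime64('2017-01-04T00:00:00')])
except KeyError as e:
    print("KeyError:", e)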

# print: 0
print(data.index.index_map[np.datetime64(np.datetime64('2017-01-04T00:00:00'), 'ns')])
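
The last line of the reproduction suggests a possible workaround: coerce the lookup key to the same unit as the stored keys before indexing. Below is a minimal sketch of that idea using a plain dict in place of `index_map`, so it runs without qlib. The helper name `to_ns` is hypothetical, and this is not presented as qlib's fix.

```python
import numpy as np


def to_ns(key):
    """Hypothetical helper: coerce a datetime64 key to nanosecond precision."""
    return np.datetime64(key, "ns")


# Plain dict standing in for index_map; keys are parsed at ns precision.
index_map = {
    np.datetime64("2017-01-04T00:00:00.000000000"): 0,
    np.datetime64("2017-01-05T00:00:00.000000000"): 1,
    np.datetime64("2017-01-06T00:00:00.000000000"): 2,
}

# A second-precision key for the same instant as the first entry.
second_key = np.datetime64("2017-01-04T00:00:00")

# After coercion, the key has the same unit and value as the stored key,
# so the hash and the equality check both match.
position = index_map[to_ns(second_key)]
print(position)
```

Normalizing at the lookup boundary keeps the stored keys untouched. Whether such a conversion belongs in `_convert_type` or in the caller is a design decision for the maintainers.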