man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Mixing pd.Timestamp, pd.NaT and timezone aware pd.Timestamp throws a misleading exception #1652

Open vasil-pashov opened 1 week ago


Describe the bug

When a single column mixes timezone-naive pd.Timestamp values, pd.NaT and timezone-aware pd.Timestamp values, write throws a misleading exception. Writing appears to work when only two of the three kinds appear in the same column (e.g. NaT with naive timestamps, naive with timezone-aware timestamps, etc.).
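For context, the sketch below shows what pandas itself hands to write for each combination mentioned above. It uses only pandas (no ArcticDB) and reuses the two timestamps from the reproduction. It only illustrates pandas' dtype inference; it is not meant to explain how ArcticDB's normalization handles each case.

```python
import pandas as pd

naive = pd.Timestamp("2017-01-01")
aware = pd.Timestamp(1513393355, unit="s", tz="US/Pacific")

# One single-column frame per combination described in the report.
combinations = {
    "naive + NaT": [naive, pd.NaT],
    "aware + NaT": [aware, pd.NaT],
    "naive + aware": [naive, aware],
    "naive + NaT + aware": [naive, pd.NaT, aware],
}

# Record the dtype pandas infers for "col" in each case.
dtypes = {
    name: pd.DataFrame({"col": values})["col"].dtype
    for name, values in combinations.items()
}

for name, dtype in dtypes.items():
    print(f"{name:20} -> {dtype}")
```

Both combinations that contain naive and timezone-aware values end up as an object column, so the failing case reaches ArcticDB as an object array rather than a datetime64 column.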

[2024-06-27 12:01:28.560] [arcticdb] [error] Error while normalizing symbol=test, data=                         col
0        2017-01-01 00:00:00
1                        NaT
2  2017-12-15 19:02:35-08:00, metadata=None, 'float' object cannot be interpreted as an integer
Traceback (most recent call last):
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_store.py", line 344, in _try_normalize
    item, norm_meta = self._normalizer.normalize(
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_normalization.py", line 1242, in normalize
    return self._normalize(
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_normalization.py", line 1189, in _normalize
    return normalizer(
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_normalization.py", line 876, in normalize
    columns, column_vals = _normalize_columns(
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_normalization.py", line 493, in _normalize_columns
    column_vals = [
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_normalization.py", line 494, in <listcomp>
    _to_primitive(
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_normalization.py", line 234, in _to_primitive
    return arr.astype(DTN64_DTYPE)
TypeError: 'float' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\library.py", line 455, in write
    return self._nvs.write(
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_store.py", line 569, in write
    udm, item, norm_meta = self._try_normalize(
  File "C:\Users\vasil\Documents\arcticdb_venv_symlink\lib\site-packages\arcticdb\version_store\_store.py", line 366, in _try_normalize
    raise ArcticNativeException(str(ex))
arcticdb_ext.exceptions.ArcticException: 'float' object cannot be interpreted as an integer

Steps/Code to Reproduce

import arcticdb as adb
import numpy as np
import pandas as pd

ac = adb.Arctic("lmdb://test")
lib = ac.get_library("test", create_if_missing=True)
dates = [pd.Timestamp('2017-01-01'), pd.NaT, pd.Timestamp(1513393355, unit='s', tz='US/Pacific')]
df = pd.DataFrame({"col": dates})
lib.write("test", df)
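Until this is fixed, one possible user-side workaround is to coerce the column to a single datetime dtype before calling write. The sketch below assumes that naive timestamps are meant to be UTC (an assumption about the data, not something ArcticDB requires); `to_utc_column` is a hypothetical helper name, and the actual `lib.write` call is left out so the snippet runs without a storage backend.

```python
import pandas as pd


def to_utc_column(s: pd.Series) -> pd.Series:
    """Hypothetical helper: coerce a naive/aware/NaT mix to one UTC dtype.

    ASSUMPTION: naive values are already in UTC. pd.to_datetime(utc=True)
    localizes naive values as UTC and converts aware values to UTC.
    """
    return pd.to_datetime(s, utc=True)


dates = [pd.Timestamp("2017-01-01"), pd.NaT, pd.Timestamp(1513393355, unit="s", tz="US/Pacific")]
df = pd.DataFrame({"col": dates})
df["col"] = to_utc_column(df["col"])
print(df["col"].dtype)  # a timezone-aware UTC datetime dtype instead of object

# Optionally drop the timezone to get a plain (naive, UTC-based) datetime64 column.
df_naive = df.assign(col=df["col"].dt.tz_localize(None))

# lib.write("test", df) or lib.write("test", df_naive) would now receive a
# single datetime dtype for "col" (not executed here).
```

Whether the timezone should be kept or dropped depends on how the data is read back; the UTC-localization assumption for naive values should be checked against the real data before using this.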

Expected Results

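Either make it possible to write the column, or raise an exception that describes the actual problem (the mix of timestamp kinds in the column) instead of "'float' object cannot be interpreted as an integer".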
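As an illustration of the second option, the sketch below shows one possible pre-write check that raises a descriptive error. It is not ArcticDB code: `check_timestamp_mix` and its message are hypothetical, and it only inspects object-dtype columns, since pandas already gives homogeneous timestamp columns a datetime dtype.

```python
import pandas as pd


def check_timestamp_mix(name: str, s: pd.Series) -> None:
    """Hypothetical pre-write check for columns mixing naive and aware timestamps."""
    if not pd.api.types.is_object_dtype(s):
        return  # pandas already inferred a single (datetime or other) dtype
    has_naive = has_aware = False
    for value in s:
        # Skip NaT and anything that is not a Timestamp.
        if value is pd.NaT or not isinstance(value, pd.Timestamp):
            continue
        if value.tz is None:
            has_naive = True
        else:
            has_aware = True
    if has_naive and has_aware:
        raise TypeError(
            f"Column {name!r} mixes timezone-naive and timezone-aware timestamps; "
            "convert it to a single timezone (e.g. pd.to_datetime(..., utc=True)) before writing"
        )


dates = [pd.Timestamp("2017-01-01"), pd.NaT, pd.Timestamp(1513393355, unit="s", tz="US/Pacific")]
df = pd.DataFrame({"col": dates})
try:
    check_timestamp_mix("col", df["col"])
except TypeError as err:
    print(err)
```

Note the design trade-off: a check like this would also reject the naive + timezone-aware pair, which the report says currently writes, so it matches the "raise a reasonable exception" option rather than the "make it writable" one.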

OS, Python Version and ArcticDB Version

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22631-SP0
ArcticDB: 4.4.3rc1

Backend storage used

No response

Additional Context

No response