man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Numpy arrays should not allow inputs of different type #1214

Open vasil-pashov opened 10 months ago

vasil-pashov commented 10 months ago

Describe the bug

Writing a symbol with a column of array type whose arrays have different dtypes succeeds, but reading the symbol back returns garbage values.

Steps/Code to Reproduce

from arcticdb import Arctic
import pandas as pd
import numpy as np

# Open (or create) a local LMDB-backed library
ac = Arctic("lmdb://test")
lib = ac.get_library("arrays", create_if_missing=True)

# array_col mixes an integer array with a float32 array
data = pd.DataFrame({"array_col": [np.array([1,2,3]), np.array([6]).astype("float32")], "other_col": [1,2]})
lib.write("test", data)  # succeeds even though the dtypes differ
print(lib.read("test").data)

      array_col  other_col
0     [1, 2, 3]          1
1  [1086324736]          2
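
The value 1086324736 in the second row is consistent with the four bytes of the float32 value 6.0 being read back as a 32-bit integer. The standard-library sketch below checks that arithmetic. It only illustrates the bit pattern and makes no claim about how ArcticDB lays out the column internally.

```python
import struct

# Pack 6.0 as an IEEE 754 single-precision float (little-endian)...
raw = struct.pack("<f", 6.0)

# ...then reinterpret the same four bytes as a signed 32-bit integer.
(as_int,) = struct.unpack("<i", raw)

print(hex(as_int))  # 0x40c00000
print(as_int)       # 1086324736
```

The NumPy equivalent would be `np.array([6], dtype="float32").view("int32")`.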

Expected Results

  1. Throw an exception
  2. Do not create the symbol
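
The expected behaviour can be approximated on the caller's side with a check before writing. This is a minimal sketch: `check_uniform_array_dtype` is a hypothetical helper name, not part of ArcticDB, and it raises if the NumPy arrays in a column do not all share one dtype.

```python
import numpy as np


def check_uniform_array_dtype(values, column_name="<column>"):
    """Raise TypeError if the NumPy arrays in `values` have differing dtypes.

    Hypothetical helper for illustration only; not an ArcticDB API.
    """
    # Collect the distinct dtypes of every ndarray in the column.
    dtypes = {v.dtype for v in values if isinstance(v, np.ndarray)}
    if len(dtypes) > 1:
        names = sorted(str(d) for d in dtypes)
        raise TypeError(
            f"Column {column_name!r} mixes array dtypes: {', '.join(names)}"
        )


# Same data as the reproduction above, with the dtypes made explicit.
mixed = [np.array([1, 2, 3], dtype="int64"), np.array([6], dtype="float32")]
try:
    check_uniform_array_dtype(mixed, "array_col")
except TypeError as exc:
    print(exc)
```

With a DataFrame, the helper could be called per column, for example `check_uniform_array_dtype(data["array_col"], "array_col")`, before calling `lib.write`, so that nothing is written when the check fails.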

OS, Python Version and ArcticDB Version

Dev build: da5f02b0f2edc842413bf46aeb0a30bd9f319dda

Backend storage used

No response

Additional Context

No response

jamesmunro commented 3 months ago

On arcticdb==4.5.0, and probably on earlier versions too, this now raises an error, which is an improvement.

---------------------------------------------------------------------------
NormalizationException                    Traceback (most recent call last)
<ipython-input-11-8b826992605c> in <cell line: 7>()
      5 lib = ac.get_library("arrays", create_if_missing=True)
      6 data = pd.DataFrame({"array_col": [np.array([1,2,3]), np.array([6]).astype("float32")], "other_col": [1,2]})
----> 7 lib.write("test", data)
      8 print(lib.read("test").data)

1 frames
/usr/local/lib/python3.10/dist-packages/arcticdb/version_store/library.py in write(self, symbol, data, metadata, prune_previous_versions, staged, validate_index)
    460             )
    461 
--> 462         return self._nvs.write(
    463             symbol=symbol,
    464             data=data,

/usr/local/lib/python3.10/dist-packages/arcticdb/version_store/_store.py in write(self, symbol, data, metadata, prune_previous_version, pickle_on_failure, validate_index, **kwargs)
    592                 return None
    593             else:
--> 594                 vit = self.version_store.write_versioned_dataframe(
    595                     symbol, item, norm_meta, udm, prune_previous_version, sparsify_floats, validate_index
    596                 )

NormalizationException: E_UNIMPLEMENTED_INPUT_TYPE Array types are not supported at the moment