Describe the bug
When you attempt to perform an update (and also an append) using an input dataframe that we cannot normalize, you receive a misleading error message, and we fail to properly explain why the normalization failed.
There is no point continuing once normalization fails: appends and updates will not work with a pickled object, so we should bail out early with a helpful error message.
Steps/Code to Reproduce
Update an existing symbol with a dataframe that cannot be normalized (a reproduction sketch is given after the traceback). This causes an ArcticException with this logging and traceback:
In [49]: lib.update("ts", upd)
[2024-02-01 17:20:16.317] [arcticdb] [error] Could not normalize item of type: <class 'pandas.core.frame.DataFrame'> with any normalizer.You can set pickle_on_failure param to force pickling of this object instead.(Note: Pickling has worse performance and stricter memory limitations)
[2024-02-01 17:20:16.319] [arcticdb] [error] Error while normalizing symbol=ts, data= a
2024-01-01 2023-01-01 00:00:00
2024-01-02 [1, 2, 3], metadata=None, Could not convert object to NumPy datetime
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_store.py:334, in NativeVersionStore._try_normalize(self, symbol, dataframe, metadata, pickle_on_failure, dynamic_strings, coerce_columns, **kwargs)
332 else:
333 # TODO: just for pandas dataframes for now.
--> 334 item, norm_meta = self._normalizer.normalize(
335 dataframe,
336 pickle_on_failure=pickle_on_failure,
337 dynamic_strings=dynamic_strings,
338 coerce_columns=coerce_columns,
339 dynamic_schema=dynamic_schema,
340 **kwargs,
341 )
342 except ArcticDbNotYetImplemented as ex:
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_normalization.py:1201, in CompositeNormalizer.normalize(self, item, string_max_len, pickle_on_failure, dynamic_strings, coerce_columns, **kwargs)
1200 try:
-> 1201 return self._normalize(
1202 item,
1203 string_max_len=string_max_len,
1204 dynamic_strings=dynamic_strings,
1205 coerce_columns=coerce_columns,
1206 **kwargs,
1207 )
1208 except Exception as ex:
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_normalization.py:1148, in CompositeNormalizer._normalize(self, item, string_max_len, dynamic_strings, coerce_columns, **kwargs)
1147 log.debug("Normalizer used: {}".format(normalizer))
-> 1148 return normalizer(
1149 item,
1150 string_max_len=string_max_len,
1151 dynamic_strings=dynamic_strings,
1152 coerce_columns=coerce_columns,
1153 **kwargs,
1154 )
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_normalization.py:838, in DataFrameNormalizer.normalize(self, item, string_max_len, dynamic_strings, coerce_columns, **kwargs)
837 columns_vals = [item.iloc[:, idx].values for idx in range(len(item.columns))]
--> 838 columns, column_vals = _normalize_columns(
839 item.columns,
840 columns_vals,
841 norm_meta.df,
842 coerce_columns=coerce_columns,
843 dynamic_strings=dynamic_strings,
844 string_max_len=string_max_len,
845 dynamic_schema=kwargs.get("dynamic_schema", False),
846 index_names=index_names,
847 )
848 if item.columns.name is not None:
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_normalization.py:467, in _normalize_columns(columns_names, columns_vals, norm_meta, coerce_columns, dynamic_strings, string_max_len, dynamic_schema, index_names)
462 raise ArcticNativeException(
463 "mismatch in columns_name and vals size in _normalize_columns {} != {}".format(
464 len(columns_names_norm), len(columns_vals)
465 )
466 )
--> 467 column_vals = [
468 _to_primitive(
469 columns_vals[idx],
470 columns_names_norm[idx],
471 string_max_len=string_max_len,
472 dynamic_strings=dynamic_strings,
473 coerce_column_type=coerce_columns[str(columns_names[idx])] if coerce_columns else None,
474 norm_meta=norm_meta,
475 )
476 for idx in range(len(columns_names_norm))
477 ]
478 return columns_names_norm, column_vals
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_normalization.py:468, in <listcomp>(.0)
462 raise ArcticNativeException(
463 "mismatch in columns_name and vals size in _normalize_columns {} != {}".format(
464 len(columns_names_norm), len(columns_vals)
465 )
466 )
467 column_vals = [
--> 468 _to_primitive(
469 columns_vals[idx],
470 columns_names_norm[idx],
471 string_max_len=string_max_len,
472 dynamic_strings=dynamic_strings,
473 coerce_column_type=coerce_columns[str(columns_names[idx])] if coerce_columns else None,
474 norm_meta=norm_meta,
475 )
476 for idx in range(len(columns_names_norm))
477 ]
478 return columns_names_norm, column_vals
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_normalization.py:234, in _to_primitive(arr, arr_name, dynamic_strings, string_max_len, coerce_column_type, norm_meta)
233 log.debug("Removing all NaNs from column: {} of type datetime64", arr_name)
--> 234 return arr.astype(DTN64_DTYPE)
235 elif _accept_array_string(sample):
ValueError: Could not convert object to NumPy datetime
During handling of the above exception, another exception occurred:
ArcticException Traceback (most recent call last)
Cell In[49], line 1
----> 1 lib.update("ts", upd)
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/library.py:826, in Library.update(self, symbol, data, metadata, upsert, date_range, prune_previous_versions)
759 def update(
760 self,
761 symbol: str,
(...)
766 prune_previous_versions=False,
767 ) -> VersionedItem:
768 """
769 Overwrites existing symbol data with the contents of ``data``. The entire range between the first and last index
770 entry in ``data`` is replaced in its entirety with the contents of ``data``, adding additional index entries if
(...)
824 2018-01-04 4
825 """
--> 826 return self._nvs.update(
827 symbol=symbol,
828 data=data,
829 metadata=metadata,
830 upsert=upsert,
831 date_range=date_range,
832 prune_previous_version=prune_previous_versions,
833 )
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_store.py:790, in NativeVersionStore.update(self, symbol, data, metadata, date_range, upsert, prune_previous_version, **kwargs)
786 data = restrict_data_to_date_range_only(data, start=start, end=end)
788 _handle_categorical_columns(symbol, data)
--> 790 udm, item, norm_meta = self._try_normalize(symbol, data, metadata, False, dynamic_strings, coerce_columns)
792 if isinstance(item, NPDDataFrame):
793 with _diff_long_stream_descriptor_mismatch(self):
File ~/venvs/310/lib/python3.10/site-packages/arcticdb/version_store/_store.py:347, in NativeVersionStore._try_normalize(self, symbol, dataframe, metadata, pickle_on_failure, dynamic_strings, coerce_columns, **kwargs)
345 except Exception as ex:
346 log.error("Error while normalizing symbol={}, data={}, metadata={}, {}", symbol, dataframe, metadata, ex)
--> 347 raise ArcticNativeException(str(ex))
349 if norm_meta is None:
350 raise ArcticNativeException("Cannot normalize input {}".format(symbol))
ArcticException: Could not convert object to NumPy datetime
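The exact reproduction code was not captured above, but a minimal sketch consistent with the dataframe printed in the traceback would be something like the following (the LMDB URI, library name and initial write are illustrative assumptions, not taken from the original report):

```python
import pandas as pd
from arcticdb import Arctic

ac = Arctic("lmdb://./arcticdb_repro")  # illustrative LMDB path
lib = ac.get_library("demo", create_if_missing=True)

# Seed the symbol so there is something to update (illustrative data).
lib.write("ts", pd.DataFrame({"a": [1.0]}, index=pd.to_datetime(["2023-12-31"])))

# Column "a" mixes a Timestamp with a Python list, so it stays an object
# column that cannot be normalized to a single NumPy dtype.
upd = pd.DataFrame(
    {"a": [pd.Timestamp("2023-01-01"), [1, 2, 3]]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)

lib.update("ts", upd)  # fails inside normalization as shown above
```

The root cause visible in the traceback is the arr.astype(DTN64_DTYPE) call in _to_primitive: because the first value of column "a" looks like a datetime, the whole object column is coerced to datetime64[ns], and the [1, 2, 3] value cannot be converted.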
Expected Results
The error message

[2024-02-01 17:20:16.317] [arcticdb] [error] Could not normalize item of type: <class 'pandas.core.frame.DataFrame'> with any normalizer.You can set pickle_on_failure param to force pickling of this object instead.(Note: Pickling has worse performance and stricter memory limitations)

is misleading for two reasons:

1. There is no such option on the Arctic API (it refers to the old v1 API used in Man Group).
2. Even if you could force pickling, it would not help here, because pickled data cannot be used in an update.

We should also explain better why the normalization failed - down to which column is at fault. A sketch of the kind of check and message meant here follows below.
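For illustration only, a standalone sketch of the requested behaviour: detect which column fails to normalize and raise one clear error, without suggesting pickling. All names below are hypothetical and do not reflect the actual ArcticDB internals in _normalization.py:

```python
import numpy as np
import pandas as pd

def check_normalizable(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise a clear error naming the first column that
    cannot be converted to a single NumPy dtype, instead of suggesting pickling."""
    for name in df.columns:
        values = df[name].to_numpy()
        if values.dtype != object:
            continue  # already a concrete NumPy dtype, nothing to check
        sample = next((v for v in values if v is not None), None)
        if isinstance(sample, (pd.Timestamp, np.datetime64)):
            try:
                values.astype("datetime64[ns]")  # same conversion _to_primitive attempts
            except (TypeError, ValueError) as ex:
                raise TypeError(
                    f"Cannot normalize column '{name}' to datetime64[ns]: {ex}. "
                    "update()/append() require normalizable data; pickling is not an option here."
                ) from ex

# With the dataframe from the traceback, this names the offending column:
upd = pd.DataFrame(
    {"a": [pd.Timestamp("2023-01-01"), [1, 2, 3]]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)
check_normalizable(upd)  # TypeError: Cannot normalize column 'a' ...
```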
OS, Python Version and ArcticDB Version
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
OS: Linux-6.5.0-14-generic-x86_64-with-glibc2.35
ArcticDB: 4.2.1
Backend storage used
LMDB
Additional Context
cf internal thread https://chat-man.slack.com/archives/CKD4V6N0H/p1706808901627019?thread_ts=1706791167.627659&cid=CKD4V6N0H