If you have a dataframe with Pandas' nullable integer as one of the column datatypes, and a row includes a pd.NA value, you get the following traceback:
Traceback (most recent call last):
write_api.write(
File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 366, in write
return self._write_batching(bucket, org, record,
File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 469, in _write_batching
serializer.serialize(chunk_idx),
File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 270, in serialize
return list(lp)
File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 268, in <genexpr>
lp = (re.sub('^(( |[^ ])* ),([a-zA-Z0-9])(.*)', '\\1\\3\\4', self.f(p))
File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 269, in <lambda>
for p in filter(lambda x: _any_not_nan(x, self.field_indexes), _itertuples(chunk)))
File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 27, in _any_not_nan
return any(map(lambda x: _not_nan(p[x]), indexes))
File "pandas/_libs/missing.pyx", line 388, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
However, if you change the column datatype to a float (which has a native NaN encoding), it works.
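For context, here is a minimal sketch of why the two missing-value markers behave differently under a self-equality check like the one in the traceback. The helper name `describe_self_equality` is made up for illustration and is not part of influxdb_client.

```python
import pandas as pd


def describe_self_equality(value):
    """Evaluate `value == value` in a boolean context, as any() would."""
    try:
        return bool(value == value)
    except TypeError as exc:
        # pd.NA == pd.NA evaluates to pd.NA, which refuses to become a bool
        return f"TypeError: {exc}"


# A float NaN is never equal to itself, so the check yields a plain False.
nan_result = describe_self_equality(float("nan"))

# pd.NA propagates through ==, and converting the result to bool raises.
na_result = describe_self_equality(pd.NA)

print(nan_result)
print(na_result)
```

The float case falls through quietly, while the nullable-integer case raises as soon as the comparison result is used as a truth value.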
Code sample to reproduce problem
import pandas as pd

# get_client() and BUCKET are my own helper and constant for the InfluxDB connection
df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})
with get_client() as client:
    with client.write_api() as write_api:
        write_api.write(BUCKET, record=df, data_frame_measurement_name="test", data_frame_timestamp_column="time")
Expected behavior
I would expect this to behave the same as if the column were a float. My current workaround is to use floats.
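The float workaround can be sketched as follows, assuming it is acceptable for the column to be stored as a float field. Only the cast is shown; the write call stays as in the reproduction sample.

```python
import math

import pandas as pd

# Same frame as in the reproduction sample: a nullable Int64 column with one pd.NA.
df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})

# Casting to float64 turns pd.NA into a regular float NaN.
df_float = df.astype({"x": "float64"})

print(df_float["x"].dtype)
print(math.isnan(df_float["x"].iloc[1]))
```

The obvious cost is that the values are no longer integers, which may matter if the field already exists with an integer type.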
If the code is too complicated to fix/would incur significant slowdown for other users, I think at minimum, raising a cleaner exception would be reasonable.
Actual behavior
I get the exception shown in the traceback above, ending in "TypeError: boolean value of NA is ambiguous".
Additional info
My knee-jerk reaction: in client/write/dataframe_serializer.py, there is a function:
def _not_nan(x):
    return x == x
which I think can just be
def _not_nan(x):
    from ...extras import pd
    # pd.isna is True for missing values, so negate it to keep the "not NaN" meaning
    return not pd.isna(x)
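A standalone sketch of that idea, which can be run outside the library. The name `not_nan_sketch` is hypothetical, and the sketch assumes `x` is always a scalar, as it is when the serializer checks one cell at a time.

```python
import pandas as pd


def not_nan_sketch(x):
    """Return True when the scalar x is a real value, False when it is missing."""
    # pd.isna handles float NaN, pd.NA, None and pd.NaT without raising.
    return not pd.isna(x)


print(not_nan_sketch(1))
print(not_nan_sketch(""))
print(not_nan_sketch(float("nan")))
print(not_nan_sketch(pd.NA))
```

One behavioral difference from `x == x`: `pd.isna` also treats `None` and `pd.NaT` as missing, whereas `None == None` is True. Whether that matters depends on how the rest of the serializer handles those values.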
However, I saw this block of code:
if null_columns[index]:
    key_value = f"""{{
            '' if {val_format} == '' or type({val_format}) == float and math.isnan({val_format}) else
            f',{key_format}={{str({val_format}).translate(_ESCAPE_STRING)}}'
        }}"""
which looks pretty crazy, and I am not sure what the data would look like at that point.
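To get a feel for it, here is a small probe that copies only the emptiness test from that expression into a plain function. `is_skipped` and `probe` are hypothetical names, and whether the serializer ever reaches this expression with a pd.NA value depends on code not shown here.

```python
import math

import pandas as pd


def is_skipped(value):
    """Mirror of the condition in the generated expression (hypothetical helper)."""
    return value == '' or type(value) == float and math.isnan(value)


def probe(value):
    """Report the outcome of the condition, including a raised TypeError."""
    try:
        return bool(is_skipped(value))
    except TypeError as exc:
        return f"TypeError: {exc}"


for candidate in ["", float("nan"), 1, pd.NA]:
    print(repr(candidate), "->", probe(candidate))
```

Empty strings and float NaN are skipped and an ordinary value is kept, but pd.NA fails here too: `pd.NA == ''` evaluates to pd.NA, and the `or` then needs its truth value. So changing `_not_nan` alone may not be enough if a pd.NA can end up in a column handled by this branch.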
Specifications