ynqa / pandavro

Apache Avro <-> pandas DataFrame
MIT License

from_records() got an unexpected keyword argument 'na_dtypes' #40

Open JeroenDedimo opened 2 years ago

JeroenDedimo commented 2 years ago

Hi,

First of all: many thanks for pandavro! It's incredibly useful in day-to-day data operations.

When using read_avro() with na_dtypes=True, I get the following TypeError, using Pandas 1.3.5:

from_records() got an unexpected keyword argument 'na_dtypes'

Will post full trace below.

Is this a known issue, and if so, is there a workaround? If it is a new issue, I'm willing to help fix it. Any pointers to get started would be much appreciated.

Full command:

df = pdx.read_avro('./test.avro', na_dtypes=True)

Full trace:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_4020/1232288353.py in <module>
----> 1 df = pdx.read_avro('./test.avro', na_dtypes=True)

/opt/conda/lib/python3.7/site-packages/pandavro/__init__.py in read_avro(file_path_or_buffer, schema, **kwargs)
    194     if isinstance(file_path_or_buffer, six.string_types):
    195         with open(file_path_or_buffer, 'rb') as f:
--> 196             return __file_to_dataframe(f, schema, **kwargs)
    197     else:
    198         return __file_to_dataframe(file_path_or_buffer, schema, **kwargs)

/opt/conda/lib/python3.7/site-packages/pandavro/__init__.py in __file_to_dataframe(f, schema, **kwargs)
    177 def __file_to_dataframe(f, schema, **kwargs):
    178     reader = fastavro.reader(f, reader_schema=schema)
--> 179     return pd.DataFrame.from_records(list(reader), **kwargs)
    180 
    181 

TypeError: from_records() got an unexpected keyword argument 'na_dtypes'

jbvaningen commented 1 year ago

What version of pandavro are you using?

I suspect you are using a new argument in an older version of pandavro.

The na_dtypes argument was introduced in pandavro 1.7.0. If you have an older version installed and call read_avro() with na_dtypes as a keyword argument, it is passed straight through **kwargs to pandas' DataFrame.from_records(), which does not expect it. That matches the trace above.
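
If upgrading is not possible right away, one option is to pass na_dtypes only when the installed version should accept it. The sketch below is a rough illustration, not part of pandavro: the helper names (parse_version, supports_na_dtypes, installed_pandavro_version, read_avro_kwargs) are made up for this example, and the 1.7.0 threshold is taken from the statement above rather than checked against the changelog. It uses importlib.metadata, which needs Python 3.8+; on Python 3.7 the importlib_metadata backport would be needed instead.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Assumption: 1.7.0 is the first release that accepts na_dtypes (as stated in this thread).
NA_DTYPES_MIN_VERSION = (1, 7, 0)


def parse_version(text):
    """Turn a version string such as '1.6.0' or '1.7' into a 3-tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits, so a suffix such as 'rc1' is ignored.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.7' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_na_dtypes(installed):
    """Return True if this pandavro version string should accept na_dtypes."""
    return installed is not None and parse_version(installed) >= NA_DTYPES_MIN_VERSION


def installed_pandavro_version():
    """Return the installed pandavro version string, or None if it is not installed."""
    try:
        return version("pandavro")
    except PackageNotFoundError:
        return None


def read_avro_kwargs(installed, want_na_dtypes=True):
    """Build keyword arguments for read_avro() that the installed version can handle."""
    if want_na_dtypes and supports_na_dtypes(installed):
        return {"na_dtypes": True}
    # Older versions forward unknown kwargs to pandas, so pass nothing extra.
    return {}
```

With these helpers, the call from the report would become `pdx.read_avro('./test.avro', **read_avro_kwargs(installed_pandavro_version()))`. On an older install this reads the file without nullable dtypes instead of raising the TypeError.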

JeroenDedimo commented 1 year ago

Hi Joost,

Thanks so much! In the meantime I've implemented a workaround, but I will check whether I can upgrade pandavro in our environment.

Best regards, Jeroen
