hodossy opened 6 years ago
I have a temporary solution until it is fixed:
import pandas as pd


class NativeDict(dict):
    """
    Helper class to ensure that only native types are in the dicts produced by
    :func:`to_dict() <pandas.DataFrame.to_dict>`

    .. note::
        Needed until `#21256 <https://github.com/pandas-dev/pandas/issues/21256>`_ is resolved.
    """

    def __init__(self, *args, **kwargs):
        # to_dict() passes an iterable of (key, value) pairs; convert each value.
        super().__init__(((k, self.convert_if_needed(v)) for row in args for k, v in row), **kwargs)

    @staticmethod
    def convert_if_needed(value):
        """
        Converts `value` to native python type.

        .. warning::
            Only :class:`Timestamp <pandas.Timestamp>` and numpy :class:`dtypes <numpy.dtype>` are converted.
        """
        if pd.isnull(value):  # NaN / NaT -> None
            return None
        if isinstance(value, pd.Timestamp):
            return value.to_pydatetime()
        if hasattr(value, 'dtype'):
            # Map numpy dtype kinds (signed/unsigned int, float, bool) to Python
            # types; any other kind is returned unchanged.
            mapper = {'i': int, 'u': int, 'f': float, 'b': bool}
            _type = mapper.get(value.dtype.kind, lambda x: x)
            return _type(value)
        return value
This also replaces NaN and NaT objects with native Python None. Please note that its only intended use is as the into argument of to_dict(); I have not tested it elsewhere. It can be used like so:
df.to_dict(orient='records', into=NativeDict)
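A variant of the same idea that does not depend on how to_dict() calls the into class is to post-process the records instead. This is a sketch under my own assumptions, not the helper above: to_native and native_records are hypothetical names, and it swaps the per-kind mapper for NumPy's .item(), which converts any NumPy scalar to the nearest built-in type.

```python
import numpy as np
import pandas as pd


def to_native(value):
    """Convert a single value from DataFrame.to_dict() to a built-in Python type."""
    # NaN, NaT and None all become None; np.ndim() guards against list-like
    # cells, for which pd.isna() would return an array instead of a bool.
    if value is None or (np.ndim(value) == 0 and pd.isna(value)):
        return None
    # pd.Timestamp subclasses datetime.datetime; return the plain class.
    # (Nonzero nanoseconds are dropped, and pandas warns about it.)
    if isinstance(value, pd.Timestamp):
        return value.to_pydatetime()
    # np.datetime64.item() can return an int (e.g. for nanosecond units),
    # so route these through pd.Timestamp as well.
    if isinstance(value, np.datetime64):
        return pd.Timestamp(value).to_pydatetime()
    # Any other NumPy scalar (np.int64, np.float32, np.bool_, ...) is turned
    # into the nearest built-in type by .item().
    if isinstance(value, np.generic):
        return value.item()
    return value


def native_records(df):
    """Like df.to_dict(orient="records"), but with built-in values only."""
    return [
        {key: to_native(val) for key, val in row.items()}
        for row in df.to_dict(orient="records")
    ]


df = pd.DataFrame({"int": [1], "float": [np.nan], "bool": [True],
                   "str": ["foo"], "date": [pd.Timestamp("2020-01-02 03:04:05")]})
print(native_records(df))
```

Because the conversion happens after to_dict() returns, it should behave the same whether or not a given pandas version already boxes ints and floats itself.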
This is fixed on master (1.2). Running the OP's example:
In [3]: import pandas as pd
...: from datetime import datetime
...:
...: dfs = {
...:     'full_df': pd.DataFrame([
...:         {'int': 1, 'date': datetime.now(), 'str': 'foo', 'float': 1.0, 'bool': True},
...:     ]),
...:     'int_df': pd.DataFrame([
...:         {'int': 1},
...:     ]),
...:     'date_df': pd.DataFrame([
...:         {'date': datetime.now()},
...:     ]),
...:     'str_df': pd.DataFrame([
...:         {'str': 'foo'},
...:     ]),
...:     'float_df': pd.DataFrame([
...:         {'float': 1.0},
...:     ]),
...:     'bool_df': pd.DataFrame([
...:         {'bool': True},
...:     ])
...: }
...:
...: for name, frame in dfs.items():
...:     print('Types in ' + name)
...:     for k, v in frame.to_dict('records')[0].items():
...:         print(type(v))
...:
Types in full_df
<class 'int'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'str'>
<class 'float'>
<class 'bool'>
Types in int_df
<class 'int'>
Types in date_df
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Types in str_df
<class 'str'>
Types in float_df
<class 'float'>
Types in bool_df
<class 'bool'>
Hello! Thanks for fixing the integers, but it seems that dates are still returned as the internal pandas.Timestamp type. Would it be possible to convert them to native datetime objects as well?
Do we want to reopen this?
xref https://github.com/pandas-dev/pandas/pull/37648#discussion_r571652150
I think we're not going to act here, but it does keep coming up.
Code to reproduce the error:
Output:
Problem description
One would expect the to_dict() function to return Python native types, or at least to behave the same for columns of the same type; however, it behaves differently, as shown above. It seems that type conversion is not invoked when only a single column is present in the DataFrame.
Expected Output
Python native types where possible for int, float, bool and str columns, and, if possible, a Python datetime object instead of pandas.Timestamp.
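The expected output can be phrased as a check on the records. A minimal sketch, where non_native_values and NATIVE_TYPES are hypothetical names: it lists every value whose exact type is not one of the built-in types named above, also accepting None for missing values.

```python
from datetime import datetime

import pandas as pd

# Types the "Expected Output" asks for; None is allowed for missing values.
NATIVE_TYPES = (int, float, bool, str, datetime, type(None))


def non_native_values(records):
    """Return (column, type name) for every value that is not a native type.

    The check uses the exact type on purpose: pd.Timestamp subclasses
    datetime and numpy.float64 subclasses float, so isinstance() would
    accept both.
    """
    found = []
    for row in records:
        for key, value in row.items():
            if type(value) not in NATIVE_TYPES:
                found.append((key, type(value).__name__))
    return found


df = pd.DataFrame({"int": [1], "str": ["foo"], "date": [datetime(2020, 1, 2)]})
print(non_native_values(df.to_dict(orient="records")))
```

On a pandas version that includes the 1.2 fix, this should still flag the date column as a Timestamp, which matches the follow-up request in this thread.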
Output of pd.show_versions()