pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

API/DOC: types covered by lib.infer_dtype #23554

Open h-vetinari opened 5 years ago

h-vetinari commented 5 years ago

In #23167, I'm trying to consistently infer the dtype of the underlying Series/Index while calling the constructor of the .str-accessor. To test this thoroughly, I wanted to build a parametrized fixture that returns an ndarray for every dtype that lib.infer_dtype can infer. I based it on the list in the docstring, but found the following:

So we need to discuss whether 'complex'/'timedelta64' should be added to the code or removed from the docstring, and vice versa for 'interval'.
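A rough sketch of the kind of case table such a fixture could be built from, assuming a recent pandas and using the public `pandas.api.types.infer_dtype` (which re-exports `lib.infer_dtype`). The names `INFER_DTYPE_CASES` and `check_cases` are hypothetical, and a plain dict stands in for the pytest `params` list; it includes the three disputed labels ('complex', 'timedelta64', 'interval') alongside a few uncontroversial ones.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical case table: expected infer_dtype label -> small ndarray that
# should produce it. In a test suite this would feed a parametrized fixture.
INFER_DTYPE_CASES = {
    "integer": np.array([1, 2, 3]),
    "floating": np.array([1.0, 2.5]),
    "boolean": np.array([True, False]),
    "complex": np.array([1 + 2j, 3 - 1j]),
    "string": np.array(["a", "b"], dtype=object),
    "datetime64": np.array(["2018-01-01"], dtype="datetime64[ns]"),
    "timedelta64": np.array([1], dtype="timedelta64[ns]"),
    "interval": np.array([pd.Interval(0, 1), pd.Interval(1, 2)], dtype=object),
}


def check_cases(cases):
    """Return {expected: actual} for every case where infer_dtype disagrees."""
    mismatches = {}
    for expected, arr in cases.items():
        actual = infer_dtype(arr, skipna=True)
        if actual != expected:
            mismatches[expected] = actual
    return mismatches


if __name__ == "__main__":
    # An empty dict means every label in the table is actually reachable.
    print(check_cases(INFER_DTYPE_CASES))
```

Any label that shows up in the returned dict is one where the docstring and the implementation disagree for the pandas version in use.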

h-vetinari commented 5 years ago

@TomAugspurger Pinging you here as this is related to #23553 and may be related to #23581

TomAugspurger commented 5 years ago

I haven't been able to wrap my head around infer_dtype yet, but adding interval seems fine.

but this similarly does not work (returning 'timedelta'; and can't be hit either, IMO)


In [25]: lib.infer_dtype(list(np.array([1], dtype='timedelta64[ns]'))[0])
Out[25]: 'timedelta64'

In [26]: lib.infer_dtype(list(np.array([1], dtype='timedelta64[ns]')))
Out[26]: 'timedelta'

Out[26] looks like a bug.
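A small sketch of the two code paths involved, assuming a recent pandas: an ndarray with a timedelta64 dtype is labelled from its dtype, while a list is labelled by inspecting its elements. The public `pandas.api.types.infer_dtype` is used instead of the private `lib` import from the session above, and the label for the list of `np.timedelta64` scalars (the `Out[26]` case) is only printed, since it may differ between pandas versions.

```python
import datetime

import numpy as np
from pandas.api.types import infer_dtype

# The same kind of durations, held in two different containers.
td_array = np.array([1, 2], dtype="timedelta64[ns]")
td_objects = [datetime.timedelta(days=1), datetime.timedelta(hours=3)]

# ndarray with a timedelta64 dtype: the label comes from the dtype alone.
array_label = infer_dtype(td_array)
# list of Python timedelta objects: the label comes from the elements.
list_label = infer_dtype(td_objects)

# list of np.timedelta64 scalars (the Out[26] case) also goes through the
# element-wise path; printed rather than asserted.
np_scalar_list_label = infer_dtype(list(td_array))

print(array_label, list_label, np_scalar_list_label)
```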

FWIW, rewriting the Series constructor is on my medium-term todo list, but not before 0.24 is done. Part of that would be a cleanup of infer_dtype.

h-vetinari commented 5 years ago

Discovered a relevant usage of the timedelta/timedelta64 case: pandas.core.dtypes.cast.maybe_downcast_to_dtype explicitly tests the inferred dtype for equality to 'timedelta64', but not 'timedelta'.
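One way such a caller could avoid missing the 'timedelta' label is a membership test instead of an equality test. This is a sketch under that assumption, not the code in `maybe_downcast_to_dtype`; `TIMEDELTA_LABELS` and `is_timedelta_like` are hypothetical names, not pandas API.

```python
import datetime

import numpy as np
from pandas.api.types import infer_dtype

# Hypothetical: both labels describe timedelta-like data, so comparing only
# against "timedelta64" would miss inputs labelled "timedelta".
TIMEDELTA_LABELS = frozenset({"timedelta", "timedelta64"})


def is_timedelta_like(values) -> bool:
    """Return True if infer_dtype labels `values` as timedelta-like."""
    return infer_dtype(values, skipna=True) in TIMEDELTA_LABELS


ndarray_case = np.array([1], dtype="timedelta64[ns]")  # labelled "timedelta64"
object_case = [datetime.timedelta(seconds=5)]          # labelled "timedelta"

print(is_timedelta_like(ndarray_case), is_timedelta_like(object_case))
```

An equality check against 'timedelta64' alone would return False for `object_case`, which is the gap described above.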