API/DOC: types covered by lib.infer_dtype

h-vetinari commented 5 years ago

In #23167, I'm trying to consistently infer the dtype of the underlying Series/Index when calling the constructor of the .str accessor. To test this thoroughly, I wanted to build a parametrized fixture that returns an ndarray for each of the dtypes that lib.infer_dtype can infer. I based it on the list in the docstring, but found the following:

the docstring mentions 'complex' as a possible outcome, but this does not work (instead returning 'mixed')
```
>>> lib.infer_dtype([1+1j, 2+2j])
'mixed'
>>> lib.infer_dtype([np.complex128(1+1j)])
'mixed'
```
I don't believe it's actually possible to get 'complex' as a result, given the code.
the docstring mentions 'timedelta64', but this similarly does not work (returning 'timedelta'; and can't be hit either, IMO)
```
>>> lib.infer_dtype([np.timedelta64(1, 'D')])
'timedelta'
```

the docstring does not mention 'interval', but that is a possible outcome:

>>> lib.infer_dtype([pd.Interval(0, 1), pd.Interval(0, 2)])
'interval'

So it needs to be discussed whether 'complex'/'timedelta64' should be added to the code or removed from the docstring, and conversely whether 'interval' should be added to the docstring or removed from the code.
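For anyone who wants to re-check the three cases above on their own install, here is a minimal sketch. It assumes the public alias `pandas.api.types.infer_dtype` behaves like `lib.infer_dtype`; the `SAMPLES` dict and the `compare_labels` helper are hypothetical names made up for illustration, not part of pandas. The labels returned may differ between pandas versions, so the script only reports them.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype  # assumed equivalent to lib.infer_dtype

# Hypothetical sample inputs, keyed by the label discussed in this issue.
SAMPLES = {
    "complex": [1 + 1j, 2 + 2j],
    "timedelta64": [np.timedelta64(1, "D")],
    "interval": [pd.Interval(0, 1), pd.Interval(0, 2)],
}


def compare_labels(samples):
    """Return {documented_label: label actually returned by infer_dtype}."""
    return {label: infer_dtype(values) for label, values in samples.items()}


if __name__ == "__main__":
    for documented, actual in compare_labels(SAMPLES).items():
        status = "ok" if documented == actual else "MISMATCH"
        print(f"{documented!r:>14} -> {actual!r:<12} {status}")
```

A fixture for the tests in #23167 could be built from the same kind of mapping, with one entry per label that is confirmed reachable.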

h-vetinari commented 5 years ago

@TomAugspurger Pinging you here as this is related to #23553 and may be related to #23581

TomAugspurger commented 5 years ago

I haven't been able to wrap my head around infer_dtype yet, but adding interval seems fine.

> but this similarly does not work (returning 'timedelta'; and can't be hit either, IMO)


In [25]: lib.infer_dtype(list(np.array([1], dtype='timedelta64[ns]'))[0])
Out[25]: 'timedelta64'

In [26]: lib.infer_dtype(list(np.array([1], dtype='timedelta64[ns]')))
Out[26]: 'timedelta'

Out[26] looks like a bug.

FWIW, rewriting the Series constructor is on my medium-term todo list, but not before 0.24 is done. Part of that would be a cleanup of infer_dtype.

h-vetinari commented 5 years ago

Discovered a relevant usage of the timedelta/timedelta64 case: pandas.core.dtypes.cast.maybe_downcast_to_dtype explicitly tests the inferred dtype for equality to 'timedelta64', but not 'timedelta'.
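To illustrate why comparing against only one of the two labels is fragile, here is a rough sketch of a check that accepts both. This is not what `maybe_downcast_to_dtype` does; `TIMEDELTA_LABELS` and `looks_like_timedelta` are hypothetical names, and the public `pandas.api.types.infer_dtype` is assumed to stand in for `lib.infer_dtype`.

```python
import numpy as np
from pandas.api.types import infer_dtype  # assumed equivalent to lib.infer_dtype

# Both spellings discussed in this issue; hypothetical constant for illustration.
TIMEDELTA_LABELS = frozenset({"timedelta", "timedelta64"})


def looks_like_timedelta(values):
    """Return True if infer_dtype reports either timedelta label for `values`."""
    return infer_dtype(values) in TIMEDELTA_LABELS


if __name__ == "__main__":
    # A list of np.timedelta64 scalars and an ndarray with a timedelta64 dtype.
    print(looks_like_timedelta([np.timedelta64(1, "D")]))
    print(looks_like_timedelta(np.array([1], dtype="timedelta64[ns]")))
    # Plain integers should not be reported as timedeltas.
    print(looks_like_timedelta([1, 2]))
```

Whether the right fix is a membership test like this or making `infer_dtype` return a single label consistently is exactly the open question here.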

pandas-dev / pandas

API/DOC: types covered by lib.infer_dtype #23554