Open JMBurley opened 1 year ago
I will note that I have an incomplete system that gives roughly the right behaviour, but I hope we can find something faster & more maintainable:
import numpy as np
import pandas as pd

# test_series is the pd.Series (column) being checked
if (
    pd.api.types.is_object_dtype(test_series)
    or pd.api.types.is_datetime64_any_dtype(test_series)
    or test_series.dtype == 'dbdate'
    or test_series.dtype == 'timestamp[ns][pyarrow]'
):
    # Objects or dt64 or dbdate columns can contain valid dates.
    # 'dbdate' & 'timestamp[ns][pyarrow]' are not yet pd.api testable (pd 2.0.0)
    # (NB: to_datetime will work on float/int but generate garbage unless the value is epoch time.
    # As I don't have a reliable way to know if numerics are valid dates, ignore them. User should force them to dt.)
    try:
        sample_len = np.min([len(test_series), 50])
        pd.to_datetime(test_series.sample(sample_len))  # check date viability on subset
        is_datelike = True
    except (ValueError, TypeError, OverflowError):
        # ValueError: not dt-like, or pd._libs.tslibs.np_datetime.OutOfBoundsDatetime
        # TypeError: Cell within series cannot be cast to datetime
        # OverflowError: tried to convert something that became larger than a 64 bit int
        is_datelike = False
else:
    is_datelike = False
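One side effect of the random sample is that repeated calls can disagree on a column with only a few bad cells. A minimal sketch of the same value-based check as a function with a fixed seed follows; the name `looks_datelike` and the parameters `max_sample` and `seed` are hypothetical, not pandas API, and the dtype name matching simply mirrors the string comparisons in the snippet above.

```python
import warnings

import pandas as pd


def looks_datelike(series: pd.Series, max_sample: int = 50, seed: int = 0) -> bool:
    """Hypothetical helper: value-based datelike check on a reproducible sample."""
    candidate = (
        pd.api.types.is_object_dtype(series)
        or pd.api.types.is_datetime64_any_dtype(series)
        # Name matching mirrors the string comparisons in the snippet above.
        or str(series.dtype) in ("dbdate", "timestamp[ns][pyarrow]")
    )
    if not candidate:
        # Numerics and other dtypes are ignored, as in the original snippet.
        return False

    # A fixed random_state makes repeated calls return the same answer.
    sample = series.sample(min(len(series), max_sample), random_state=seed)
    try:
        with warnings.catch_warnings():
            # Silence format-inference warnings raised while parsing strings.
            warnings.simplefilter("ignore")
            pd.to_datetime(sample)
    except (ValueError, TypeError, OverflowError):
        return False
    return True
```

This keeps the sampling trade-off of the original: a bad cell outside the sample still goes unnoticed, so it is a heuristic rather than a guarantee.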
PS. Also note that the above code permits strings to be dates if appropriately formatted, which may not be good behaviour. For instance, date methods (such as the .dt accessor) can't work on a string column. The original intent of the above code was to coerce anything meaningfully datelike to datetime64.
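A stricter alternative is to look only at the dtype, so that formatted strings are never reported as dates. The following is a rough sketch under stated assumptions, not pandas API: the function name is borrowed from this request, and the name prefixes (in particular the pyarrow `date32[`/`date64[` ones) are my guess at how such dtypes print, in the same spirit as the string comparisons above.

```python
import pandas as pd

# Assumed dtype-name prefixes for datelike extension dtypes (illustrative only).
_DATELIKE_NAME_PREFIXES = ("dbdate", "timestamp[", "date32[", "date64[")


def is_datelike_dtype(arr_or_dtype) -> bool:
    """Hypothetical dtype-only check; accepts a Series/array or a dtype object."""
    # Use the .dtype attribute when given a Series or array, else the object itself.
    dtype = getattr(arr_or_dtype, "dtype", arr_or_dtype)
    if pd.api.types.is_datetime64_any_dtype(dtype):
        return True
    # Fall back to the dtype's printed name for extension dtypes.
    return str(dtype).startswith(_DATELIKE_NAME_PREFIXES)
```

Because nothing is parsed, this never inspects values: an object column of date strings returns False and has to be converted explicitly with pd.to_datetime first.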
Feature Type
[X] Adding new functionality to pandas
[ ] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
Testing types in pandas is incomplete for datelikes.
It would be useful to maintain a pd.api.types.is_datelike_dtype that can interpret any datelike (e.g. datetime64, pyarrow timestamp, timestamp, dbdate) in the same way as pd.api.types.is_numeric_dtype does for numerics.
Reproducible example:
gives
Feature Description
Alternative Solutions
is_datetime64_any_dtype could be extended to be more permissive, but I don't think changing behaviour of an existing testing function is a good idea.
Additional Context
No response