Open AbdouSeck opened 6 years ago
Your examples do not raise any errors on master; was this fixed recently @jbrockmendel?
In [8]: pd.__version__
Out[8]: '0.24.0.dev0+780.g145c2275e'
In [9]: df.A == '2013-13-23'
...:
Out[9]:
0 False
1 False
2 False
3 False
Name: A, dtype: bool
In [10]: df.A == ['2013-10-10', '2013-11-30', '2013-10-13', '2013-13-23']
...:
Out[10]:
0 False
1 False
2 False
3 False
Name: A, dtype: bool
I'm not aware of any recent changes that would be relevant.
@mroeschke is that the expected and intended behavior in the next production iteration?
I don't think so. I am not the biggest fan of the implicit coercion of these strings to datetimes, but given that it works for matching date strings, I think you're right and this should raise a ValueError, since 2013-13-23 cannot be parsed into a datetime.
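For reference, a minimal sketch of the parsing failure being discussed. It goes through the public pd.to_datetime rather than the internal comparison path, so it only shows that the string itself is unparsable, not what the comparison does in any given version:

```python
import pandas as pd

bad = "2013-13-23"  # month 13 does not exist

try:
    pd.to_datetime(bad)
    parsed_ok = True
except ValueError:
    # pandas' date-parsing errors derive from ValueError
    parsed_ok = False

print(parsed_ok)

# errors="coerce" turns the unparsable string into NaT instead of raising
coerced = pd.to_datetime(bad, errors="coerce")
print(pd.isna(coerced))
```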
And if the docs don't mention it already, it would be great to mention equality when comparing datetimes to date-like strings.
I think I'd be OK with deprecating the implicit coercion for both dt64 and Timestamp comparisons.
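If the implicit coercion were deprecated, callers would presumably convert the string themselves before comparing. A rough sketch of what that could look like, using a small made-up frame (the df below is an assumption, not the one from the issue):

```python
import pandas as pd

# Hypothetical frame standing in for the df in the issue
df = pd.DataFrame(
    {"A": pd.to_datetime(["2013-10-10", "2013-11-30", "2013-10-13"])}
)

# Convert explicitly, with a format, then compare datetime to datetime
target = pd.to_datetime("2013-11-30", format="%Y-%m-%d")
mask = df["A"] == target
print(mask.tolist())  # [False, True, False]

# An explicit format also pins down strings that could be read two ways
day_first = pd.to_datetime("2/1/2018", format="%d/%m/%Y")
month_first = pd.to_datetime("2/1/2018", format="%m/%d/%Y")
print(day_first.date(), month_first.date())  # 2018-01-02 2018-02-01
```

With the conversion spelled out, an unparsable string fails at the pd.to_datetime call, where the offending value is in plain sight.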
I can see this being a feature for people that trust that their string formatted dates are easily parsable. But when this implicit coercion of string formatted dates to datetime objects is neither documented nor fully guaranteed to yield the right thing (because 2/1/2018 can be either %d/%m/%Y or %m/%d/%Y formatted), it's hard for many to realize what's going on here. This is evidenced by this Stack Overflow question.
Problem description
There are at least 3 issues here:
The following piece of code raises a combination of TypeError and ValueError exceptions because one of the strings cannot be converted into a datetime object:
Nowhere in that stream of tracebacks is it mentioned that '2013-13-23' is the bad data. If this feature is here to stay, it would be nice if the first data value to fail was reported with the raised exceptions.
The second issue is one of documentation. For such an opinionated coercion of data, I was hoping to see some documentation about it either under pd.Series.eq or pd.DatetimeIndex.__eq__ (since pd.DatetimeIndex is what the right side gets converted to before the comparison is made). In fact, it was inside the source code for pd.DatetimeIndex.__eq__ that I was able to find the line that carries out the conversion. And it does look like the code is from the decorator of the comparison methods of pd.DatetimeIndex. The following is how I was able to get to it:
Neither pd.Series.eq nor pd.Series.__eq__ nor pd.DatetimeIndex.__eq__ seems to mention anything about this implicit type coercion.
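To make the first point concrete, here is a rough sketch of the kind of reporting I have in mind. The helper first_unparsable is hypothetical (it is not part of pandas), and it uses the public pd.to_datetime rather than whatever the comparison methods call internally:

```python
import pandas as pd


def first_unparsable(values):
    """Return (position, value) for the first string that
    pd.to_datetime rejects, or None if every value parses."""
    for pos, value in enumerate(values):
        try:
            pd.to_datetime(value)
        except ValueError:
            # pandas' date-parsing errors derive from ValueError
            return pos, value
    return None


candidates = ["2013-10-10", "2013-11-30", "2013-10-13", "2013-13-23"]
result = first_unparsable(candidates)
print(result)  # (3, '2013-13-23')
```

An error message that included this position and value would point straight at the bad entry instead of leaving it to be dug out of the tracebacks.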
Expected Output
The same way as df.B == '2' and df.C == '2.0', I was expecting df.A == '2013-11-23' to also raise a TypeError.
Output of pd.show_versions()
I am not familiar with the structure of the repo, but I am happy to drop some lines in the docstring of pd.Series.eq to indicate that type coercions can happen.
Thank you