probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

[WhoScored] TypeError in DateTime Comparison, DateTime no longer tz-aware #599

Closed ds-oliver closed 2 weeks ago

ds-oliver commented 1 month ago

Issue: TypeError in DateTime Comparison After Update

Description

After updating the soccerdata repository, I encountered a TypeError when comparing datetime objects in my script. The error suggests an invalid comparison between tz-naive and tz-aware datetime-like objects. This issue did not occur before the update.

Error Traceback

Traceback (most recent call last):
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/arrays/datetimelike.py", line 536, in _validate_comparison_value
    self._check_compatible_with(other)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 540, in _check_compatible_with
    self._assert_tzawareness_compat(other)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 786, in _assert_tzawareness_compat
    raise TypeError(
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/hogan/soccerdata/scrape_epl.py", line 407, in <module>
    main()
  File "/Users/hogan/soccerdata/scrape_epl.py", line 299, in main
    schedule = schedule[schedule["date"] <= datetime.now()]
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/arraylike.py", line 52, in __le__
    return self._cmp_method(other, operator.le)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/series.py", line 6119, in _cmp_method
    res_values = ops.comparison_op(lvalues, rvalues, op)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/ops/array_ops.py", line 330, in comparison_op
    res_values = op(lvalues, rvalues)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/arraylike.py", line 52, in __le__
    return self._cmp_method(other, operator.le)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/arrays/datetimelike.py", line 985, in _cmp_method
    return invalid_comparison(self, other, op)
  File "/Users/hogan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/ops/invalid.py", line 40, in invalid_comparison
    raise TypeError(f"Invalid comparison between dtype={left.dtype} and {typ}")
TypeError: Invalid comparison between dtype=datetime64[ns, UTC] and datetime

Steps to Reproduce

  1. Update the soccerdata repository.
  2. Run a script (here scrape_epl.py) that evaluates schedule["date"] <= datetime.now().
  3. Observe the TypeError during the comparison of datetime objects.
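
The steps above can be approximated without any scraping. This is only a minimal sketch: the two-row `schedule` frame is a hypothetical stand-in for the real scraped schedule, built so that its "date" column is tz-aware (UTC) like the dtype reported in the traceback.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for the scraped schedule: a tz-aware (UTC) "date" column.
schedule = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-01 15:00", "2100-01-01 15:00"], utc=True)}
)

try:
    # datetime.now() is tz-naive, the column is tz-aware, so pandas refuses to order them.
    schedule[schedule["date"] <= datetime.now()]
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```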

Expected Behavior

The script should successfully compare datetime objects without raising a TypeError.

Environment

Additional Information

Suggested Fix

Ensure the datetime objects being compared are either both tz-naive or both tz-aware. One possible solution is to convert schedule["date"] to tz-naive before the comparison.
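
A sketch of the tz-naive route, assuming the column holds UTC timestamps as in the traceback. The `schedule` frame and the names `naive_dates`, `now_utc_naive`, and `past` are illustrative only. Note that a plain `datetime.now()` is local time, so the reference value here is also taken in UTC to keep both sides on the same clock.

```python
import pandas as pd

# Hypothetical stand-in for the scraped schedule (tz-aware UTC dates).
schedule = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-01 15:00", "2100-01-01 15:00"], utc=True)}
)

# tz_localize(None) drops the timezone and keeps the UTC wall-clock time.
naive_dates = schedule["date"].dt.tz_localize(None)

# Build a naive "now" that is also expressed in UTC.
now_utc_naive = pd.Timestamp.now(tz="UTC").tz_localize(None)

# Both sides are tz-naive now, so the comparison is allowed.
past = schedule[naive_dates <= now_utc_naive]
```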

Thank you for your assistance.


probberechts commented 1 month ago

Although I didn't change this intentionally, I prefer tz-aware datetime objects. Naive datetime objects are Python-only abstractions that don't identify any actual moment in the real world; they're highly error-prone and work reliably only in a limited scope where everything is in one specific time zone.

Is it correct that the schedule["date"] <= datetime.now() comparison is in your script? If so, I would suggest using datetime.utcnow() instead.
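
One caveat on that suggestion: `datetime.utcnow()` returns a tz-naive object (and is deprecated as of Python 3.12), so it would likely raise the same TypeError against a tz-aware column. A sketch of a tz-aware comparison in the same spirit, using `datetime.now(timezone.utc)`; the `schedule` frame and the name `played` are hypothetical and not taken from the original script.

```python
from datetime import datetime, timezone

import pandas as pd

# Hypothetical stand-in for the scraped schedule (tz-aware UTC dates).
schedule = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-01 15:00", "2100-01-01 15:00"], utc=True)}
)

# A tz-aware "now" in UTC matches the awareness of the "date" column.
now_utc = datetime.now(timezone.utc)

# Keep only fixtures whose kickoff is not in the future.
played = schedule[schedule["date"] <= now_utc]
```

`pd.Timestamp.now(tz="UTC")` should work equally well as the reference value if staying within pandas is preferred.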