marcusvolz / strava_py

Create artistic visualisations with your exercise data (Python version)
MIT License

plot_calendar raises a date format exception when the first activity month is May. #30

Closed Luocy7 closed 1 year ago

Luocy7 commented 1 year ago

When I use plot_calendar to parse the attached activities.csv, it raises an error:

Traceback (most recent call last):
  File "/scratches/scratch.py", line 9, in <module>
    plot_calendar(activities, year_min=2023, year_max=2023)
  File "/virtualenvs/industry-model-6NsNhrrH-py3.11/lib/python3.11/site-packages/stravavis/plot_calendar.py", line 11, in plot_calendar
    activities['Activity Date'] = pd.to_datetime(activities['Activity Date'])
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/virtualenvs/industry-model-6NsNhrrH-py3.11/lib/python3.11/site-packages/pandas/core/tools/datetimes.py", line 1050, in to_datetime
    values = convert_listlike(arg._values, format)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/virtualenvs/industry-model-6NsNhrrH-py3.11/lib/python3.11/site-packages/pandas/core/tools/datetimes.py", line 453, in _convert_listlike_datetimes
    return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/virtualenvs/industry-model-6NsNhrrH-py3.11/lib/python3.11/site-packages/pandas/core/tools/datetimes.py", line 484, in _array_strptime_with_fallback
    result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/tslibs/strptime.pyx", line 530, in pandas._libs.tslibs.strptime.array_strptime
  File "pandas/_libs/tslibs/strptime.pyx", line 351, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data "Jun 2, 2023, 11:48:52 AM" doesn't match format "%B %d, %Y, %H:%M:%S %p", at position 2. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

I found that this is caused by the pandas to_datetime function being unable to correctly infer the month format when the first inputs are dates in May.

The first rows of the date field in my activities.csv are:

May 23, 2023, 12:40:57 PM
May 27, 2023, 1:24:16 AM
Jun 2, 2023, 11:48:52 AM

The guess_datetime_format function in pandas infers the date format as %B %d, %Y, %H:%M:%S %p from the first row and applies it to the following rows.

However, the actual date format is %b %d, %Y, %H:%M:%S %p, so parsing raises this exception when it reaches the third row.

The abbreviation for the month May is also "May", so pandas guesses the wrong format (issue).

My solution is to follow the exception's suggestion and use:
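
The ambiguity can be reproduced without pandas. The following is a minimal sketch using the standard library's datetime.strptime, assuming an English (C) locale; the helper name matches is made up for illustration. It uses %I (12-hour clock) instead of the %H shown in the error message, since the sample times carry AM/PM; that substitution is an assumption on my part and does not affect the month matching.

```python
from datetime import datetime

# %B is the full month name, %b the abbreviated one.
# %I (12-hour clock) is paired with %p here; this differs from the %H in the thread.
FULL_MONTH = "%B %d, %Y, %I:%M:%S %p"
ABBR_MONTH = "%b %d, %Y, %I:%M:%S %p"


def matches(text, fmt):
    """Hypothetical helper: return True if text parses under fmt."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


may_row = "May 23, 2023, 12:40:57 PM"
jun_row = "Jun 2, 2023, 11:48:52 AM"

# "May" is both the full and the abbreviated name, so both formats match.
print(matches(may_row, FULL_MONTH), matches(may_row, ABBR_MONTH))  # True True

# "Jun" is only an abbreviation, so the full-name format fails.
print(matches(jun_row, FULL_MONTH), matches(jun_row, ABBR_MONTH))  # False True
```

A format guessed from a May row alone therefore cannot tell %B from %b, which is consistent with the failure appearing only at the first non-May row.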

pd.to_datetime(activities['Activity Date'], format="mixed")

or specify the date format explicitly:

pd.to_datetime(activities['Activity Date'], format="%b %d, %Y, %H:%M:%S %p")
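
As a rough check of the explicit-format approach, here is a small sketch that parses the three rows quoted above plus one afternoon row that I made up for illustration. It assumes pandas is installed and an English locale. Note that it uses %I rather than %H: in strptime-style parsing, %p only adjusts the hour when the hour is read with %I, so this variant is an assumption about what is wanted for 12-hour timestamps, not necessarily what the project adopted.

```python
import pandas as pd

# First three rows are from the report; the last one is hypothetical,
# added only to show how a PM time after noon is handled.
dates = pd.Series(
    [
        "May 23, 2023, 12:40:57 PM",
        "May 27, 2023, 1:24:16 AM",
        "Jun 2, 2023, 11:48:52 AM",
        "Jun 3, 2023, 1:05:00 PM",  # hypothetical row
    ]
)

# %b: abbreviated month name, %I: 12-hour clock, %p: AM/PM marker.
parsed = pd.to_datetime(dates, format="%b %d, %Y, %I:%M:%S %p")
print(parsed)
```

With an explicit format, pandas does not need to guess from the first row, so the May rows no longer influence how the June rows are parsed.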

hugovk commented 1 year ago

Thanks for the report, I've created https://github.com/marcusvolz/strava_py/pull/32 to fix it and included the example activities.csv as a test file to prevent regressions.