transitmatters / mbta-performance

For processing performance data for the data dashboard
MIT License

FloatingPointError: overflow encountered in multiply error #4

Closed: devinmatte closed 6 months ago

devinmatte commented 6 months ago

We're encountering this stack trace on roughly every fourth run of the process:

Traceback (most recent call last):
  File "/opt/python/lib/python3.12/site-packages/datadog_lambda/wrapper.py", line 232, in __call__
    self.response = self.func(event, context, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/chalice/app.py", line 2264, in wrapped
    return get_response(event)
           ^^^^^^^^^^^^^^^^^^^
  File "/var/task/app.py", line 15, in process_daily_lamp
    lamp.ingest_lamp_data()
  File "/var/task/chalicelib/lamp/ingest.py", line 142, in ingest_lamp_data
    processed_daily_events = ingest_pq_file(pq_df)
                             ^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/chalicelib/lamp/ingest.py", line 119, in ingest_pq_file
    processed_daily_events = _process_arrival_departure_times(pq_df)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/chalicelib/lamp/ingest.py", line 67, in _process_arrival_departure_times
    pq_df["dep_time"] = pd.to_datetime(pq_df["move_timestamp"], unit="s", utc=True).dt.tz_convert("US/Eastern")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/lib/python3.12/site-packages/pandas/core/tools/datetimes.py", line 1067, in to_datetime
    values = convert_listlike(arg._values, format)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/lib/python3.12/site-packages/pandas/core/tools/datetimes.py", line 407, in _convert_listlike_datetimes
    return _to_datetime_with_unit(arg, unit, name, utc, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/lib/python3.12/site-packages/pandas/core/tools/datetimes.py", line 512, in _to_datetime_with_unit
    arr = cast_from_unit_vectorized(arg, unit=unit)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "conversion.pyx", line 149, in pandas._libs.tslibs.conversion.cast_from_unit_vectorized
  File "/opt/python/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 3360, in round
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
FloatingPointError: overflow encountered in multiply

https://app.datadoghq.com/apm/error-tracking/issue/d08f99ac-f54b-11ee-bc81-da7ad0900002?query=env%3Aprod%20service%3Ambta-performance&from_ts=1712605144474&to_ts=1712691544474&live=true

hamima-halim commented 6 months ago

Got it: pandas doesn't like nullable integer columns and instead reads them in as floats, which introduces numpy-related casting chaos. Fix + tests incoming!
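
For illustration, here is a minimal sketch of the behavior described above and one possible way around it. It is an assumption, not necessarily what the eventual fix does: the sample data, the `raw` frame, and the `to_eastern` helper are hypothetical, and only the `move_timestamp` column name and the `pd.to_datetime(..., unit="s", utc=True).dt.tz_convert("US/Eastern")` call come from the traceback. The idea is to cast the column to pandas' nullable `Int64` dtype before the conversion, so the float path is not used.

```python
import pandas as pd

# Hypothetical sample: epoch seconds with one missing value.
raw = pd.DataFrame({"move_timestamp": [1712605144, None, 1712691544]})

# The missing value makes pandas store the column as float64 (NaN), not int64.
print(raw["move_timestamp"].dtype)


def to_eastern(seconds: pd.Series) -> pd.Series:
    """Hypothetical helper: convert epoch seconds to US/Eastern datetimes."""
    # Cast to the nullable Int64 dtype so values stay integers and NaN becomes <NA>.
    nullable = seconds.astype("Int64")
    # Same conversion as in the traceback, now applied to an integer column.
    return pd.to_datetime(nullable, unit="s", utc=True).dt.tz_convert("US/Eastern")


dep_time = to_eastern(raw["move_timestamp"])
print(dep_time)
```

With this approach the missing timestamp should come out as `NaT` rather than going through float arithmetic. Note that `astype("Int64")` raises if the column holds non-integer floats, so it only suits columns that are meant to be whole seconds.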

hamima-halim commented 6 months ago

Looks like we haven't had an occurrence of this error in 2 days (since https://github.com/transitmatters/mbta-performance/pull/7 landed). I still need to write some tests for other things, but I think the floating point issue is now handled.