hasadna / open-bus

:bus: Analysing Israel's public transport data

investigate error in stride gtfs-etl analyze step: TypeError: sequence item 0: expected str instance, float found #351

Closed OriHoch closed 2 years ago

OriHoch commented 2 years ago

https://open-bus-airflow.hasadna.org.il/log?dag_id=gtfs-etl&task_id=analyze&execution_date=2021-10-19T00%3A00%3A00%2B00%3A00

[2021-10-20 00:00:45,356] {bash.py:173} INFO - 2021-10-20 00:00:45,356 - root - INFO - analyzing GTFS files from archive folder: /var/gtfs-storage/gtfs_archive/2021/10/20/.gtfs_metadata.json and save analyzed data in /var/gtfs-storage/stat_archive/2021/10/20
[2021-10-20 00:00:45,357] {bash.py:173} INFO - 2021-10-20 00:00:45,357 - root - INFO - analyze gtfs stat - this could take some time
[2021-10-20 00:04:28,326] {bash.py:173} INFO - Traceback (most recent call last):
[2021-10-20 00:04:28,327] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
[2021-10-20 00:04:28,337] {bash.py:173} INFO -     result = self._python_apply_general(f, self._selected_obj)
[2021-10-20 00:04:28,337] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
[2021-10-20 00:04:28,338] {bash.py:173} INFO -     keys, values, mutated = self.grouper.apply(f, data, self.axis)
[2021-10-20 00:04:28,338] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 783, in apply
[2021-10-20 00:04:28,339] {bash.py:173} INFO -     result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
[2021-10-20 00:04:28,339] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
[2021-10-20 00:04:28,340] {bash.py:173} INFO -     return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
[2021-10-20 00:04:28,340] {bash.py:173} INFO -   File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
[2021-10-20 00:04:28,340] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/gtfs_stat/aggregations.py", line 58, in trip_stats_aggregation
[2021-10-20 00:04:28,341] {bash.py:173} INFO -     d[f'all_{key}'] = ';'.join(group[key].tolist())
[2021-10-20 00:04:28,341] {bash.py:173} INFO - TypeError: sequence item 0: expected str instance, float found
[2021-10-20 00:04:28,341] {bash.py:173} INFO - 
[2021-10-20 00:04:28,341] {bash.py:173} INFO - During handling of the above exception, another exception occurred:
[2021-10-20 00:04:28,341] {bash.py:173} INFO - 
[2021-10-20 00:04:28,341] {bash.py:173} INFO - Traceback (most recent call last):
[2021-10-20 00:04:28,341] {bash.py:173} INFO -   File "/srv/open_bus_pipelines/operators/_api_bash_operator_script.py", line 27, in <module>
[2021-10-20 00:04:28,341] {bash.py:173} INFO -     main(*sys.argv[1:])
[2021-10-20 00:04:28,341] {bash.py:173} INFO -   File "/srv/open_bus_pipelines/operators/_api_bash_operator_script.py", line 18, in main
[2021-10-20 00:04:28,341] {bash.py:173} INFO -     function.callback(**kwargs)
[2021-10-20 00:04:28,341] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/cli.py", line 33, in analyze
[2021-10-20 00:04:28,341] {bash.py:173} INFO -     api.analyze_gtfs_stat_into_archive_folder(**kwargs)
[2021-10-20 00:04:28,341] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/api.py", line 64, in analyze_gtfs_stat_into_archive_folder
[2021-10-20 00:04:28,341] {bash.py:173} INFO -     analyze_gtfs_stat(date_to_analyze=date, gtfs_metadata_file=gtfs_metadata,
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/api.py", line 103, in analyze_gtfs_stat
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     trip_stats, route_stats = create_trip_and_route_stat(date_to_analyze, gtfs_files)
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/gtfs_stat/gtfs_stats.py", line 51, in create_trip_and_route_stat
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     trip_stats, route_stats = analyze_gtfs_date(
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/gtfs_stat/gtfs_stats.py", line 32, in analyze_gtfs_date
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     trip_stats = compute_trip_stats(feed, zones, clusters, trip_id_to_date_df, date_to_analyze)
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/gtfs_stat/core_computations.py", line 253, in compute_trip_stats
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     h = g.apply(aggregation)
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1264, in apply
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     return self._python_apply_general(f, self._selected_obj)
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     keys, values, mutated = self.grouper.apply(f, data, self.axis)
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 783, in apply
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
[2021-10-20 00:04:28,342] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 1328, in fast_apply
[2021-10-20 00:04:28,342] {bash.py:173} INFO -     return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
[2021-10-20 00:04:28,343] {bash.py:173} INFO -   File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
[2021-10-20 00:04:28,343] {bash.py:173} INFO -   File "/usr/local/lib/stride/lib/python3.8/site-packages/open_bus_gtfs_etl/gtfs_stat/aggregations.py", line 58, in trip_stats_aggregation
[2021-10-20 00:04:28,343] {bash.py:173} INFO -     d[f'all_{key}'] = ';'.join(group[key].tolist())
[2021-10-20 00:04:28,343] {bash.py:173} INFO - TypeError: sequence item 0: expected str instance, float found
[2021-10-20 00:04:30,370] {bash.py:177} INFO - Command exited with return code 1
AvivSela commented 2 years ago

My last PR fixes this too. Israel Railways (רכבת ישראל) started passing NaN values for some fields.
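For anyone hitting the same traceback, here is a minimal sketch of the failure and one possible guard. It is not necessarily what the PR does: the `stop_code` column name and the `join_values` helper are hypothetical, and dropping missing values (rather than, say, substituting an empty string) is an assumption about the desired behaviour.

```python
import pandas as pd


def join_values(series: pd.Series, sep: str = ";") -> str:
    """Join a column's values into one string, skipping missing entries.

    Hypothetical helper: dropna() removes NaN/None, astype(str) makes sure
    every remaining item is a str before str.join sees it.
    """
    return sep.join(series.dropna().astype(str).tolist())


# A group where the first value is missing, similar to what the log suggests.
group = pd.DataFrame({"stop_code": [float("nan"), "17038", "17040"]})

try:
    # Same pattern as the failing line in aggregations.py: str.join rejects
    # the float NaN that pandas uses for a missing value.
    ";".join(group["stop_code"].tolist())
except TypeError as exc:
    print(f"unguarded join failed: {exc}")

print(join_values(group["stop_code"]))
```

With a guard like this an all-NaN group yields an empty string instead of raising, which may or may not be what downstream consumers of the `all_*` columns expect.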

OriHoch commented 2 years ago

https://github.com/hasadna/open-bus-gtfs-etl/pull/12

great, thanks, please mention it on the PR

keeping issue open until the PR is merged

AvivSela commented 2 years ago

I see this one is already merged, so I'm closing the issue. If it happens again, let's reopen it.

OriHoch commented 2 years ago

fixed + merged in this PR: https://github.com/hasadna/open-bus-gtfs-etl/pull/13