hasadna / open-bus-stride-api

MIT License
5 stars 14 forks source link

Missing data #43

Open NoamGaash opened 3 months ago

NoamGaash commented 3 months ago

When querying some dates, is seems like there are missing SIRI rides:

https://open-bus-stride-api.hasadna.org.il/gtfs_rides_agg/group_by?date_from=2024-06-18&date_to=2024-06-19&group_by=operator_ref,gtfs_route_date&exclude_hour_from=23&exclude_hour_to=2

image

OriHoch commented 3 months ago

there are siri rides for those dates, please try to pinpoint the specific problem

$ curl 'https://open-bus-stride-api.hasadna.org.il/siri_rides/list?get_count=true&scheduled_start_time_from=2024-06-18T00%3A00%3A00%2B02%3A00&scheduled_start_time_to=2024-06-19T00%3A00%3A00%2B02%3A00&order_by=id%20asc'
80441
NoamGaash commented 3 months ago

When running:

$ curl https://open-bus-stride-api.hasadna.org.il/gtfs_rides_agg/group_by\?date_from\=2024-06-18\&date_to\=2024-06-19\&group_by\=operator_ref | jq ".[] | .total_actual_rides"

we can see array of zeroes.

Also, when running:

select
    agg.gtfs_route_id,
    agg.gtfs_route_hour,
    agg.num_planned_rides,
    agg.num_actual_rides,
    rt.operator_ref
from gtfs_rides_agg_by_hour agg, gtfs_route rt
where
    agg.gtfs_route_id = rt.id
    and agg.gtfs_route_date >= '2024-06-18 00:00:00+00'
    and agg.gtfs_route_date <= '2024-06-19 00:00:00+00'

the num_actual_rides column is zero.

OriHoch commented 3 months ago

thanks, so gtfs_rides_agg_by_hour is a view, it doesn't actually contain the data it's just an interface to a query, so what I need to know is in which table/column the actual data is missing

this is the definition of the gtfs_rides_agg_by_hour view:

SELECT gtfs_ride.gtfs_route_id,
       date_trunc('hour'::text, gtfs_ride.start_time)                     AS gtfs_route_hour,
       date_trunc('day'::text, gtfs_ride.start_time)                      AS gtfs_route_date,
       count(*)                                                           AS num_planned_rides,
       count(*) FILTER (WHERE sr.scheduled_time_gtfs_ride_id IS NOT NULL) AS num_actual_rides
FROM gtfs_ride
         LEFT JOIN siri_ride sr ON gtfs_ride.id = sr.scheduled_time_gtfs_ride_id
GROUP BY gtfs_ride.gtfs_route_id, (date_trunc('hour'::text, gtfs_ride.start_time)),
         (date_trunc('day'::text, gtfs_ride.start_time))