r5py / r5py

Rapid Realistic Routing with R5 in Python
https://r5py.readthedocs.io

Departure time is outside of the time range covered by currently loaded GTFS data sets #364

Open keyingtang opened 11 months ago

keyingtang commented 11 months ago

Hello, I have a question in terms of the detailed GTFS data requirements.

Describe the bug: I want to use TravelTimeMatrixComputer, and for the departure time I entered a time that I'm sure is covered by the loaded GTFS data sets. But I got a warning saying .../r5py/r5/regional_task.py:228: RuntimeWarning: Departure time 2023-10-07 12:30:00 is outside of the time range covered by currently loaded GTFS data sets. And all the output travel time values are NaN.

The GTFS dataset I provided only has calendar_dates.txt, no calendar.txt. Could this be the reason?

Environment:

christophfink commented 10 months ago

The warning is indicating that a GTFS data set, according to its own metadata, does not cover the requested date and time. Since the result seems to not find any connections, this also seems to be the case. Can you double-check that there are routes in the GTFS data set for the date and time you request?

@wklumpen could you comment on this, because I’m not 100% sure of the requirements of a GTFS file

wklumpen commented 10 months ago

Heya!

We have a known issue (#301) where our warning is not set up to check calendar_dates.txt explicitly for coverage (which is why it's currently a warning, not an error).

I assume though that R5 does include it as that would be a pretty big hole in the software.

Some more things to check:

If you are able to attach the GTFS file I can double check it to make sure that's not the issue.

keyingtang commented 10 months ago

Hello both!

Thank you a lot for your answers! According to your advice, I have done the below checks, but I still got the same warning.

Do you have any further advice on how I can solve this problem? Thanks in advance!

wklumpen commented 10 months ago

Just had a look at the GTFS file: a quick CTRL+F for 20231007 finds nothing in calendar_dates.txt

calendar_dates.txt

wklumpen commented 10 months ago

Also as a side note can you open an issue for the UnicodeDecode error on the GTFS-Lite repo?

keyingtang commented 10 months ago

Hi, yeah that's true, because the newest gtfs-nl.zip file starts from 20231010, which you can see from feed_info.txt. And the departure date and time I requested is within the coverage indicated by feed_info.txt.

Btw, I have just opened the UnicodeDecode error issue on GTFS-Lite, thanks ;)

feed_info.txt

wklumpen commented 10 months ago

Can you post the calendar.txt and calendar_dates.txt files from the service you are loading into R5py?

The feed_info.txt is only a recommended file and the data may not actually follow what it suggests. From the feed_info description on the feed_start_date:

The dataset provides complete and reliable schedule information for service in the period from the beginning of the feed_start_date day to the end of the feed_end_date day. Both days may be left empty if unavailable. The feed_end_date date must not precede the feed_start_date date if both are given. It is recommended that dataset providers give schedule data outside this period to advise of likely future service, but dataset consumers should treat it mindful of its non-authoritative status. If feed_start_date or feed_end_date extend beyond the active calendar dates defined in calendar.txt and calendar_dates.txt, the dataset is making an explicit assertion that there is no service for dates within the feed_start_date or feed_end_date range but not included in the active calendar dates.

The only way to be sure service is being run on that date is to check the calendar.txt and calendar_dates.txt. In the one provided I saw it was explicitly not available.
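That check can also be scripted rather than done with CTRL+F. The following is a minimal sketch using only the Python standard library; the helper names `read_table` and `active_service_ids` are made up for this illustration and are not part of r5py or GTFS-Lite. It assumes the tables sit at the root of the zip and follow the column names from the GTFS reference (`exception_type` 1 adds a service on a date, 2 removes it).

```python
import csv
import datetime
import io
import zipfile

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def read_table(archive, name):
    """Return the rows of one GTFS table as dicts, or [] if the file is absent."""
    if name not in archive.namelist():
        return []
    with archive.open(name) as raw:
        # utf-8-sig drops a byte order mark if the feed has one
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))


def active_service_ids(gtfs_zip, day):
    """Service IDs running on `day` per calendar.txt and calendar_dates.txt."""
    stamp = day.strftime("%Y%m%d")
    with zipfile.ZipFile(gtfs_zip) as archive:
        calendar = read_table(archive, "calendar.txt")
        exceptions = read_table(archive, "calendar_dates.txt")

    # Regular weekly pattern: date inside the validity range and weekday flag set
    active = {
        row["service_id"]
        for row in calendar
        if row["start_date"] <= stamp <= row["end_date"]
        and row[WEEKDAYS[day.weekday()]] == "1"
    }
    # Exceptions: 1 adds service on that date, 2 removes it
    for row in exceptions:
        if row["date"] != stamp:
            continue
        if row["exception_type"] == "1":
            active.add(row["service_id"])
        elif row["exception_type"] == "2":
            active.discard(row["service_id"])
    return active
```

An empty result for the requested date would point to a data problem; a non-empty one only shows that some service IDs are active, and says nothing about whether trips for them reach the study area. If the tables are nested in a subfolder of the zip, this sketch will not find them and returns an empty set.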

keyingtang commented 10 months ago

Hello, thanks for your reply. As I said in the initial issue description, the GTFS dataset I want to use only has calendar_dates.txt. And here it is: calendar_dates.txt

The departure date and time I was trying to request is 2023-10-06 08:30, and that date appears in this file. I have also used the gtfs-lite library to check and make sure that there are services running every hour on that day. But I always got the RuntimeWarning when trying to use the TravelTimeMatrixComputer function of r5py.

wklumpen commented 10 months ago

The RuntimeWarning is a known bug with the date checking - I have encountered warnings but still generated results - it can be ignored if you're confident the data covers the date you're checking.

What's more concerning is the null results. There are a few possibilities:

The last one would be an upstream R5 issue (and is kinda concerning!). One thing you could do is manually create a calendar.txt file that includes all the service IDs for that date and covers the day of the week you need. If that runs and produces results, then we know the problem is likely upstream in R5, unless @christophfink does any GTFS processing before load.
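The manual calendar.txt suggested here could be generated instead of typed by hand. This is a rough sketch under a few assumptions: the function name `calendar_txt_for_day` is hypothetical, the input is the text of calendar_dates.txt, and each service added for the requested date (exception_type 1) gets a one-day calendar entry with only that weekday set. It is meant as a diagnostic aid, not as a way to produce a publishable feed.

```python
import csv
import datetime
import io

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
CALENDAR_FIELDS = ["service_id", *WEEKDAYS, "start_date", "end_date"]


def calendar_txt_for_day(calendar_dates_text, day):
    """Build calendar.txt content covering every service added on `day`."""
    stamp = day.strftime("%Y%m%d")
    service_ids = sorted({
        row["service_id"]
        for row in csv.DictReader(io.StringIO(calendar_dates_text))
        if row["date"] == stamp and row["exception_type"] == "1"
    })

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CALENDAR_FIELDS, lineterminator="\n")
    writer.writeheader()
    for service_id in service_ids:
        row = {name: "0" for name in WEEKDAYS}
        row[WEEKDAYS[day.weekday()]] = "1"  # only the requested weekday runs
        row.update(service_id=service_id, start_date=stamp, end_date=stamp)
        writer.writerow(row)
    return out.getvalue()
```

The returned string would be saved as calendar.txt next to the other tables before zipping the feed again. The existing calendar_dates.txt can stay in place, since its entries for that date add the same services again.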

christophfink commented 10 months ago

unless @christophfink does any GTFS processing before load

No, we don’t process the GTFS files before passing them to R5

That said, I still think this is a data issue. @keyingtang , could you share the actual GTFS data set you are using? If you want to share it confidentially, please send it to christoph.fink@helsinki.fi

sruinaard commented 10 months ago

Hi everybody,

I wanted to join in on the conversation as I have been having the same issue for Sweden. In the regional and national GTFS files, we do have a calendar.txt file and I compared it to the Helsinki and Sao Paulo demo. What stands out to me most is that for Sweden, we do not have 1's for the weekdays, they are all 0. I do not know whether this causes the problem for me? I included a screenshot of the three files.

I tried changing some 0's to 1's, but that gave some errors, as I think I probably need to take a more systematic approach to improve my GTFS file.

This is the warning I get: RuntimeWarning: Departure time 2023-10-16 00:00:00 is outside of the time range covered by currently loaded GTFS data sets.

I also get this runtime error when trying other dates in the GTFS file, or changing the departure time (window). I'll send you the GTFS file I have been using, as it is too big to upload it here. Thank you in advance for taking a look at it!

[Screenshot 2023-10-24 at 11 41 15]

wklumpen commented 10 months ago

Okay so for both @keyingtang and @sruinaard I was able to confirm with GTFS-lite that there are trips which run on the days specified.

For @sruinaard - are you getting results when you run it, or an empty matrix? If you are getting results, you can (I think) safely ignore the warning, as it appears there is service during the period specified.

More diagnosis is needed on the other input data to understand why there might be blank matrices. My experience in the past has been one of projection or corrupted input files.

wklumpen commented 10 months ago

What stands out to me most is that for Sweden, we do not have 1's for the weekdays, they are all 0. I do not know whether this causes the problem for me? I included a screenshot of the three files.

It looks like the Sweden files have service_ids listed in calendar.txt that aren't used, while calendar_dates.txt lists individual services, so it's likely the calendar_dates.txt file that's being relied on.

I also noted that the files provided by both @keyingtang and @sruinaard are nested (folder-in-folder style with a MACOSX folder also). Can you try running the same analysis but just zipping the files directly? I would assume R5 would handle/throw an error while loading but I'm working on eliminating as many possible errors.
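For anyone who wants to script the re-zipping, here is one possible sketch using the standard library. The function name `flatten_gtfs_zip` is invented for this example. It assumes the macOS metadata lives in a `__MACOSX/` folder or in files prefixed with `._`, which is what the macOS Archive Utility typically produces, and that only the `.txt` tables are needed.

```python
import posixpath
import zipfile


def flatten_gtfs_zip(source, target):
    """Copy every .txt table to the root of a new zip, skipping macOS metadata."""
    written = []
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            name = posixpath.basename(info.filename)
            if info.is_dir():
                continue
            # Resource forks and Finder metadata added by macOS
            if info.filename.startswith("__MACOSX/") or name.startswith("._"):
                continue
            if not name.endswith(".txt"):
                continue
            if name in written:
                # Two folders with the same table: refuse to guess which one to keep
                raise ValueError(f"{name} appears more than once in the archive")
            dst.writestr(name, src.read(info))
            written.append(name)
    return written
```

Calling it with the path of the nested archive and a new output path writes a flat zip and returns the list of tables it copied, which makes it easy to see whether anything expected is missing.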

wklumpen commented 10 months ago

Another possible source of the issue - see this R5 issue.

Could the base version of R5 we use be causing this issue @christophfink?

christophfink commented 10 months ago

We definitely use a version after the linked issue's fix was merged. That does not mean, however, that we're not affected by something similar

I'll take a closer look at our date checking code, maybe there's a convenient way to test whether services exist on the particular day requested, rather than whether the requested date is within the covered period of the GTFS data set

Not sure whether we open a(nother) hole in our logic: if there is a GTFS schedule for a service that runs, say, Mon-Sat and is valid for an extended period of time, should requesting a route on a Sunday trigger the warning?

sruinaard commented 10 months ago

Thanks for the discussion, I will try running it without the nested structure and let you know how it goes.

@wklumpen I'm getting results - the exact same results as walking (descriptives and maps), while I do expect some impact of bus lines. Therefore, I will check next week what is exactly included in the GTFS data for my area and time period, to look more exactly where I'd expect different results.

I tried to improve my gtfs files with the r scripts I sent over, and then I did not get the warning anymore, but still, my transit and walking results seemed exactly the same (descriptives).

I will also run everything for another area, which is more densely populated and therefore has more public transport, to see if still transit is the same as walking. Keep you posted!

wklumpen commented 10 months ago

Not sure whether we open a(nother) hole in our logic: if there is a GTFS schedule for a service that runs, say, Mon-Sat and is valid for an extended period of time, should requesting a route on a Sunday trigger the warning?

I think we leave that kind of testing to other packages and remove the date check warning entirely. With that said, the GTFS-Lite 'date_trips()' function does take calendar and calendar date files into account for pulling all trips running on a given date.

We could use that but it does require loading the zip which slows the process down for large files.

I think if we've checked these particular feeds we should be able to safely say this shouldn't be a GTFS issue, but of course one is never sure.

keyingtang commented 10 months ago

Thanks for the discussion!

I also noted that the files provided by both @keyingtang and @sruinaard are nested (folder-in-folder style with a MACOSX folder also). Can you try running the same analysis but just zipping the files directly? I would assume R5 would handle/throw an error while loading but I'm working on eliminating as many possible errors.

Regarding this, I have tried just zipping the files directly and running the same analysis, and I got the same error and 'NaN' values in travel_time.

More diagnosis is needed on the other input data to understand why there might be blank matrices. My experience in the past has been one of projection or corrupted input files.

About this, I checked the projection of my input origin and destination datasets; their CRS is EPSG:28992 (Amersfoort / RD New), which seems to be an uncommon CRS. Could this be a possible reason?

wklumpen commented 10 months ago

Hmm. We do try to reproject to 4326 I think (@christophfink) but it's worth a try reprojecting it yourself first and then testing it again

christophfink commented 10 months ago

Yes, I can confirm that r5py transparently reprojects input data and then transforms the results back into the input CRS

sruinaard commented 9 months ago

Hi all,

We found the reason why the travel time matrices were the same: as we're interested in a rural area, and the public transport service level is low, the median value of travel times almost always returned a result that was the same as walking. We will proceed with using the 5th percentile and a departure time window of 1 hour to capture hourly services. The GTFS files are working for us. Thank you for your time checking things for us!

christophfink commented 9 months ago

@sruinaard thanks for the feedback. Indeed, the way R5 summarises trips over the departure time window is not immediately intuitive. At our lab, we reverted to using the first percentile in a 1h time window for most of our analyses, as we assume - especially in more rural study cases - that people can adapt their everyday mobility demands within these margins (which of course is not necessarily true, but we feel is a valid assumption for certain research)
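To illustrate why the median can equal the walking time when service is sparse, here is a toy model in plain Python. It is only a sketch and does not reproduce R5's actual algorithm: all numbers are made up (a 45-minute walk, a 20-minute bus trip, buses leaving at minute 30 and minute 90), one travel time is computed per departure minute in a 60-minute window, and percentiles use the simple nearest-rank rule.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile of a list of numbers (percentile as an integer 1-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def travel_times(walk_minutes, ride_minutes, bus_departures, window_minutes=60):
    """Fastest door-to-door time for each departure minute in the window."""
    times = []
    for minute in range(window_minutes):
        # Waiting time until each bus that has not left yet
        waits = [bus - minute for bus in bus_departures if bus >= minute]
        by_bus = min(waits) + ride_minutes if waits else math.inf
        times.append(min(walk_minutes, by_bus))
    return times


# Hypothetical rural connection with an hourly bus
times = travel_times(walk_minutes=45, ride_minutes=20, bus_departures=[30, 90])

median = nearest_rank_percentile(times, 50)  # 45: same as walking
fifth = nearest_rank_percentile(times, 5)    # 22: bus caught with a short wait
first = nearest_rank_percentile(times, 1)    # 20: bus caught with no wait

print(median, fifth, first)
```

In this toy case the bus beats walking for only 25 of the 60 departure minutes, so the median lands on the walking time while the low percentiles reflect the bus. That matches the pattern described above, where transit and walking matrices looked identical until a lower percentile was used.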