mfdz / GTFS-Issues

Documentation and Tracking of Issues in GTFS- and GTFS-RT Feeds

SEPTA: ZIP file contains multiple ZIP archives within #154

Open mlundblad opened 2 months ago

mlundblad commented 2 months ago

Issue description
The GTFS feed for SEPTA (Southeastern Pennsylvania Transportation Authority), obtained via Transitland, contains two GTFS archives nested inside a single outer ZIP file.

There are two links, one for the bus feed and one for the rail feed.

Last update of GTFS Feed
2024-09-07

Hash of the GTFS Feed
SHA1: adb983d5fae46af17e07ae8ae31423b2a91b6916
SHA1: da7a6dc4e8f83f9b6dd4b1dc1e984b56a25c96b5

GTFS Feed Download Link
https://github.com/septadev/GTFS/releases/latest/download/gtfs_public.zip#google_rail.zip
https://github.com/septadev/GTFS/releases/latest/download/gtfs_public.zip#google_bus.zip

Corresponding Transitland pages:
https://www.transit.land/feeds/f-dr4-septa~rail
https://www.transit.land/feeds/f-dr4-septa~bus

mlundblad commented 2 months ago

Actually, the "anchor part" (after the #) corresponds to the file name of the archive inside the "outer" ZIP. So maybe the intention is that the parser treats it as an "address" into the ZIP…

hbruch commented 2 months ago

Hi @mlundblad!

Thanks for reporting this issue here. I was not aware that @septadev already had a GTFS GitHub repository that they use to publish their feeds and to track issues people have with them. That's great and significantly better than all the other agencies I know.

I suggest opening an issue directly there, as they will surely monitor their repo.

mlundblad commented 2 months ago

It seems this might be intentional on SEPTA's part: https://github.com/septadev/GTFS/issues/14

In the meantime, I tested implementing support for treating the "trailing path" after the # in the URL as a "sub ZIP file": the downloaded ZIP is opened, and the "addressed" inner ZIP is extracted and written out. The change is in:

https://github.com/public-transport/transitous/pull/518
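
For anyone who wants to try the idea locally, here is a minimal Python sketch of the approach described above. It is not the code from the pull request, and the helper names `split_feed_url` and `extract_inner_zip` are made up for illustration. It assumes the fragment after the # is exactly the member name of the inner archive within the outer ZIP.

```python
import io
import zipfile
from urllib.parse import urldefrag


def split_feed_url(url: str) -> tuple[str, str]:
    """Split a feed URL into (download URL, inner archive name).

    Assumption: the part after '#' names a member of the outer ZIP,
    e.g. '...gtfs_public.zip#google_rail.zip'.
    """
    base, fragment = urldefrag(url)
    return base, fragment


def extract_inner_zip(outer_bytes: bytes, inner_name: str) -> bytes:
    """Return the raw bytes of the inner ZIP stored inside the outer ZIP."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        # Raises KeyError if the outer archive has no such member.
        return outer.read(inner_name)
```

Usage would be roughly: download the bytes from the base URL with whatever HTTP client is at hand, call `extract_inner_zip(outer_bytes, inner_name)`, and write the result to disk as the actual GTFS feed.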

mlundblad commented 2 months ago

Aha, and there actually seem to be direct links (not via the GitHub page).

So, maybe we should just use an HTTP source instead.