mfdz / GTFS-Issues

Documentation and Tracking of Issues in GTFS- and GTFS-RT Feeds

DELFI (and others): Use wikidata entity id as agency_id #143

Open hbruch opened 3 months ago

hbruch commented 3 months ago

Current issue(s): For all agencies in the DELFI GTFS feed, agency_url is set to https://www.delfi.de, and for most of them no further information besides the name is provided. Regarding the agency IDs, it is unclear who maintains them and whether they are stable across feed versions.

Enhancement/addition I'd like to suggest: As all of this information should be publicly available, and many agencies are already present in Wikidata, I suggest using Wikidata entity IDs as agency identifiers. Further information could then be linked to agencies, and IDs that are unique across GTFS feeds would be achieved automatically.

This would also be a step towards promoting linked open data in the transit domain.

Downsides: Currently, the DELFI feed often uses a kind of "dummy agency" (see https://github.com/mfdz/GTFS-Issues/issues/107), which would not exist in Wikidata. Personally, I consider this bad practice, and a recommendation to use Wikidata entity IDs could underline that real agencies should be specified. As long as this is not the case, agency_ids that do not refer to an existing Wikidata entity should at least not use the Wikidata entity ID format, i.e. they should not start with a Q followed by digits.
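To illustrate the format rule above, here is a minimal sketch of how a feed consumer might flag agency_ids that look like Wikidata entity IDs but are not known to be real entities. The function names, the `known_qids` set and the sample IDs are assumptions made up for this example; they are not taken from the DELFI feed.

```python
import re

# A Wikidata item ID is "Q" followed by a positive integer (no leading zero).
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def looks_like_wikidata_qid(agency_id: str) -> bool:
    """Return True if agency_id has the Wikidata item ID format."""
    return QID_PATTERN.fullmatch(agency_id) is not None


def conflicting_agency_ids(agency_ids, known_qids):
    """Return IDs that use the QID format but are not in known_qids.

    known_qids is a hypothetical set of entity IDs that were verified
    elsewhere (e.g. in a manually maintained mapping sheet).
    """
    return [
        agency_id
        for agency_id in agency_ids
        if looks_like_wikidata_qid(agency_id) and agency_id not in known_qids
    ]


# Arbitrary sample values, not real DELFI agency_ids.
sample_ids = ["Q999999999999", "Q123", "dummy_agency_1"]
verified = {"Q999999999999"}
flagged = conflicting_agency_ids(sample_ids, verified)
```

With these sample values, only the ID that has the QID format but is missing from the verified set would be flagged; the dummy ID is left alone because it cannot be mistaken for a Wikidata entity.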

Last update of GTFS Feed: 2024-04-02

GTFS Feed Download Link: Open-Data ÖPNV

hbruch commented 3 months ago

To start collecting the entity identifiers and matching them with the current agency_id values, I created this DELFI GTFS Agencies Google Sheet. Feel free to create missing agencies in Wikidata.

BeckertAnke commented 3 months ago

Your suggestion is an interesting approach. It will be included in our internal discussion about adapting agency.txt. Using the example of the transport associations in Baden-Württemberg, I would like to point out the following restriction: the company Friedrich Müller Omnibus operates in both the VVS and the HNV, and each association has its own internal ID for Friedrich Müller Omnibus. It will be difficult to merge these two IDs in our data collector and map them to the Wikidata ID.

Best Regards BeckertAnke (DELFI-Team)

hbruch commented 3 months ago

Thanks for considering it in the further discussion. I guess this ID merge restriction is also the reason for the current _G or _D suffixes in stops.txt or routes.txt? If that's the case, I'd think that a general solution for merging equivalent entities provided by different agencies needs to be found. If the collector itself can't merge them, a post-processing step might be required(?)
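To make the post-processing idea a bit more concrete, here is a rough sketch of what such a step could look like for agencies: given a mapping from collector-internal agency_ids to one shared ID, rewrite agency.txt and the agency_id references in routes.txt, and keep only one agency row per merged ID. All IDs below (including the Q-ID, which is a placeholder and not the real Wikidata entity) and the function name are made up for illustration; this is not how the DELFI collector works.

```python
def merge_agencies(agency_rows, route_rows, id_map):
    """Remap agency_ids via id_map and drop duplicate agency rows.

    agency_rows / route_rows are lists of dicts as produced by
    csv.DictReader for agency.txt and routes.txt.
    IDs missing from id_map are kept unchanged.
    """
    merged = {}
    for row in agency_rows:
        new_id = id_map.get(row["agency_id"], row["agency_id"])
        # Keep the first row seen for each merged ID.
        merged.setdefault(new_id, {**row, "agency_id": new_id})

    remapped_routes = [
        {**row, "agency_id": id_map.get(row["agency_id"], row["agency_id"])}
        for row in route_rows
    ]
    return list(merged.values()), remapped_routes


# Hypothetical sample data: one operator with two association-specific IDs.
PLACEHOLDER_QID = "Q999999999999"  # placeholder, not a real entity ID
agencies = [
    {"agency_id": "vvs_123", "agency_name": "Friedrich Müller Omnibus"},
    {"agency_id": "hnv_456", "agency_name": "Friedrich Müller Omnibus"},
    {"agency_id": "other_1", "agency_name": "Some other operator"},
]
routes = [
    {"route_id": "r1", "agency_id": "vvs_123"},
    {"route_id": "r2", "agency_id": "hnv_456"},
    {"route_id": "r3", "agency_id": "other_1"},
]
id_map = {"vvs_123": PLACEHOLDER_QID, "hnv_456": PLACEHOLDER_QID}

merged_agencies, merged_routes = merge_agencies(agencies, routes, id_map)
```

The hard part is of course maintaining `id_map`, i.e. deciding which internal IDs really denote the same operator; the sketch only shows that applying such a mapping after the merge is mechanical.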

BeckertAnke commented 3 months ago

You're right. The _G suffixes are added to the GTFS feed due to data merging. We do not want the _D suffixes either. We are following this up with the data suppliers and asking them to provide us with correct data sets.

Merging agencies is hard work ;-)