derhuerst closed this issue 3 years ago
Currently, at least in the German & European area, we have several open source projects that already specify API endpoints in a somewhat generalised way:
* kpublictransport's src/lib/networks
* public-transport-enabler's list of classes
* @alexander-albers' TripKit's list of classes
* marudor.de's HAFAS endpoint list
* hafas-client's list of "profiles" and, building on top, pan-european-public-transport's list of endpoints
* @em0lar's pyhafas' list of profiles
* transit.land's list of GTFS-RT feeds
* openmobilitydata.org's list of feeds & corresponding APIs

Similar resources:
(edited to include the projects mentioned in https://github.com/public-transport/transport-apis/pull/1#issuecomment-743757381)
Regarding the KPublicTransport format:
For geocoordinates, I propose:
* `reliableArea: [[lon, lat], [lon, lat], ...]` is the polygon in which the service returns data with the maximum known amount of detail and accuracy. It should be set for each entry.
* `usableArea: [[lon, lat], [lon, lat], ...]` is the polygon in which the service returns any kind of useful data. In case of HAFAS: for locations contained in usableArea, but not contained in reliableArea, data such as line numbers or train attributes may be missing, but core functionality (e.g. routing with real-time data) remains available. usableArea is optional; if unset, it is assumed to be identical with reliableArea.

Additional projects with similar setups would be:
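As a sketch, an entry using the two proposed polygon fields could look like this; everything except `reliableArea`/`usableArea` (the name, type and coordinates) is a made-up placeholder:

```json
{
  "name": "Example Verkehrsverbund",
  "type": "hafas_mgate",
  "reliableArea": [[6.7, 50.2], [7.9, 50.2], [7.9, 51.4], [6.7, 51.4]],
  "usableArea": [[5.8, 47.2], [15.1, 47.2], [15.1, 55.1], [5.8, 55.1]]
}
```

If `usableArea` were omitted here, it would be assumed to be identical with `reliableArea`, per the proposal above.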
> reliableArea [...] with the maximum known amount of detail and accuracy
>
> usableArea [...] any kind of useful data. [...] such as line numbers or train attributes may be missing, but core functionality (e.g. routing with real-time data) remains available.
Your definition of "usable" is what I'd consider to be "reliable". 😬 The phrasing aside, I'd say there are several nuances/levels of data coverage:
Of course, we could make this distinction arbitrarily precise, which wouldn't help all of these projects.
Attribution information would probably be also a good idea for proper Open Data backends, even if those are still rare. Example: https://invent.kde.org/libraries/kpublictransport/-/blob/master/src/lib/networks/no_entur.json#L38
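An attribution block along those lines might look roughly like the following. The field names and values are illustrative guesses, not the actual KPublicTransport schema (see the linked no_entur.json for the real fields):

```json
{
  "attribution": {
    "name": "Entur AS",
    "license": "NLOD-2.0",
    "licenseUrl": "https://data.norge.no/nlod/en/2.0",
    "url": "https://developer.entur.org/"
  }
}
```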
There are backends with more than one endpoint. For instance, most XML EFA backends provide both `XSLT_DM_REQUEST` (departure monitor) and `XSLT_TRIP_REQUEST2` (routing). Similarly, HAFAS installations don't just have mgate.exe (with "crypto"), but also less capable, easier-to-use endpoints such as ajax-getstop.exe, trainsearch.exe or stboard.exe/bhftafel.exe.

As different endpoints have different requirements and configuration variables, we shouldn't just have one JSON file per endpoint, but also one type definition, e.g. `efa_dmrequest`, `efa_triprequest`, `hafas_mgate` and `hafas_stationboard`.
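With one definition per endpoint plus a type, a provider with two EFA endpoints could be described by two entries along these lines. The type identifiers are the ones suggested above; the URLs and remaining field names are hypothetical:

```json
[
  {
    "type": "efa_dmrequest",
    "url": "https://efa.example.com/standard/XSLT_DM_REQUEST"
  },
  {
    "type": "efa_triprequest",
    "url": "https://efa.example.com/standard/XSLT_TRIP_REQUEST2"
  }
]
```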
> Attribution information would probably be also a good idea for proper Open Data [...].
Do you think it makes sense to use the `datapackage.json` spec (or just the field names) or some linked open data vocabulary for that? It is somewhat specific to files/blobs of data (vs. API endpoints), but we wouldn't add yet another ad-hoc standard to the ecosystem.
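For comparison, attribution in the Data Package world uses the top-level `licenses` and `sources` fields; a minimal sketch (all values made up):

```json
{
  "licenses": [
    {
      "name": "CC-BY-4.0",
      "path": "https://creativecommons.org/licenses/by/4.0/",
      "title": "Creative Commons Attribution 4.0"
    }
  ],
  "sources": [
    {
      "title": "Example Transit Authority open data portal",
      "path": "https://data.example.com/"
    }
  ]
}
```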
> As different endpoints have different requirements and configuration variables, we shouldn't just have one JSON file per endpoint, but also one type definition. E.g. `efa_dmrequest`, `efa_triprequest`, `hafas_mgate` and `hafas_stationboard`.
I'm not sure if such an "enum of types of APIs" will scale well. As an example, if you consider HAFAS endpoints, there are those with "crypto", without "crypto", `rest.exe` APIs, `stboard.exe` APIs, `ajax-getstop.exe` APIs, `extxml.exe` APIs, `query.exe` APIs, and probably more that I don't know of.
> There are backends with more than one endpoint. For instance, most XML EFA backends provide both `XSLT_DM_REQUEST` (departure monitor) and `XSLT_TRIP_REQUEST2` (routing). Similarly, HAFAS installations don't just have mgate.exe (with "crypto"), but also less capable, easier-to-use endpoints such as ajax-getstop.exe, trainsearch.exe or stboard.exe/bhftafel.exe.
>
> As different endpoints have different requirements and configuration variables, we shouldn't just have one JSON file per endpoint, but also one type definition. E.g. `efa_dmrequest`, `efa_triprequest`, `hafas_mgate` and `hafas_stationboard`.
For Hafas that's the two types we have implemented indeed, mgate.exe or the (old?) query.exe/ajax-getstop.exe/stboard.exe variant, modeled as different types as they both need different requests and different result parsing. We currently have only one endpoint for the latter (i.e. query.exe/ajax-getstop.exe/stboard.exe combined, not each of them individually) - example: https://invent.kde.org/libraries/kpublictransport/-/blob/master/src/lib/networks/ch_sbb.json
For EFA we have 1.5 variants: only a single request path, but two separate parsers depending on whether the result is the full XML or the mobile/compact variant. Our current config files model this as one type, with different parameters. This is also how we implement the small variations in the request parameters. We could also handle that as different types though, the impact on our implementation would be quite small.
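The "one type, different parameters" modelling described above could be sketched like this, with the parser variant selected via an option. The `efa` type name and all option keys here are invented for illustration, not the actual KPublicTransport config format:

```json
{
  "type": "efa",
  "url": "https://efa.example.com/standard/",
  "options": {
    "compactResponse": true
  }
}
```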
> I'm not sure if such an "enum of types of APIs" will scale well. As an example, if you consider HAFAS endpoints, there are those with "crypto", without "crypto", `rest.exe` APIs, `stboard.exe` APIs, `ajax-getstop.exe` APIs, `extxml.exe` APIs, `query.exe` APIs, and probably more that I don't know of.
You're right. In fact, when it comes to the HAFAS query variant, some endpoints are mostly useless when viewed in isolation. For example, traininfo.exe is only usable with the trainLink obtained by using trainsearch.exe, so those should belong to the same JSON file.
I think it's time to start tinkering with JSON files (at least for me, having an example endpoint definition in a JSON file works much better than just reading a discussion thread). To this end, I have created two DB HAFAS definitions (one for mgate, one for query) and an EFA (VRR) definition. They're suggestions based on the discussion so far; feel free to edit them as you see fit.
For me, the following open questions remain:
> * how should we perform localization? The kpublictransport definitions look sensible to me, but I don't have experience in that area, so I'll leave that decision to you.
For KPublicTransport this is connected to KDE's translation infrastructure, so they get translated automatically by just being there. No idea how we best handle that here.
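Independently of the translation infrastructure, the definition files themselves could store localised strings keyed by ISO 639-1 language codes; a hypothetical sketch (the `name` structure is not from any existing spec):

```json
{
  "name": {
    "en": "Example Transport Association",
    "de": "Beispiel-Verkehrsverbund"
  }
}
```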
> * Personally, I'd like an endpoint repository to document both sophisticated and simple API variants (e.g. both hafas-mgate and hafas-query). As db-hafas-mgate and db-hafas-query have the same provider, client software should be able to decide by itself whether it prefers the mgate or query API, so we don't need to specify a preference or otherwise indicate that they're identical. What do you think?
Agreed. As long as there is a way to detect multiple endpoints for the same provider in client code I'd indeed let the client code decide on the priority. For single protocol clients this is simple anyway, multi-protocol clients should get good results by picking the better implemented or more powerful protocol first.
> * I'm not familiar with the DB HAFAS mgate endpoint, so I left the `"type": "hafas_mgate_deutschebahn"` special case nearly as-is. Feel free to change it.
I'd go with "hafas_mgate" here, the "deutschebahn" special case in KPublicTransport is for coach layout support, which is a bit out of scope here I guess.
The `options {}` vs top-level keys split is another implementation detail of KPublicTransport worth reconsidering here; anything in `options` is protocol-specific, anything top-level is handled by generic infrastructure there. For single-protocol clients that separation is completely arbitrary though, and even for multi-protocol clients that split might be different.
> The `options {}` vs top-level keys split is another implementation detail of KPublicTransport worth reconsidering here; anything in `options` is protocol-specific, anything top-level is handled by generic infrastructure there. For single-protocol clients that separation is completely arbitrary though, and even for multi-protocol clients that split might be different.
Having those entries that are unspecified by this spec in a nested object probably makes maintaining this spec easier, while having them directly at the root level increases usability. I don't really care about this though; I'd rather try in practice what we have.
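To make the trade-off concrete, the nested variant would keep generic, spec-defined keys at the root and protocol-specific ones under `options`. All keys and values inside `options` below are invented placeholders for protocol-specific settings, not real parameters:

```json
{
  "type": "hafas_mgate",
  "url": "https://gate.example.com/bin/mgate.exe",
  "attribution": { "license": "CC-BY-4.0" },
  "options": {
    "micMacSalt": "0123456789abcdef",
    "clientVersion": "1.16"
  }
}
```

The flat alternative would hoist the `options` keys to the root, which is simpler for single-protocol clients but blurs which fields the spec itself defines.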
LGTM for now!
+1, let's make this v1 and see how it turns out in practice.
I moved the documentation to the main readme file and specified the language codes (I presume we're going to use ISO 639-1), so we should be good to go.
@vkrause Please merge if you think this looks good.
Agreed, let's get this in, and continue in smaller/more focused PRs/issues to keep the discussion easier to follow.
My 2 cents about the format that we specify an API endpoint with:
As the parameters of different backends (e.g. OpenTripPlanner, Navitia, HAFAS, EFA) are quite specific, let's keep the specified JSON fields specific to the backend type (i.e. different fields for OTP than for HAFAS) and define them only roughly. I'm of course fine with general properties of the API, such as a description of the data contained or the provider, being specified in a consistent way.
Personally, I like the format used by kpublictransport a lot, but I propose:

* for the `filter` field (coarse bounding box of the covered area): possibly even split it into "largest area this API is known to return any data for" and "area this API is known to return canonical/detailed/exact data for".
* for `lineModeMap`: not to specify the semantics of the modes of transport used by the API, but rather just descriptions & localisations. Attempting to standardise modes of transport in a semantic way is very hard, and many others have tried and failed before.

In general, I'd like us to rapidly iterate on this format. If something doesn't fit, let's open a PR to change it and make a new version!
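A purely descriptive `lineModeMap`, as proposed, might then just map the API's own mode identifiers to localised labels without asserting any semantics. The identifiers and nesting here are made up for illustration:

```json
{
  "lineModeMap": {
    "ICE": { "en": "long-distance high-speed train", "de": "Fernverkehrszug" },
    "RB": { "en": "regional train", "de": "Regionalbahn" }
  }
}
```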