public-transport / friendly-public-transport-format

A format for APIs, libraries and datasets containing and working with public transport data.
Creative Commons Attribution Share Alike 4.0 International

allow non-url-safe ids #48

Closed juliuste closed 6 years ago

juliuste commented 6 years ago

I know that, in theory, it makes sense to incentivise people to use url-safe ids. However, in some systems this is just not possible (like train-ose, where Greek letters are used for station ids). Station ids could obviously be returned url-encoded, but I feel like it is not FPTF's job to enforce this, especially when ids are also used in other methods of my module (e.g. journeys) and would need to be decoded again…

ialokim commented 6 years ago

I would also vote for allowing non-url-safe ids as it keeps things a lot simpler if the backend API returns ids with special characters.

derhuerst commented 6 years ago

Sorry to bring this up, but I'm pretty hesitant to merge #60. I'm afraid that at some point there will be bugs, and workarounds for them, because those IDs are not url-safe.

juliuste commented 6 years ago

I get your point, but could you also respond to some of the arguments I gave in my initial comment?

My opinion for now, unless you give further info about why you're hesitant to merge: on the one hand, we definitely have cases where enforcing url-safeness alters the dataset and makes it harder to write modules. On the other hand, you're describing a broad fear that something could go wrong. Given just those arguments, I would always prefer the option that makes things easier in actual use cases over one that guards against situations that haven't even occurred yet.

derhuerst commented 6 years ago

This is a slippery slope.

Let's say we agree on modern-file-system-safe (no special characters such as *, but Unicode chars and spaces). Someone will eventually come up with the request to allow * in IDs as well, because their specific dataset contains it and just allowing it would make things a lot easier for them.

On the other hand, there are very specific cases (as you mentioned) where having this restriction makes the Friendly Public Transport Format more unfriendly.

derhuerst commented 6 years ago

There are multiple ways to encode special chars, such as Punycoding or URL-escaping. URL-escaping in particular seems pretty low-barrier, so I guess I'm fine with allowing non-url-safe IDs (people would have to manually URL-escape them if they want to use FPTF IDs in URLs/file names).

juliuste commented 6 years ago

Since techniques like punycoding or url-escaping disproportionately affect non-Latin writing systems in a negative way, I have major concerns about enforcing this within FPTF.

To give one example: the Greek railways use Θεσσαλονίκη (Thessaloniki in Greek) as the id for the local station. Punycoding would give you wwa0cfpmrfqr1ba as the id instead, and url-escaping gives %CE%98%CE%B5%CF%83%CF%83%CE%B1%CE%BB%CE%BF%CE%BD%CE%AF%CE%BA%CE%B7, while e.g. Berlin would stay the same under all methods. Both my korail and train-ose packages are heavily affected by this.

I know that IDs don't need to be human-readable, and sometimes maybe they even shouldn't be, but IMHO this really doesn't fit the “friendliness” that the format's name implies.
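
The asymmetry described above can be reproduced with a small Python sketch. It is only an illustration, not part of FPTF: the helper name `encodings` is made up here, and Python's built-in `idna` codec (Punycode plus an `xn--` prefix, applied after case-folding) stands in for whatever Punycode tool was used in the discussion, so its exact output may differ from the string quoted above.

```python
from urllib.parse import quote, unquote


def encodings(station_id: str) -> dict:
    """Return a station id as-is, URL-escaped, and IDNA/Punycode-encoded.

    Hypothetical helper for illustration only; FPTF defines no such function.
    """
    return {
        "raw": station_id,
        # safe="" also escapes "/", so the id stays a single path segment
        "url_escaped": quote(station_id, safe=""),
        # Python's "idna" codec: Punycode with an "xn--" prefix for
        # non-ASCII input; ASCII-only input is passed through unchanged
        "idna_punycode": station_id.encode("idna").decode("ascii"),
    }


for station_id in ("Θεσσαλονίκη", "Berlin"):
    result = encodings(station_id)
    print(result)
    # URL-escaping is reversible, so the original id can be recovered
    assert unquote(result["url_escaped"]) == station_id
```

The Latin id passes through both encodings untouched, while the Greek id becomes an opaque ASCII string several times its original length in the URL-escaped form.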

juliuste commented 6 years ago

I agree with you that this could be an issue; I just feel like we should not enforce this directly in FPTF. But if one day we create something like fpti-rest (a friendly public transport interface for REST, analogous to e.g. fpti-js for JS), we should enforce things like url-safeness there.
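
A minimal sketch of that split, assuming a REST layer that escapes ids only where they enter a URL. Nothing here is an existing API: `BASE_URL`, `station_url` and `station_id_from_path` are hypothetical names, and the station object is a simplified FPTF-style dict.

```python
from urllib.parse import quote, unquote

BASE_URL = "https://example.org/api"  # hypothetical endpoint


def station_url(station_id: str) -> str:
    """Build a station URL, escaping the id only at the URL boundary."""
    # safe="" ensures characters like "/" inside an id are escaped too
    return f"{BASE_URL}/stations/{quote(station_id, safe='')}"


def station_id_from_path(path: str) -> str:
    """Recover the original id from the last path segment of a URL."""
    return unquote(path.rsplit("/", 1)[-1])


# The data itself keeps the id exactly as the backend delivers it.
station = {"type": "station", "id": "Θεσσαλονίκη", "name": "Θεσσαλονίκη"}

url = station_url(station["id"])
print(url)
assert station_id_from_path(url) == station["id"]
```

With this arrangement, other methods that take ids as input (such as journeys) can use them unmodified, and only code that actually builds URLs has to deal with escaping.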

juliuste commented 6 years ago

@derhuerst thanks for merging #60, I really appreciate it!

derhuerst commented 6 years ago

FYI You don't have to URL-escape Unicode chars anymore.

juliuste commented 6 years ago

Oh, I didn't know that, nice!