google / transit

https://gtfs.org/
Apache License 2.0

Refinement of GTFS Terminology: Transitioning from "Schedule" to "Static" #442

Open eliasmbd opened 3 months ago

eliasmbd commented 3 months ago

TL;DR: we propose to officially harmonize the name GTFS-Static

Issue Description:

There has been a longstanding debate surrounding GTFS naming conventions, particularly concerning the term "Schedule." As the GTFS specification has evolved, it has become evident that the term "Schedule" may no longer cover the diverse range of functionality available today. Moreover, the interchangeable use of "Schedule" and "Static" has led to confusion among users.

Background:

Community Insight:

Is “GTFS Schedule” an official name? The GTFS Realtime spec refers to it as “(CSV) GTFS”, which is probably worse. I would prefer something like “GTFS-Static” because it better represents what the static GTFS is (more than just a schedule).

Proposal:

Rationale:

Action Items:

Benefits:

doconnoronca commented 3 months ago

I like this. I felt uncomfortable calling it GTFS Schedule because the S in GTFS also stands for schedule.

eliasmbd commented 3 months ago

@doconnoronca Glad you like it. Actually, GTFS stands for General Transit Feed Specification. I know there are many variants floating around. For example, some still interpret the G as standing for Google. We hope to make all of this a bit clearer. This is one step towards better comms.

abyrd commented 3 months ago

I think there might be some confusion here from spoken vs. written usage of a suffix.

My sense is that the term "GTFS Schedule" is used informally as a noun, outside of a strict standardization context, to reference a "transit schedule", meaning something vague like "information about when transit service runs". This is roughly equivalent to "horaires des transports" in French.

When people are distinguishing between the two main types of data (long-term zipped CSV tables versus short-term protobuf messages) the term I would expect to see is "scheduled" with an "-ed" suffix. "Schedule" in "GTFS schedule" is a noun, while "scheduled" in "scheduled GTFS" is a past participle adjective, roughly equivalent to "prévu" or "programmé" in French. Referring to the information in a GTFS CSV file as "scheduled" means this information was planned and announced to the public ahead of operations. This contrasts with "realtime", which covers non-scheduled aspects of service improvised during operations.

I think this meaning could be made clear with a single sentence of explanation, but I can also see why some people might dislike this term or it might cause confusion for some readers.

However, if you're looking to converge on a single term, "static" may not be a good replacement. I would usually use the expression "scheduled GTFS" when I anticipate the term "static" will be misunderstood or not understood at all, or will sound jarringly out of place in context. In English, "static" is one of those words of Greek and Latin origin that belongs to a very distinct scientific/intellectual register, at least in its original meaning of "fixed, unchanging, stationary".

If you regularly speak a Latin language, "static" probably naturally sounds like it would mean "immobile" or "stationary". But many native English speakers rarely or never use it this way. Imagine someone said "he was traversing the space". It's not wrong, but it sounds like a discussion of physics, astronomy, or art, and may for this reason be disorienting or incomprehensible. In American English, I think the only thing commonly referred to as "static" is static electricity, or random noise in visual or audio information assumed to be caused by discharges of static electricity.

Considering alternative terms for "scheduled", I'd put "static" in a close second position after "theoretical" for engineer-speak. Both terms are clear and straightforward for some people to interpret but will seem unnecessarily pretentious or obscure to others. I've even seen English-speaking people with an engineering background laugh at the expression "theoretical transit data", while this is absolutely a standard term in some other languages.

In common English usage, the counterpart to "real-time" may very well be "scheduled".

westontrillium commented 3 months ago

@abyrd I think the perceived confusion around the word "static" may be a bit overstated. I don't recall witnessing anyone, including very non-technical clients, have trouble intuitively understanding what static means in this context. It just doesn't strike me as that obscure of a word.

"Schedule" always felt a bit off the mark to me, and I know it took a while for others to get used to calling it that; some never got on board. For me, it's better to codify the language that is actually used rather than try to contrive the "best" word (descriptive over prescriptive, if we really need to nerd out on linguistics 🙂). "Static" sees widespread usage despite not being the "official" suffix, and it fits the description just fine, so we should use "Static."

abyrd commented 2 months ago

Sure, many people including us have no problem understanding "static". I was just commenting with an intent to make the guide and terminology approachable and widely understandable. My sense is that most people (including me), if first encountering GTFS, would not correctly understand either term in the linked page starting with "GTFS Schedule Overview... A GTFS feed, which contains static transit information..." I may very well be wrong. People reading for the first time about a data format may spontaneously understand what is meant by "static transit information". If there is sufficient evidence that, across the whole audience for the GTFS spec and guidelines at gtfs.org, "static" is readily interpretable as "not realtime", then go for it. Please do consider, though, whether it's effective for people to encounter these terms on the first page they read with no explanations.

I want to clarify that I am not in any way advocating for "GTFS Schedule". I can't even say it felt off the mark for me because I simply don't recall ever encountering "GTFS Schedule" as the counterpart to "GTFS Realtime" anywhere outside the linked page on gtfs.org, and its use here seems confusing and grammatically odd. I just thought it might be a simple misunderstanding based on the expression "scheduled GTFS" which I do hear.

In short I was just trying to share some observations that might help with the decision:

e-lo commented 2 months ago

I'm in support of the change since:

  1. most of the people I work with already call it GTFS Static (several rungs up the ladder)
  2. many already assume that GTFS Schedule is the subset of GTFS Static that refers to the schedule (stop times, trips, frequencies, calendar) as opposed to, say, fixed infrastructure (stops, transfer times), business organizations (agencies), technical information (feed_info), and fares.
  3. The merging of GTFS flex makes calling it GTFS Schedule VERY confusing.

We originally harmonized around GTFS Schedule at MobilityData's insistence on calling it that. I am in support of migrating away from "schedule" given all of the above.

Given the opportunity to ponder what term might be more descriptive and inclusive, I'm wondering if planned might be more appropriate than static, since for Flex the operations aren't static. That said, I still prefer static over schedule, and planned has its own drawbacks.

stevenmwhite commented 2 months ago

Also in support of the change to GTFS Static.

In practice I'm used to using/hearing "GTFS Static" as a way to reference the type of GTFS (distinct from GTFS Realtime) and "GTFS schedule" as a general term that describes the format of the schedule, as in "have you imported your GTFS schedule into the system yet?" (this is distinct from Excel-based schedule import files and other formats that we support).

westontrillium commented 2 months ago

I think it's also worth noting that were we to decide that base GTFS should have an official "suffix" other than "static," I expect people will still use the word "static" when distinguishing between static and realtime GTFS data in regular conversation because that's the right word to use. Perhaps it's better to codify "static" (or whatever) as the right word to use when referencing this kind of data rather than as part of GTFS's official name. Does GTFS really need an official suffix? "GTFS" and "GTFS Realtime" seem clear enough to me. 🤷‍♂️

@e-lo I actually think static still works for describing Flex. The operations of Flex (i.e. what the vehicles actually end up doing on a given day) aren't static; however, the information contained in a GTFS-Flex dataset is, as it only refers to the static business rules and service parameters.

eliasmbd commented 2 months ago

@westontrillium Over the past few days, I've also been thinking about dropping the suffix. It depends on how we want to communicate it. I see multiple options, but let's assume we are okay with static.

  1. Option 1: Use GTFS as the general name. It includes GTFS Static and GTFS Realtime as subsets.
  2. Option 2: Just use GTFS and GTFS Realtime. We drop the static/schedule and just have 2 types.

Regardless, I think using the most commonly used terms would be easiest to implement. Changing behaviour is difficult. I appreciate the debate over linguistics and etymology (my kind of debate). A quick search on the usage of "static" suggests that the primary definition in current literature leans towards the immobility / lack of movement of objects rather than the scientific term. Static is used to describe an unchanging market in business and is also used to describe the immobility of troops on the battlefield. There are many examples.

gcamp commented 2 months ago

Option 1 makes more sense to me. I think it's useful to have one word that represents both, and without a specifier to clarify we end up with things like “(CSV) GTFS”.

antrim commented 2 months ago

The "Static", "Schedule", and "CSV" terms arose mostly organically; people needed a way to differentiate types of feeds. I support Option 1, with some consistent term for GTFS Static/CSV/Schedule.

@e-lo What do you see as the drawbacks of "planned"? At first glance, this seems pretty descriptive. I guess a major drawback is that this term hasn't been widely used (as far as I know).

stevenmwhite commented 2 months ago

I also support Option 1, where "GTFS" is the general term and then something and "Realtime" describe the two types of GTFS.

I believe that "Static" is the most common term (anecdotally) and I tend to think that it's always easiest to adopt something that's already being used by the audience, so I'd suggest that we go with GTFS Static.

(Edit: And I realize I already recently commented basically the same thing... sorry, it's been a long couple weeks)

AdrianaCeric commented 2 months ago

I prefer GTFS Static. I always assumed this was standard nomenclature since I see this term referenced more often. First, it contrasts with GTFS-rt, and second, it makes more sense since GTFS-static encapsulates more than just a schedule nowadays (as discussed).

westontrillium commented 2 months ago

I'm happy with Option 1 as @eliasmbd has it written.

drewda commented 2 months ago

I haven't been that concerned with confusion about referring to static GTFS, so I've just been lurking on this thread.

But my colleagues and I have had some minor but still annoying challenges with the many ways of writing names for the real-time version over the years.

I ran a poll on Twitter or LinkedIn of how folks refer to the real-time spec a few years ago. Got results that I vaguely remember as:

Other included "GTFS-r" which was new to me, but I believe may be in more use outside of North America.

The sample probably wasn't that large and my memory is imprecise. But this does illustrate how inconsistent the spelling is for the real-time complement to static GTFS.

When Interline and CUTR wrote a report for TRB, we had to pick a standard spelling; that's why I tried running that little poll. In the report we wrote "GTFS Realtime". (Then again, there are also many pages on our Transitland site that mix together various spellings. So even within one firm we aren't consistent.)

The more I've thought about it over the years, the more I've thought that GTFS Realtime may just not be a thing that needs to be named.

Instead there may be more utility in naming:

This focuses on the functionality, rather than the format of the serialization. The typical publishing frequency is implied by the functionality, rather than highlighted in the name.

Also, instead of an agency telling you "yes, we have GTFS Realtime!" and then only sending you an endpoint with service alerts but not the other two RT message types, it would be clearer which of the RT messages an agency produces.

This could also make it easier to refer to spec additions that may touch both the CSV and PBF payloads. For example:

To fully deprecate the name "GTFS Realtime" would likely be too big of a change for all the agencies and vendors that are used to it. Still, in terms of organizing documentation, it may actually simplify terminology to not use static vs. real-time as an organizing scheme.

To summarize, my suggestion is a variant of @eliasmbd's Option 1: GTFS would be the overall name. But instead of static vs. real-time, the next level of the hierarchy would refer to functional components like Flex, Pathways, Trip Updates, Vehicle Positions, etc.

LeoFrachet commented 2 months ago

From my understanding, historically:

When I started to write documents for what would become MobilityData, there was no official name for the "original" GTFS. I needed a word to be specific about the not-realtime GTFS. I was ill at ease writing sentences like "You need to keep your GTFS Static updated", so I used "GTFS Schedule". I'm likely not the one who invented it, but I may be to blame for most of its usage.

My 2¢ today are as follows:

IMHO we have two main options:

In terms of process, one possible approach could be:

We could add a Phase 0, if needed: vote on whether we should have a dedicated name for non-realtime GTFS.

isabelle-dr commented 2 months ago

Echoing what was said about the confusion of having something called GTFS Schedule modeling Fares and certain DRT services.

I just attended a conference in Europe, the terms Static and Dynamic are widely used, and they are the terms used in regulation (source). I got a few eyebrows raised when explaining that Fares can be modeled using GTFS Schedule.

eliasmbd commented 1 month ago

> I just attended a conference in Europe, the terms Static and Dynamic are widely used, and they are the terms used in regulation (source).

[Image: Screenshot 2024-05-22 at 11 11 37 AM]

Static vs. Dynamic makes sense to me, seeing it laid out like that. They are natural antonyms.

From a user perspective, I believe Realtime is sufficiently understood not to warrant a change at this time. But I would not be opposed to considering changing both to "standardize" the logic as presented in EU regulation.

timMillet commented 1 month ago

At Transit, we selected the term GTFS Static for our data guidelines in our Resources for Transit Partners website. This choice was based on which term was most used by the GTFS stakeholders we had communications with. I believe Google uses Static too. “Static” seems fine as it is already widely used by consumers and producers.

eliasmbd commented 1 month ago

I propose to move this issue to a PR.

To keep it within the scope of the issue, and considering what has been discussed, I propose we change GTFS Schedule to GTFS Static in the documentation.

To avoid an expansion of the scope, I suggest we put the naming discussion for GTFS Realtime in a backlog. It seems like its use is still quite common and is not as problematic as Schedule vs. Static.