google / transit

https://gtfs.org/
Apache License 2.0

GeoJSON in GTFS? (Or the future of GTFS serialisation) #391

Open skinkie opened 1 year ago

skinkie commented 1 year ago

It is currently being proposed in GTFS-Flex (#388) to introduce a serialisation format we have never worked with in GTFS or GTFS-RT. In the past, these kinds of discussions (think zeromq, mqtt, websockets...) were explicitly avoided in favor of proven technology. The suggestion to use a new serialisation format for GTFS-Flex, namely GeoJSON, feels "too fast" to me, especially because we have historically had this discussion about shapes.txt, and more recently about how to serialise more complex structures where CSV (honestly) does not make sense either.

No, I am not proposing a full overhaul of all the files (or suggesting a well thought out format such as NeTEx ;). But I think we can all agree that if we bring in anything other than CSV, it had better be the right solution and apply in multiple places, not just in GTFS-Flex.

I have already asked @isabelle-dr if we could have some sort of meeting on this topic.

e-lo commented 1 year ago

Thanks for bringing this up – I think this is a good discussion to have on its own and separate from (but with dependency on) GTFS-Flex. Managing and using another format does indeed bring in a challenge that should be carefully considered.

Some thoughts on introducing geojson as formulated in proposed locations.geojson:

I do think shapes would likely benefit from having a natively viewable format as well. I'd be curious about the previous discussion points and why this wasn't pursued in the end.

westontrillium commented 1 year ago

I think this discussion needs some historical context as to why the switch was made from expressing polygons using WKT strings in a .csv file to using GeoJSON, given by someone who was involved in the Flex (v2) drafting process.

eliasmbd commented 1 year ago

@westontrillium Is there anyone that you think can provide that historical context? Maybe someone we can invite to a live meeting?

westontrillium commented 1 year ago

Attn: @tsherlockcraig

e-lo commented 1 year ago

The only convo I could find on WKT in the GTFS-Flex repo shows a great deal of support for it >> GeoJSON. I will try to find where the decision was made in the other direction.

https://github.com/MobilityData/gtfs-flex/issues/5

...somebody with edit access to the gtfs-flex google doc might be able to search the comments and version history for more.

tsherlockcraig commented 1 year ago

I can provide limited context here, but I think it covers most of what is needed to explain that decision. I project-managed applications that used both WKT and GeoJSON. I am not a developer and can't speak directly to the technical limitations of either approach, but I can speak to the business reasons and to the reports I received from technical parties at the time.

I think all the above is important and it's why I came to support GeoJSON in GTFS-Flex. It's also why I, from a business and community angle, don't see anything wrong with us bringing in other file types besides CSV, if they're the right technical tool for the job.

But I think @skinkie raises a different, valid and important question that we should hear from technical parties on, and @eliasmbd's call to the conversation at https://github.com/google/transit/issues/127 feels particularly relevant to me, although it's way above my head. Even if this is right for business reasons, what are the technical implications for existing systems and for the future of the technical options available to or required of the spec?

One question we should ask: are there other options we should seriously consider besides geojson and wkt in a csv? If there are no other serious contenders, that at least might simplify this discussion.

eliasmbd commented 1 year ago

> One question we should ask: are there other options we should seriously consider besides geojson and wkt in a csv? If there are no other serious contenders, that at least might simplify this discussion.

@tsherlockcraig I second this. I think our primary course of action is to evaluate the viable options (Pros/Cons). What we do here will define the future possibilities of GTFS and I applaud @skinkie for bringing this up when he did.

At this point we are working on a timeframe for a meeting. Before then, I would like to invite you all to share this issue with the relevant people in the community, new and old. It is important that everyone who should see this issue does so before we engage in a virtual meeting. Internally, we are working on appropriate stakeholder outreach as well.

leonardehrenfried commented 1 year ago

As someone who maintains OTP's Flex implementation, I have only one mundane comment on this topic: Flex is a significant departure from GTFS static. It's difficult to implement, but that difficulty doesn't stem from it being CSV, WKT or GeoJSON. That's the easy part. The real complexity is the huge variability and the explosion of possible results that Flex adds, not the choice of geographic representation.

That said, I also welcome a discussion, so that whatever decision we end up making is a deliberate one rather than one that everybody thought someone else had already made.

leonardehrenfried commented 1 year ago

Allow me to respond to the side comment about Netex: what I like about GTFS is that it's hugely pragmatic rather than a giant standard where every country has their own "profile" because anything else is too large to manage.

So I'll take a well done GTFS feed over a "more elegant" but terribly implemented Netex feed any day. It's much, much easier to achieve "well done" with GTFS. The majority of producers struggle with GTFS, so what hope is there of them producing good Netex ones?

skinkie commented 1 year ago

> The majority of producers struggle with GTFS, so what hope is there of them producing good Netex ones?

A proper free desktop implementation that manages data as a producer and uses NeTEx as its internal model, not conversion on conversion ;-)

eliasmbd commented 1 year ago

✏️ We have a few dates we would like to propose for the virtual meeting. Please fill out this form to find a good time for everyone.


eliasmbd commented 1 year ago

As we prepare for the meeting, we have asked you to share your expectations with us. In order to help us scope this meeting, we will post your expectations here.

Preferably, the vision 'beyond' GeoJSON. What should be done when we change significant parts of the standard? For example CSV to XML, CSV to JSON, Protocol Buffers to CBOR.

I think we should leave this meeting having answered whether there are options other than geojson and wkt that need to be researched. At the very least, we should identify a qualified group to make that determination and begin research. It is important that we have technical stakeholders in this meeting. Needs of business stakeholders (like myself) should be de-prioritized in my opinion.

hopefully a focused discussion, not a general formats flame war as usual on the internet 😬

As you can see, expectations are diverse. From my end, I would propose to maintain the focus on GeoJSON and the alternative solutions out there. I kept the expectations anonymous, but I invite you all to help us keep the scope focused and precise.

Also, it can be expected that we will host the meeting on Tuesday 8 August at 11am EDT. More details will follow.

eliasmbd commented 1 year ago

📣 We have an event registration page. Please sign up and share to all relevant parties. As expected the meeting will be held on Tuesday August 8th @ 11am EDT. (Sharpen your Miro pencils 😉 )

bdferris-v2 commented 1 year ago

Since @eliasmbd has prompted us to give our thoughts on this before the session tomorrow, here are mine:

The quick version is that I'm not immediately opposed to adding GeoJSON to the GTFS spec.

I ultimately come back to the GTFS Guiding Principles, specifically the one about making GTFS easy to produce and edit. My intuition here is that there will be many data producers out there who are managing their geographic assets, including service region polygons for Flex, in standard GIS applications. And for many of the most popular GIS applications, GeoJSON is a well-supported export format that could just work off-the-shelf.

I think a similar case can be made for CSV + WKT, but I think the tooling isn't quite as seamless.
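To make the comparison concrete, here is a minimal sketch (not from the thread) of the same service-area polygon carried as WKT in a CSV column and then converted to a GeoJSON FeatureCollection. The column names `location_id` and `geometry_wkt`, the sample coordinates, and the helper names are illustrative assumptions, not part of any GTFS or GTFS-Flex proposal. The hand-written parser only handles simple 2D `POLYGON` strings; real tooling would more likely rely on a GIS library.

```python
import csv
import io
import re

# Hypothetical CSV carrying one polygon per row as a quoted WKT string.
CSV_TEXT = '''location_id,geometry_wkt
zone_a,"POLYGON ((-122.5 45.5, -122.4 45.5, -122.4 45.6, -122.5 45.6, -122.5 45.5))"
'''


def wkt_polygon_to_geojson(wkt: str) -> dict:
    """Convert a simple 2D WKT POLYGON into a GeoJSON geometry dict.

    Sketch only: no MULTIPOLYGON, no Z/M values, no EMPTY geometries.
    """
    match = re.fullmatch(r"\s*POLYGON\s*\(\s*(.*)\s*\)\s*", wkt,
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("only simple POLYGON WKT is supported in this sketch")
    rings = []
    # Each ring is a parenthesised, comma-separated list of "x y" pairs.
    for ring_text in re.findall(r"\(([^()]*)\)", match.group(1)):
        ring = []
        for pair in ring_text.split(","):
            x, y = pair.split()
            # WKT "x y" and GeoJSON [lon, lat] use the same axis order here.
            ring.append([float(x), float(y)])
        rings.append(ring)
    if not rings:
        raise ValueError("POLYGON has no rings")
    return {"type": "Polygon", "coordinates": rings}


def csv_to_feature_collection(text: str) -> dict:
    """Turn every CSV row into a GeoJSON Feature keyed by location_id."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        features.append({
            "type": "Feature",
            "id": row["location_id"],
            "properties": {},
            "geometry": wkt_polygon_to_geojson(row["geometry_wkt"]),
        })
    return {"type": "FeatureCollection", "features": features}
```

Calling `csv_to_feature_collection(CSV_TEXT)` and passing the result to `json.dumps` gives a file that common GIS viewers can open directly, which is the "natively viewable" property discussed above. The WKT column carries the same information but needs a parsing step first.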

Why didn't GTFS consider GeoJSON for something like shapes.txt originally? If I understand my GeoJSON history correctly, GeoJSON has only really been a thing since 2007 and only an RFC since 2016 (GTFS being born in 2006). Might history have been different if GTFS had come slightly later? I do not know.

What other data formats might we consider? Conceivably, you might look at anything on the GDAL-supported Vector format list but I think there are only a handful of formats that are simple enough, have reasonable governance, and have been around long enough for consideration. I don't think it's an accident that GeoJSON is at the top of that list.

I recognize that producing GTFS (and GTFS-Flex especially) has gotten complex enough that it may not be reasonable to support the simple use-case of a transit operator typing up data in a spreadsheet and we may have expectations that some sort of GTFS export application will be in use, in which case some of these arguments around facility of creation carry less weight. That said, I do think there is something to be said for being able to quickly visualize data in a feed and GeoJSON does have some advantages there.

Anyways, looking to hear from other folks tomorrow. Thanks!

eliasmbd commented 1 year ago

🙏 Thank you for joining us for the strategic meeting held yesterday. It was an eye opening and refreshing discussion for many of us.

📝 Here are some takeaways from the meeting:

🗓️ Once the points above have been resolved, MobilityData will announce a follow up meeting - expect the meeting to be held sometime in September.

eliasmbd commented 1 year ago

:mega: MobilityData would like to invite you to review and comment on our findings on the inclusion of GeoJSON within GTFS.

We have included the stakeholder outreach findings, the comparative analysis between GeoJSON and GPKG, as well as 2 options to consider and a suggestion.

:eyes: TL;DR

Here is the document link

:exclamation: MobilityData will consider the volume and quality of comments, revise the documentation if necessary, and then call a meeting in the subsequent week (27 September 2023 @ 11AM EDT if consensus is maintained).

haydens30 commented 1 year ago

Folks, I have left my comment in the document, but you need to be aware of the existing work and planned roadmap of the OGC (Open Geospatial Consortium), especially the Special Working Group on Routing: https://github.com/opengeospatial/ogcapi-routes/issues/58

drewda commented 1 year ago

The @interline-io team agrees with the "tldr" bullet points posted by @eliasmbd and with the overall substance of the document.

In case it's useful to others, here are the detailed comments we shared earlier in support:

eliasmbd commented 1 year ago

📯 We have a date for our next GeoJSON in GTFS meeting! - September 27th 2023 @ 11AM EDT 📯

Sign up for the event here

:pray: Please review and leave a comment in this document before attending this meeting. This meeting will cover the points highlighted in the document, confirm consensus around the option and propose a path forward for GeoJSON in GTFS.

📔 Please let us know if you would like to propose and present an alternative during the meeting, we can reserve a few minutes for you.

Disclaimer: This is not a GTFS-Flex working group meeting

mgilligan commented 1 year ago

While developing GTFS in 2005/2006, we discussed the usage of an ESRI shapefile for pattern geometries. At the time, it was perceived to be the most widely used GIS format. Ultimately, the decision was made to stick with CSV and sequence numbers to allow for easier adoption by others without a GIS.

I would prefer that shapes.txt and stops.txt are left untouched and not deprecated in any way. As a producer, I would hate to produce 2 versions of shapes and stops in our GTFS to make sure I don't break any of our consumer applications. If there is a need for GeoJSON versions of shapes.txt or stops.txt, open source tools could be developed to convert from specific GIS formats to GTFS and vice versa. This would also allow GTFS producers to maintain in whatever format works for them without a backward-incompatible change to the spec.
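As a rough illustration of the kind of conversion tool described above, the sketch below reads shapes.txt rows and emits one GeoJSON LineString per `shape_id`. The field names come from the GTFS reference for shapes.txt; the sample rows and the function name `shapes_to_geojson` are made up for this example, and optional fields such as `shape_dist_traveled` are ignored.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; note that they are deliberately out of sequence order.
SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A,45.52,-122.68,2
A,45.51,-122.69,1
A,45.53,-122.67,10
B,45.40,-122.60,1
B,45.41,-122.61,2
"""


def shapes_to_geojson(shapes_csv: str) -> dict:
    """Group shape points by shape_id and build one LineString per shape."""
    points = defaultdict(list)
    for row in csv.DictReader(io.StringIO(shapes_csv)):
        points[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),
            float(row["shape_pt_lon"]),
            float(row["shape_pt_lat"]),
        ))

    features = []
    for shape_id, pts in points.items():
        # Sequence numbers must increase along the shape but need not be
        # consecutive, so sort on them rather than trusting file order.
        pts.sort(key=lambda p: p[0])
        features.append({
            "type": "Feature",
            "id": shape_id,
            "properties": {"shape_id": shape_id},
            "geometry": {
                "type": "LineString",
                # GeoJSON positions are [longitude, latitude].
                "coordinates": [[lon, lat] for _, lon, lat in pts],
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

Because the conversion is mechanical in this direction, a producer could keep publishing shapes.txt unchanged and generate GeoJSON only for visualisation, which is in line with the preference stated above.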

I understand the need for something more sophisticated when dealing with multi-part polygons in GTFS-Flex but for points and lines, the argument seems weak and creates more work for everyone involved. I would vote for the tried-and-true OGC WKT format in CSVs for expressing polygons but I'd be a +0 for GeoJSON.

As @drewda said, if GeoJSON is adopted by GTFS-Flex, there needs to be specific documentation about what features are acceptable, what projections are allowed e.g. EPSG 4326, etc., to simplify consumer software.
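One way to picture such a restricted profile is a small consumer-side check, sketched below. The specific rules are assumptions chosen for illustration and are not taken from any adopted spec text: only `Polygon` and `MultiPolygon` geometries, a required feature `id`, positions within WGS 84 longitude/latitude ranges (the coordinate reference system GeoJSON uses under RFC 7946), and no explicit `crs` member. The function names are hypothetical.

```python
from typing import Iterator, List

# Assumed profile for illustration: only area geometries are accepted.
ALLOWED_GEOMETRY_TYPES = {"Polygon", "MultiPolygon"}


def iter_positions(geometry: dict) -> Iterator[list]:
    """Yield every position of a Polygon or MultiPolygon geometry."""
    polygons = geometry["coordinates"]
    if geometry["type"] == "Polygon":
        polygons = [polygons]
    for polygon in polygons:
        for ring in polygon:
            yield from ring


def check_feature(feature: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if "id" not in feature:
        problems.append("feature has no id")
    geometry = feature.get("geometry") or {}
    geometry_type = geometry.get("type")
    if geometry_type not in ALLOWED_GEOMETRY_TYPES:
        problems.append(f"geometry type {geometry_type!r} is not allowed")
        return problems
    for position in iter_positions(geometry):
        lon, lat = position[0], position[1]
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            problems.append(f"position {position} is out of WGS 84 range")
    return problems


def check_collection(collection: dict) -> List[str]:
    """Check the top-level object and every feature it contains."""
    problems = []
    if collection.get("type") != "FeatureCollection":
        problems.append("top-level object is not a FeatureCollection")
    if "crs" in collection:
        # Treated as an error in this sketch so consumers never reproject.
        problems.append("explicit 'crs' member is not allowed")
    for index, feature in enumerate(collection.get("features", [])):
        problems.extend(f"feature {index}: {p}" for p in check_feature(feature))
    return problems
```

A range check like this also catches the common mistake of swapping latitude and longitude whenever the swapped value falls outside the range of -90 to 90 degrees, although it cannot detect swaps where both values happen to be valid latitudes.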