skinkie opened this issue 1 year ago
Thanks for bringing this up – I think this is a good discussion to have on its own and separate from (but with dependency on) GTFS-Flex. Managing and using another format does indeed bring in a challenge that should be carefully considered.
Some thoughts on introducing GeoJSON as formulated in the proposed locations.geojson:
… locations.geojson … locations.geojson into a dataframe (or equivalent) format. No nested properties.
I do think shapes would likely benefit from having a natively viewable format as well. I'd be curious about the previous discussion points and why this wasn't pursued in the end.
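The point about loading locations.geojson into a dataframe (or equivalent) with no nested properties can be illustrated with a minimal Python sketch. It flattens a GeoJSON FeatureCollection into one row per feature and refuses nested property values, so every property maps to a scalar column. The function name features_to_rows and the stop_name property in the sample are illustrative assumptions, not part of any adopted spec.

```python
from typing import Any


def features_to_rows(feature_collection: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a GeoJSON FeatureCollection into one flat row per feature.

    Each row holds the feature id, its geometry type and its properties.
    Nested (object or array) property values are rejected so every column
    stays scalar, the way it would in a CSV-based GTFS table.
    """
    if feature_collection.get("type") != "FeatureCollection":
        raise ValueError("expected a FeatureCollection")
    rows = []
    for feature in feature_collection.get("features", []):
        properties = feature.get("properties") or {}
        for key, value in properties.items():
            if isinstance(value, (dict, list)):
                raise ValueError(f"nested property {key!r} in feature {feature.get('id')!r}")
        row = {"id": feature.get("id"), "geometry_type": feature["geometry"]["type"]}
        # Assumes no property is itself named "id" or "geometry_type"
        row.update(properties)
        rows.append(row)
    return rows


# Illustrative input; "zone_a" and "stop_name" are made-up values.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "zone_a",
            "properties": {"stop_name": "Zone A"},
            "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
        }
    ],
}
print(features_to_rows(sample))
# [{'id': 'zone_a', 'geometry_type': 'Polygon', 'stop_name': 'Zone A'}]
```

A list of flat dicts like this is what most dataframe libraries accept directly (pandas.DataFrame, for example, takes a list of dicts), which is why a "no nested properties" rule keeps that step trivial.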
I think this discussion needs some historical context as to why the switch was made from expressing polygons using WKT strings in a .csv file to using GeoJSON, given by someone who was involved in the Flex (v2) drafting process.
@westontrillium Is there anyone that you think can provide that historical context? Maybe someone we can invite to a live meeting?
Attn: @tsherlockcraig
The only convo I could find on WKT in the GTFS-Flex repo shows a great deal of support for it over GeoJSON. Will try to find where the decision was made in the other direction.
https://github.com/MobilityData/gtfs-flex/issues/5
...somebody with edit access to the gtfs-flex google doc might be able to search the comments and version history for more.
I can provide limited context here, but I think it covers most of what is needed to explain that decision. I project-managed applications that used both WKT and GeoJSON. I am not a developer and can't speak directly to the technical limitations of either approach, but I can speak to the business reasons and to the reports I received from technical parties at the time.
I think all the above is important and it's why I came to support GeoJSON in GTFS-Flex. It's also why I, from a business and community angle, don't see anything wrong with us bringing in file types other than .csv, if they're the right technical tool for the job.
But I think @skinkie raises a different, valid and important question that we should hear from technical parties on, and @eliasmbd's call to the conversation at https://github.com/google/transit/issues/127 feels particularly relevant to me, although it's way above my head. Even if this is right for business reasons, what are the technical implications for existing systems and for the future technical options available to, or required of, the spec?
One question we should ask: are there other options we should seriously consider besides geojson and wkt in a csv? If there are no other serious contenders, that at least might simplify this discussion.
@tsherlockcraig I second this. I think our primary course of action is to evaluate the viable options (Pros/Cons). What we do here will define the future possibilities of GTFS and I applaud @skinkie for bringing this up when he did.
At this point we are working on a timeframe for a meeting. Before then, I would like to invite you all to share this issue with the relevant people in the community, new and old. It is important that everyone who should see this issue does so before we hold a virtual meeting. Internally, we are working on appropriate stakeholder outreach as well.
As someone who maintains OTP's Flex implementation, I only have one mundane comment on this topic: Flex is a significant departure from GTFS static. It's difficult to implement, but that difficulty doesn't stem from it being CSV, WKT or GeoJSON; that's the easy part. The real complexity is the huge variability and the explosion of possible results that Flex adds, not the choice of geographic representation.
That said, I also welcome a discussion so that whatever decision we end up making is a deliberate one rather than one that everybody thought someone had already made.
Allow me to respond to the side comment about NeTEx: what I like about GTFS is that it's hugely pragmatic rather than a giant standard where every country has its own "profile" because anything else is too large to manage.
So I'll take a well-done GTFS feed over a "more elegant" but terribly implemented NeTEx feed any day. It's much, much easier to achieve "well done" with GTFS. The majority of producers struggle with GTFS, so what hope is there of them producing good NeTEx ones?
A proper free desktop implementation that manages data as a producer and uses NeTEx as its internal model, not conversion on conversion ;-)
✏️ We have a few dates we would like to propose for the virtual meeting. Please fill out this form to find a good time for everyone.
As we prepare for the meeting, we have asked you to share your expectations with us. In order to help us scope this meeting, we will post your expectations here.
Preferably, the vision 'beyond' GeoJSON. What should be done when we change significant parts of the standard? For example CSV to XML, CSV to JSON, Protocol Buffers to CBOR.
I think we should leave this meeting having answered whether there are options other than GeoJSON and WKT that need to be researched. At the very least, we should identify a qualified group to make that determination and begin research. It is important that we have technical stakeholders in this meeting. Needs of business stakeholders (like myself) should be de-prioritized in my opinion.
hopefully a focused discussion, not a general formats flame war as usual on the internet 😬
As you can see, expectations are diverse. From my end, I would propose to maintain the focus on GeoJSON and the alternative solutions out there. I kept the expectations anonymous but invite you all to participate by helping us keep the scope focused and precise.
Also, it can be expected that we will host the meeting on Tuesday 8 August at 11am EDT. More details will follow.
📣 We have an event registration page. Please sign up and share to all relevant parties. As expected the meeting will be held on Tuesday August 8th @ 11am EDT. (Sharpen your Miro pencils 😉 )
Since @eliasmbd has prompted us to give our thoughts on this before the session tomorrow, here are mine:
The quick version is that I'm not immediately opposed to adding GeoJSON to the GTFS spec.
I ultimately come back to the GTFS Guiding Principles, which include making GTFS easy to produce and edit. My intuition here is that there will be many data producers out there who are managing their geographic assets, including service region polygons for Flex, in standard GIS applications. And for many of the most popular GIS applications, GeoJSON is a well-supported export format that could just work off the shelf.
I think a similar case can be made for CSV + WKT, but I think the tooling isn't quite as seamless.
Why didn't GTFS consider GeoJSON for something like shapes.txt originally? If I understand my GeoJSON history correctly, GeoJSON has only really been a thing since 2007 and only an RFC since 2016 (GTFS being born in 2006). Might history have been different if GTFS had come slightly later? I do not know.
What other data formats might we consider? Conceivably, you might look at anything on the GDAL-supported Vector format list but I think there are only a handful of formats that are simple enough, have reasonable governance, and have been around long enough for consideration. I don't think it's an accident that GeoJSON is at the top of that list.
I recognize that producing GTFS (and GTFS-Flex especially) has gotten complex enough that it may not be reasonable to support the simple use-case of a transit operator typing up data in a spreadsheet and we may have expectations that some sort of GTFS export application will be in use, in which case some of these arguments around facility of creation carry less weight. That said, I do think there is something to be said for being able to quickly visualize data in a feed and GeoJSON does have some advantages there.
Anyways, looking to hear from other folks tomorrow. Thanks!
🙏 Thank you for joining us for the strategic meeting held yesterday. It was an eye opening and refreshing discussion for many of us.
📝 Here are some takeaways from the meeting:
Most participants seemed interested in adding a new format to GTFS
We noticed a consensus was building around the specific geometries that the community wanted to target - zones and shapes.
Many participants showed support for the GeoJSON format, but some voiced the option of using GPKG (GeoPackage)
MobilityData will provide the community with a few options, considering the implications of adopting a new format, along with recommendations.
🗓️ Once the points above have been resolved, MobilityData will announce a follow up meeting - expect the meeting to be held sometime in September.
📣 MobilityData would like to invite you to review and comment on our findings on the inclusion of GeoJSON within GTFS.
We have included the stakeholder outreach findings, the comparative analysis between GeoJSON and GPKG, as well as 2 options to consider and a suggestion.
👀 TL;DR
… locations.geojson (polygons) first and addressing shapes.geojson (linestring/route shapes) afterward.
❗ MobilityData will consider the volume and quality of comments, revise the documentation if necessary, and then call a meeting in the subsequent week (27 September 2023 @ 11AM EDT if consensus is maintained).
Folks, I have left my comment in the document, but you need to be aware of existing work and the planned roadmap of the OGC (Open Geospatial Consortium), especially the Standards Working Group on Routing: https://github.com/opengeospatial/ogcapi-routes/issues/58
The @interline-io team agrees with the "tldr" bullet points posted by @eliasmbd and with the overall substance of the document.
In case it's useful to others, here are the detailed comments we shared earlier in support:
We strongly agree that GeoJSON makes the most sense as the format for expressing any new vector geospatial data to be added to the GTFS spec. GeoJSON works with a wide range of tooling, as you know. It's expressed in text (unlike other recent options like GeoPackage).
GeoJSON does have some performance limitations that become a problem with very large datasets. (On Interline's website you'll find some blog posts about how we sometimes produce and consume "GeoJSONL" as an alternative, for example with a lot of OpenStreetMap data.) But individual GTFS feeds are rarely going to hit the limitations of GeoJSON, so we think GeoJSON is fine for use within individual GTFS feeds.
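As a rough illustration of the line-delimited alternative, here is a sketch assuming "GeoJSONL" means one compact GeoJSON Feature per line (newline-delimited GeoJSON); Interline's exact conventions may differ. The helper names write_geojsonl and read_geojsonl are hypothetical.

```python
import io
import json
from typing import Any, Iterable, Iterator, TextIO


def write_geojsonl(features: Iterable[dict[str, Any]], out: TextIO) -> None:
    """Write each Feature as one compact JSON document followed by a newline."""
    for feature in features:
        # json.dumps with no indent never emits a raw newline (newlines in strings are escaped)
        out.write(json.dumps(feature, separators=(",", ":")) + "\n")


def read_geojsonl(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield Features one at a time, so the whole file never has to be parsed at once."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


buf = io.StringIO()
write_geojsonl(
    ({"type": "Feature", "properties": {"n": i}, "geometry": None} for i in range(3)), buf
)
buf.seek(0)
print([feature["properties"]["n"] for feature in read_geojsonl(buf)])
# [0, 1, 2]
```

The design point is that a FeatureCollection has to be parsed as a single JSON document, while the line-delimited form can be streamed feature by feature. That matters for OpenStreetMap-scale data far more than for a typical GTFS feed.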
Interline has hosted trip planners running against the GTFS-Flex v1 and GTFS-Flex v2 specifications. We defer to our partners at Trillium to create the flex feeds, but have often had to debug issues in flex feeds when they aren't ingested properly or don't produce the expected trip plans. Flex can be hard to debug. The switch from WKT to GeoJSON for expressing geometries did make it easier to debug any issues that involve geometries. It's somewhat easier to open up a GeoJSON file than it is to read in WKT from a column in a CSV file. This is a reason why we'd support sticking with GeoJSON rather than reverting back to WKT for flex.
We do like the idea of switching from shapes.txt to a GeoJSON representation, but think this complicates the current question. From our perspective, it makes the most sense to move ahead with adopting GTFS-Flex v2 with a GeoJSON file. Doing more things with GeoJSON files in GTFS feeds would be nice and we would support those changes, but it feels like that would complicate the adoption of flex right now.
In our experience the audiences for modeling fixed-route transit in GTFS and for modeling demand-responsive transit in GTFS-Flex are almost completely different. The good news is that additions for flex probably won't complicate matters for traditional fixed-route transit agencies -- they can ignore the additions. The bad news is that the audience for flex/DRT has, on average, much less technical capability than fixed-route transit agencies. This isn't exactly an argument for GeoJSON, but we're just sharing this observation.
When it's time to try adopting GeoJSON as an alternative for shapes.txt, we think this will be a net positive for all transit agencies. It's hard to think of a situation where editing points as rows in a spreadsheet would be easier or more accessible than editing polyline features in a GeoJSON file. It would be nice for this to be adopted alongside a formal approach to versioning of the spec -- still, we think that it could work to conditionally require shapes.txt or a "shapes.geojson" file. Just as with separating the adoption of flex from the adoption of shapes in GeoJSON, overall we assume it would be simplest for the GTFS community to make incremental steps, rather than bundle a number of steps together with blocking dependencies on each other.
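To make the shapes.txt to "shapes.geojson" idea concrete, here is a minimal sketch that groups shapes.txt rows by shape_id, orders them by shape_pt_sequence and emits one LineString Feature per shape. The column names come from the existing shapes.txt definition; the output layout (feature id = shape_id, empty properties, shape_dist_traveled dropped) is an assumption, since no shapes.geojson schema has been agreed.

```python
import csv
import io
from collections import defaultdict
from typing import Any


def shapes_txt_to_geojson(shapes_csv: str) -> dict[str, Any]:
    """Turn shapes.txt rows into a FeatureCollection with one LineString per shape_id."""
    points: dict[str, list[tuple[int, float, float]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(shapes_csv)):
        points[row["shape_id"]].append(
            (int(row["shape_pt_sequence"]), float(row["shape_pt_lon"]), float(row["shape_pt_lat"]))
        )
    features = []
    for shape_id, pts in points.items():
        pts.sort()  # order by shape_pt_sequence; values must increase but need not be consecutive
        features.append({
            "type": "Feature",
            "id": shape_id,
            "properties": {},  # assumption: shape_dist_traveled and other columns are dropped
            # GeoJSON positions are [longitude, latitude]; shapes.txt names latitude first
            "geometry": {"type": "LineString", "coordinates": [[lon, lat] for _, lon, lat in pts]},
        })
    return {"type": "FeatureCollection", "features": features}


SAMPLE = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
s1,45.50,-73.57,2
s1,45.51,-73.56,10
s1,45.49,-73.58,1
"""
print(shapes_txt_to_geojson(SAMPLE)["features"][0]["geometry"]["coordinates"])
# [[-73.58, 45.49], [-73.57, 45.5], [-73.56, 45.51]]
```

The coordinate swap is the main trap in either direction: a converter that copies lat/lon columns straight into GeoJSON positions produces geometry that still parses but lands in the wrong place.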
Finally, the one challenge about using GeoJSON will be the need to carefully limit the types of features that can be used in each GeoJSON file, and also the properties that can be attached to each feature. There can be a number of different ways to express the same geometries in GeoJSON (for example, as Features in a FeatureCollection or as a single MultiPoint). It'll probably be best to use a really simple and limited schema. It might also be useful to eventually end up with separate GeoJSON files, like one for flex areas and a completely separate GeoJSON file as the shapes.txt alternative for fixed-route alignments. The simpler approaches will make it easier to use basic editing tools like geojson.io. We've already seen related conversations on GitHub about the need to keep the schema tight and simple, so we trust that this is already a known issue and will get figured out.
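One way to keep the schema "tight and simple" is a validator that accepts only a narrow profile. The sketch below assumes a hypothetical profile for a flex-areas file: a top-level FeatureCollection, unique feature ids, Polygon or MultiPolygon geometries, and only stop_name and stop_desc as properties. Those allowed sets are placeholders for whatever the spec ends up defining; a shapes file would swap in LineString.

```python
from typing import Any

# Hypothetical profile for a flex-areas file (placeholders, not an adopted spec)
ALLOWED_GEOMETRIES = frozenset({"Polygon", "MultiPolygon"})
ALLOWED_PROPERTIES = frozenset({"stop_name", "stop_desc"})


def check_profile(doc: dict[str, Any],
                  allowed_geometries: frozenset[str] = ALLOWED_GEOMETRIES,
                  allowed_properties: frozenset[str] = ALLOWED_PROPERTIES) -> list[str]:
    """Return a list of problems; an empty list means the document fits the profile."""
    if doc.get("type") != "FeatureCollection":
        # Rejects bare geometries such as a lone MultiPoint or a GeometryCollection
        return [f"top-level type must be FeatureCollection, got {doc.get('type')!r}"]
    problems = []
    seen_ids = set()
    for i, feature in enumerate(doc.get("features", [])):
        fid = feature.get("id")
        if fid is None:
            problems.append(f"feature {i}: missing id")
        elif fid in seen_ids:
            problems.append(f"feature {i}: duplicate id {fid!r}")
        else:
            seen_ids.add(fid)
        gtype = (feature.get("geometry") or {}).get("type")
        if gtype not in allowed_geometries:
            problems.append(f"feature {i}: geometry type {gtype!r} not allowed")
        extra = set(feature.get("properties") or {}) - allowed_properties
        if extra:
            problems.append(f"feature {i}: unexpected properties {sorted(extra)}")
    return problems


doc = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "id": "a", "properties": {"stop_name": "A"},
     "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
    {"type": "Feature", "id": "a", "properties": {"color": "red"},
     "geometry": {"type": "MultiPoint", "coordinates": [[0, 0]]}},
]}
for problem in check_profile(doc):
    print(problem)
# feature 1: duplicate id 'a'
# feature 1: geometry type 'MultiPoint' not allowed
# feature 1: unexpected properties ['color']
```

Returning a list of problems rather than raising on the first one mirrors how feed validators usually report, so a producer sees every issue in one pass.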
🙏 Please review and leave a comment in this document before attending this meeting. This meeting will cover the points highlighted in the document, confirm consensus around the option and propose a path forward for GeoJSON in GTFS.
📔 Please let us know if you would like to propose and present an alternative during the meeting, we can reserve a few minutes for you.
Disclaimer: This is not a GTFS-Flex working group meeting
While developing GTFS in 2005/2006, we discussed the usage of an ESRI shapefile for pattern geometries. At the time, it was perceived to be the most widely used GIS format. Ultimately, the decision was made to stick with CSV and sequence numbers to allow for easier adoption by others without a GIS.
I would prefer that shapes.txt and stops.txt are left untouched and not deprecated in any way. As a producer, I would hate to produce 2 versions of shapes and stops in our GTFS to make sure I don't break any of our consumer applications. If there is a need for GeoJSON versions of shapes.txt or stops.txt, open source tools could be developed to convert from specific GIS formats to GTFS and vice versa. This would also allow GTFS producers to maintain in whatever format works for them without a backward-incompatible change to the spec.
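The "vice versa" direction of such a conversion tool could look like the following sketch, which writes GeoJSON LineString features back out as shapes.txt rows. The function name is hypothetical; it assumes each feature id is usable as a shape_id, numbers vertices from 1 in file order, and drops any altitude value.

```python
import csv
import io
from typing import Any


def geojson_to_shapes_txt(feature_collection: dict[str, Any]) -> str:
    """Write LineString features back out as shapes.txt CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["shape_id", "shape_pt_lat", "shape_pt_lon", "shape_pt_sequence"])
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "LineString":
            raise ValueError(f"shape {feature.get('id')!r}: only LineString is supported")
        # GeoJSON positions are [lon, lat, (alt)]; swap to lat/lon and drop altitude
        for seq, (lon, lat, *_) in enumerate(geometry["coordinates"], start=1):
            writer.writerow([feature["id"], lat, lon, seq])
    return out.getvalue()


fc = {"type": "FeatureCollection", "features": [{
    "type": "Feature", "id": "s1", "properties": {},
    "geometry": {"type": "LineString", "coordinates": [[-73.58, 45.49], [-73.57, 45.5]]},
}]}
print(geojson_to_shapes_txt(fc), end="")
# shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
# s1,45.49,-73.58,1
# s1,45.5,-73.57,2
```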
I understand the need for something more sophisticated when dealing with multi-part polygons in GTFS-Flex but for points and lines, the argument seems weak and creates more work for everyone involved. I would vote for the tried-and-true OGC WKT format in CSVs for expressing polygons but I'd be a +0 for GeoJSON.
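For comparison, here is roughly what consuming WKT polygons from a CSV column involves. This hand-rolled parser handles only 2D POLYGON text and assumes x = longitude and y = latitude; a real implementation would more likely use a WKT library such as Shapely. The location_id and wkt column names are hypothetical.

```python
import csv
import io
import re
from typing import Any

# Matches one innermost parenthesised ring, e.g. "(30 10, 40 40, 30 10)"
_RING = re.compile(r"\(([^()]*)\)")


def wkt_polygon_to_geojson(wkt: str) -> dict[str, Any]:
    """Parse a 2D WKT POLYGON (outer ring plus optional holes) into a GeoJSON Polygon."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError(f"not a POLYGON: {text[:30]!r}")
    rings = []
    for ring_text in _RING.findall(text):
        ring = []
        for pair in ring_text.split(","):
            values = [float(v) for v in pair.split()]
            if len(values) != 2:
                raise ValueError(f"expected 'x y', got {pair.strip()!r}")
            ring.append(values)  # assumption: x is longitude, y is latitude
        rings.append(ring)
    if not rings:
        raise ValueError("POLYGON has no rings (EMPTY is not supported)")
    return {"type": "Polygon", "coordinates": rings}


SAMPLE_CSV = 'location_id,wkt\nzone_a,"POLYGON ((0 0, 1 0, 1 1, 0 0))"\n'
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    print(row["location_id"], wkt_polygon_to_geojson(row["wkt"]))
# zone_a {'type': 'Polygon', 'coordinates': [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]}
```

Because WKT coordinate lists contain commas, the column has to be quoted in the CSV, which may be one reason CSV + WKT tooling feels less seamless than opening a GeoJSON file directly.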
As @drewda said, if GeoJSON is adopted by GTFS-Flex, there needs to be specific documentation about which features are acceptable, which projections are allowed (e.g. EPSG:4326), etc., to simplify consumer software.
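RFC 7946 already fixes GeoJSON coordinates to WGS 84 longitude/latitude in decimal degrees and drops the older crs member, so a consumer-side check can stay simple. The sketch below flags a top-level crs member and any position outside the longitude/latitude ranges, which often indicates projected coordinates (the metre values in the sample are illustrative). It is a heuristic under those assumptions: swapped axes whose values both fall within range will not be caught.

```python
from typing import Any, Iterator


def iter_positions(coords: Any) -> Iterator[list[float]]:
    """Yield every [x, y, ...] position from nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def check_wgs84(doc: dict[str, Any]) -> list[str]:
    """Flag content that cannot be WGS 84 longitude/latitude in decimal degrees."""
    problems = []
    if "crs" in doc:
        problems.append("top-level 'crs' member present; RFC 7946 fixes coordinates to WGS 84 lon/lat")
    for feature in doc.get("features", []):
        geometry = feature.get("geometry") or {}
        for lon, lat, *_ in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(f"feature {feature.get('id')!r}: ({lon}, {lat}) out of range")
                break  # one report per feature is enough
    return problems


doc = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "id": "ok", "properties": {},
     "geometry": {"type": "Point", "coordinates": [-73.57, 45.5]}},
    {"type": "Feature", "id": "projected", "properties": {},
     "geometry": {"type": "Point", "coordinates": [-8188000.0, 5700000.0]}},
]}
print(check_wgs84(doc))
# ["feature 'projected': (-8188000.0, 5700000.0) out of range"]
```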
It is currently being proposed in GTFS-Flex (#388) to introduce a new serialisation format that we have never worked with in GTFS or GTFS-RT. In the past, these kinds of discussions (think zeromq, mqtt, websockets...) were explicitly avoided in favor of proven technology. With GeoJSON suggested as a new serialisation format for GTFS-Flex, this feels "too fast" to me, especially because we have historically had this discussion about shapes.txt, and more recently about how to serialise more complex structures where CSV (honestly) does not make sense either.
No, I am not proposing a full overhaul of all the files (or suggesting a well-thought-out format such as NeTEx ;). But I think we can all agree that if we bring in anything other than CSV, it had better be the right solution and be applied in multiple places, not just for GTFS-Flex.
I have already asked @isabelle-dr if we could have some sort of meeting on this topic.