r-transit / gtfsio

Read and Write General Transit Feed Specification (GTFS)
https://r-transit.github.io/gtfsio/
Other
13 stars 3 forks source link

Allow locations.geojson #36

Open polettif opened 2 months ago

polettif commented 2 months ago

There have been some updates to the GTFS reference in March, one of which is the new locations.geojson file.

This clashes with the previous assumption that all files in a feed, even supplemental or custom files, should be comma delimited .txt files. Thus, gtfsio ignores the files and issues a warning.

url = "https://data.trilliumtransit.com/gtfs/dolorescounty-co-us/dolorescounty-co-us--flex-v2.zip"
g = gtfsio::import_gtfs(url)
#> Warning: Found non .txt files when attempting to read the GTFS feed: locations.geojson
#> These files have been ignored and were not imported to the GTFS object.

Now, I don't know what the best approach to handling these files is, as gtfsio does not handle spatial data. I'm open to just reading it as a text and let the other packages handle the conversion to a spatial format. We just shouldn't ignore them as a base line.

dhersz commented 1 month ago

Hi @polettif, sorry for the long time to answer.

I don't know much about the geojson format, but I'll have a look at how it handles spatial data.

I also don't have strong opinions on how to read it. Just like the official spec deviated from using WKT to represent spatial data (relevant discussion), we could also use {sf} to read locations.geojson, while still reading shapes and stops as plain tables to prevent backward incompatibility. Thoughts?

We should probably also start handling the other recently added files correctly. That would require the files to be added to get_gtfs_standards(), which shouldn't be too much of a problem either.

I'll probably have some time to look at it next week.

mpadge commented 1 month ago

@dhersz My thoughts are mainly that this will be difficult without introducing very heavy dependencies into this otherwise nicely lightweight package. My general suspicion would be that most users won't need or want locations.geojson, and of those that do, most will likely be able to handle the "raw" geojson data themselves anyway. So introducing a heavy dep just for a sub-case of a sub-case seems unnecessary.

There are three main ways that I currently use to handle geojson, in increasing complexity:

  1. The lightest way: jsonlite::read_json() (<250kB of libs), to just store as a simple list which can easily be converted later to some "spatial" format if desired, or spatial data easily extracted via lapply calls.
  2. Heavy way 1: geojsonio::geojson_read(), but geojsonio (~800Kb of libs) requires V8, which is a huge dependency (>1.5MB of libs) so no-go in my opinion.
  3. Heavy way 2: sf::st_read(), again no-go in my opinion because then the majority of the dependency footprint of this package will be eaten up by sf (~6MB not including system deps!) for a very minor use case.

For context, this libs of this package are around 200kB, so currently excellently small (with data.table obviously the largest dep, but unavoidable here.) My personal recommendation would be jsonlite, with sufficient explanation and docs to explain how to convert to other formats if desired.

dhersz commented 1 month ago

Great comment, thanks @mpadge!

So I suggest using {jsonlite} to handle the geojson and then adding the operation of converting the list to sf to the functions in {gtfstools} and {tidytransit} that convert plain GTFS objects to dt_gtfs and tidygtfs objects, respectively. Since both packages already import {sf}, this should not be a problem, and we preserve the light dependencies of {gtfsio}.