mbta / gtfs-documentation

Specification, archive, and resource information for the MBTA's implementation of GTFS

Duplicated ids found in: fare_products - The returned object is not a tidygtfs object #39

Open coding-to-music opened 1 year ago

coding-to-music commented 1 year ago

Hello, I was running this project:

https://github.com/coding-to-music/r-stringlines-nyc-mta-gtfs-train-visualization

I saw an error when using tidytransit with the MBTA GTFS feed. A new file, fare_products.txt (see https://github.com/mbta/gtfs-documentation/pull/34), produces this error when running the R program:

# This may be unrelated, not sure:

Error in UseMethod("group_by") :
  no applicable method for 'group_by' applied to an object of class "NULL"
In addition: Warning message:

# This is the actual error:

In gtfs_to_tidygtfs(g, files = files) :
  Duplicated ids found in: fare_products
The returned object is not a tidygtfs object, you can use as_tidygtfs() after fixing the issue.

As a workaround, first back up the zip file so you keep an original copy:

cp MBTA_GTFS.zip MBTA_GTFS_original.zip

Now remove the offending file from the zip file:

zip -d MBTA_GTFS.zip fare_products.txt

Now the file can be used as normal.

After removing fare_products.txt, I was able to produce many stringlines:
https://github.com/coding-to-music/r-stringlines-nyc-mta-gtfs-train-visualization/tree/main/stringlines

rymarczy commented 1 year ago

This appears to be an error within the tidytransit package.

tidytransit believes the primary_key field for the fare_products.txt table is the fare_product_id column:

  # fare_products
  m$fare_products <- spec_setup_fields(
    c("fare_product_id", "fare_product_name", "amount",
      "currency"),
    c("req", "opt", "req", "req"),
    c("character", "character", "numeric", "numeric"), # TODO currency should be handled with integers
    "opt",
    "fare_product_id") ## primary_key ##

Per the GTFS documentation (https://gtfs.org/schedule/reference/#fare_productstxt), the primary key is the combination of fare_product_id and fare_media_id:

fare_products.txt

File: Optional

Primary Key (fare_product_id, fare_media_id)

fare_media_id can also be empty (NULL) in the fare_products.txt table, so tidytransit would also have to handle that case.
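
One way to confirm this on an extracted feed is to count duplicate rows under both key definitions. The sketch below is an assumption-laden illustration, not part of tidytransit or the MBTA tooling: count_dup_keys is a hypothetical helper, the sample rows are invented (not real MBTA data), and the comma splitting is naive (it assumes no quoted fields containing commas and no byte-order mark in the header).

```shell
#!/bin/sh
# Hypothetical helper: count rows whose key repeats an earlier row.
# usage: count_dup_keys FILE COLUMN [COLUMN...]
count_dup_keys() {
  file=$1; shift
  awk -F',' -v cols="$*" '
    { sub(/\r$/, "") }                        # tolerate CRLF line endings
    NR == 1 {
      n = split(cols, want, " ")
      for (i = 1; i <= NF; i++) idx[$i] = i   # header name -> column number
      for (j = 1; j <= n; j++)
        if (!(want[j] in idx)) {
          print "missing column: " want[j] > "/dev/stderr"
          bad = 1; exit 2
        }
      next
    }
    {
      key = ""
      for (j = 1; j <= n; j++) key = key SUBSEP $(idx[want[j]])
      if (seen[key]++) dups++                 # an empty field is just another value
    }
    END { if (!bad) print dups + 0 }
  ' "$file"
}

# Invented sample rows, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
fare_product_id,fare_product_name,fare_media_id,amount,currency
prod_a,Sample A,cash,2.40,USD
prod_a,Sample A,card,2.40,USD
prod_b,Sample B,,1.70,USD
EOF

count_dup_keys "$sample" fare_product_id                 # prints 1
count_dup_keys "$sample" fare_product_id fare_media_id   # prints 0
rm -f "$sample"
```

If the real fare_products.txt reports duplicates for fare_product_id alone but none for the two-column key, that would be consistent with the feed following the specification and the single-column key being the cause of the warning.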

coding-to-music commented 1 year ago

It's an interesting question. MBTA is not responsible for tidytransit, and tidytransit is trying to be compatible with all the transit systems in the world. I'm not sure whether the files are expected to be automatically importable: tidytransit cannot read documentation, so the files need to be importable on their own. A unique index (sequence id) column could solve the problem. Otherwise people will not be able to use tidytransit for MBTA and will spend hours or days trying to figure out the problem. Technically it's not MBTA's problem if a third party can't read the files, though it is interesting that all the other files in MBTA.zip can be read. Anyway, just FYI about this issue. Thanks.