tulsawebdevs / django-multi-gtfs

Django app to import and export General Transit Feed Specification (GTFS)
http://tulsawebdevs.org/
Apache License 2.0

Create GTFS difference report #17

Open jwhitlock opened 10 years ago

jwhitlock commented 10 years ago

Create a method for showing the common elements and differences between two GTFS feeds.

Difficulty - Hard

Criteria - A text or HTML report would be a great starting place

One strategy: Create a hashing function for a 'row', and use it to find the elements that are identical in the two feeds and the elements that are unique to one of them. For the unique elements, do a simple matching by GTFS IDs to identify records that changed from one feed to the other. Generate a Markdown report.
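The strategy above could be sketched roughly as follows. This is only an illustration on plain dicts (one dict per CSV row), not code from django-multi-gtfs; `row_hash`, `diff_rows`, and `markdown_report` are hypothetical names, and SHA-1 is an arbitrary choice of hash.

```python
import hashlib


def row_hash(row):
    """Hash a row (dict of column -> value) independently of key order."""
    canonical = "\x1f".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def diff_rows(old_rows, new_rows, id_field):
    """Split two row lists into identical, changed, added, and removed."""
    old_by_hash = {row_hash(r): r for r in old_rows}
    new_by_hash = {row_hash(r): r for r in new_rows}
    identical = old_by_hash.keys() & new_by_hash.keys()

    # Rows whose hash appears in only one feed, keyed by their GTFS ID.
    old_unique = {r[id_field]: r for h, r in old_by_hash.items() if h not in identical}
    new_unique = {r[id_field]: r for h, r in new_by_hash.items() if h not in identical}

    return {
        "identical": len(identical),
        # Same ID in both feeds but different content: the record changed.
        "changed": sorted(old_unique.keys() & new_unique.keys()),
        "added": sorted(new_unique.keys() - old_unique.keys()),
        "removed": sorted(old_unique.keys() - new_unique.keys()),
    }


def markdown_report(table, result):
    """Render one table's diff as a short Markdown section."""
    lines = [f"## {table}", "", f"- Identical: {result['identical']}"]
    for label in ("changed", "added", "removed"):
        ids = ", ".join(result[label]) or "none"
        lines.append(f"- {label.capitalize()}: {ids}")
    return "\n".join(lines)
```

For example, `markdown_report("stops", diff_rows(old_stops, new_stops, "stop_id"))` would produce one section per table. The sketch assumes IDs are strings and unique within a table.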

bbrewington commented 8 years ago

Did you guys figure this out? I'm working on the same project with Atlanta's GTFS file, to help the MARTA Army project.

jwhitlock commented 8 years ago

No, never got to building this feature, and I probably won't get to it. I don't think @jdungan is the right person either.

My idea was to add a "signature" column. The basic idea was:

  1. Construct a JSON representation of the item, with a known key order, whitespace settings, etc. Omit things like the line pattern that depend on other columns.
  2. Take the md5sum of that representation, and store it in a new "signature" column in the database.

The idea is that, if two stops in two feeds have the same data, they will have the same value in the signature column. This helps identify the data items that did not change.
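A minimal sketch of the two steps, assuming each item is available as a plain dict of its columns. The `signature` function and the `omit` parameter are hypothetical, and which fields count as derived (here just the database `id`) is an assumption that would need checking against the real models.

```python
import hashlib
import json


def signature(item, omit=frozenset({"id", "signature"})):
    """Return the MD5 hex digest of a canonical JSON form of `item`.

    `omit` lists fields left out of the signature, such as database
    primary keys or values derived from other columns (assumed names).
    """
    payload = {key: value for key, value in item.items() if key not in omit}
    canonical = json.dumps(
        payload,
        sort_keys=True,          # known key order
        separators=(",", ":"),   # fixed whitespace
        ensure_ascii=True,       # fixed character escaping
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Value types matter here: `36.1` and `"36.1"` serialize differently, so both feeds would need to be normalized the same way before hashing.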

Once you know what didn't change, you have what did change. The challenge is to match items that refer to the same thing, so you need a similarity algorithm. For example, if a stop has the same stop_id, that's a pretty good indication that it refers to the same stop, even if the latitude and longitude are different. Easy for people, potentially challenging to code.
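One possible shape for such a similarity check on stops is sketched below. The weights, the 50 m radius, the 0.6 threshold, and the function names are all arbitrary assumptions for illustration; only the `stop_id`, `stop_name`, `stop_lat`, and `stop_lon` field names come from GTFS.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def similarity(old, new, max_distance_m=50.0):
    """Score from 0.0 to 1.0 for how likely two stops are the same stop."""
    if old["stop_id"] == new["stop_id"]:
        return 1.0  # same ID: treat as the same stop even if it moved
    score = 0.0
    if old["stop_name"].strip().lower() == new["stop_name"].strip().lower():
        score += 0.5
    d = distance_m(old["stop_lat"], old["stop_lon"], new["stop_lat"], new["stop_lon"])
    if d <= max_distance_m:
        score += 0.5 * (1 - d / max_distance_m)
    return score


def best_match(old, candidates, threshold=0.6):
    """Return the most similar candidate, or None if none is close enough."""
    scored = [(similarity(old, c), c) for c in candidates]
    if not scored:
        return None
    score, match = max(scored, key=lambda pair: pair[0])
    return match if score >= threshold else None
```

Running `best_match` only over the items whose signatures did not match keeps the comparison small, since unchanged items are already accounted for.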

Then there would be a sample UI to display things that are the same, things that changed, new items, and deleted items.

Very useful stuff, but hard to do, and maybe too much to ask a volunteer to do.