ipeaGIT / gtfstools

General Transit Feed Specification (GTFS) Editing and Analysing Tools
https://ipeagit.github.io/gtfstools/

`remove_duplicates()` #31

Closed dhersz closed 3 years ago

dhersz commented 3 years ago

Many GTFS files contain duplicated entries (for example, spo_gtfs has an agency.txt with 2 identical rows, and many other duplicates are spread throughout its tables).

`prune_gtfs()` will remove duplicate entries from the file. It can also be used in conjunction with the `filter_***` functions, called either at the beginning or at the end of the filtering process.
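
To illustrate the idea, here is a minimal base R sketch. It is not the package's implementation. It assumes the feed is held as a named list of data frames, and both the helper name `drop_duplicated_rows()` and the toy `agency` table are hypothetical.

```r
# Hypothetical helper: drop fully identical rows from every table of a
# GTFS-like object, assumed here to be a named list of data frames.
drop_duplicated_rows <- function(gtfs) {
  lapply(gtfs, function(tbl) {
    # Leave any non-table element untouched.
    if (!is.data.frame(tbl)) return(tbl)
    # Keep only the first occurrence of each distinct row.
    tbl[!duplicated(tbl), , drop = FALSE]
  })
}

# Toy feed with two identical agency rows (made-up data).
toy_gtfs <- list(
  agency = data.frame(
    agency_id = c("1", "1"),
    agency_name = c("Example Agency", "Example Agency"),
    stringsAsFactors = FALSE
  )
)

clean_gtfs <- drop_duplicated_rows(toy_gtfs)
nrow(clean_gtfs$agency)
```

Because the helper only compares whole rows, it can run before or after any filtering step without changing which distinct records survive.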

dhersz commented 3 years ago

Actually, I think it's worth implementing this now. A good way to implement `crop_gtfs()` is to create two datasets, one filtered by shape_id and another by stop_id, and then merge them. But we need a way of making sure that the merged result doesn't have duplicated entries, and this is where `remove_duplicates()` comes in.
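
A rough sketch of that merge-then-deduplicate step, again in base R and under assumptions: both filtered datasets are named lists of data frames, tables with the same name share the same columns, and `merge_filtered()` plus the toy `stops` tables are hypothetical names and data used only for illustration.

```r
# Hypothetical helper: row-bind two GTFS-like objects table by table, then
# drop the rows that ended up in both filtered datasets.
merge_filtered <- function(gtfs_a, gtfs_b) {
  table_names <- union(names(gtfs_a), names(gtfs_b))
  merged <- lapply(table_names, function(nm) {
    # A table may be present in only one of the two datasets.
    parts <- Filter(Negate(is.null), list(gtfs_a[[nm]], gtfs_b[[nm]]))
    unique(do.call(rbind, parts))
  })
  names(merged) <- table_names
  merged
}

# Toy datasets standing in for the shape_id and stop_id filters (made-up data).
by_shape <- list(stops = data.frame(
  stop_id = c("A", "B"),
  stop_name = c("Stop A", "Stop B"),
  stringsAsFactors = FALSE
))
by_stop <- list(stops = data.frame(
  stop_id = c("B", "C"),
  stop_name = c("Stop B", "Stop C"),
  stringsAsFactors = FALSE
))

merged <- merge_filtered(by_shape, by_stop)
merged$stops$stop_id
```

Stop "B" passes both filters, so without the `unique()` call it would appear twice in the merged `stops` table.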