tulsawebdevs / django-multi-gtfs

Django app to import and export General Transit Feed Specification (GTFS)
http://tulsawebdevs.org/
Apache License 2.0

Link routes to default agency on import #43

Open alaw005 opened 9 years ago

alaw005 commented 9 years ago

The agency_id field is optional for feeds with only one agency. As a result, routes are not directly associated with an agency on import, as there appears to be no relationship between them.

I suggest that the import check whether only one agency is present. If so, assign it a default agency_id (e.g. 1) and update the route records to reference this agency.

I don't fully understand the import process, but a good option might be to update the "import_gtfs" function in the feed model (immediately before the geometries are updated). I tried adding a default value of 1 to agency_id in Agency and to agency_id in Route, but this didn't work.

jwhitlock commented 9 years ago

Check out the test_import_minimal test. If the agency_id column is omitted, then Agency.agency_id is a blank string.
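
To illustrate the blank-string case, here is a minimal sketch in plain Python. It is not the multigtfs importer; the sample agency.txt content and the read_agencies helper are hypothetical stand-ins that only show how a missing agency_id column can be read as an empty string.

```python
import csv
import io

# Hypothetical minimal agency.txt with the optional agency_id column omitted
AGENCY_TXT = """agency_name,agency_url,agency_timezone
Example Transit,http://example.com,America/Chicago
"""


def read_agencies(text):
    """Return one dict per agency row, using '' when agency_id is absent."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            # dict.get supplies the blank default for a missing column
            "agency_id": row.get("agency_id", ""),
            "name": row["agency_name"],
        })
    return rows


agencies = read_agencies(AGENCY_TXT)
```

Here agencies holds a single entry whose agency_id is the empty string.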

It is true that there is no direct link from the route to the agency in this case. However, they are both part of the same feed, so you can derive the relation from that. If you need to make this link explicit, then you'll have to post-process the feed.
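
As a rough sketch of deriving the relation through the feed, the following uses plain dataclasses as stand-ins for the Django models. The Agency and Route classes and the agency_for_route helper are assumptions made for illustration, not multigtfs code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agency:
    feed_id: int
    agency_id: str = ""  # blank when the column is omitted


@dataclass
class Route:
    feed_id: int
    route_id: str
    agency: Optional[Agency] = None  # unset when routes.txt has no agency_id


def agency_for_route(route: Route, agencies: List[Agency]) -> Optional[Agency]:
    """Return the explicit agency, else the only agency in the same feed."""
    if route.agency is not None:
        return route.agency
    same_feed = [a for a in agencies if a.feed_id == route.feed_id]
    # Only unambiguous when the feed has exactly one agency
    return same_feed[0] if len(same_feed) == 1 else None


# Hypothetical data: feed 1 has one agency, feed 2 has two
all_agencies = [Agency(feed_id=1), Agency(2, "A"), Agency(2, "B")]
single = agency_for_route(Route(feed_id=1, route_id="R1"), all_agencies)
ambiguous = agency_for_route(Route(feed_id=2, route_id="R2"), all_agencies)
```

With this data, single is the lone agency of feed 1, while ambiguous is None because feed 2 has two agencies and no explicit link.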

alaw005 commented 9 years ago

I suppose the current behavior is true to GTFS and therefore correct. Still, I think an explicit link would be useful: an Agency object does exist for that feed, it just isn't referenced by the Route object.

Would it be sensible to make changes to this section of the feed import, i.e. immediately before the geometries are updated? I'm only learning Python/Django, but looking at your existing code, would something like this work:

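if self.agency_set.count() == 1:
    # Exactly one agency in this feed, so it is the default for all routes
    start_time = time.time()
    agency = self.agency_set.get()
    # Assumes Route.agency is the foreign key to Agency; update() saves
    # the change in one query and returns the number of rows matched
    route_count = self.route_set.filter(
        agency__isnull=True).update(agency=agency)
    end_time = time.time()
    logger.info(
        "Assigned default agency for %d routes in %0.1f seconds",
        route_count, end_time - start_time)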

Also, thank you for your help/guidance on my earlier input which was my first "official" adventure in contributing to a project!