opentripplanner / OpenTripPlanner

An open source multi-modal trip planner
http://www.opentripplanner.org

Default agency ID used even if alternate is provided in GTFS #564

Closed chosak closed 12 years ago

chosak commented 12 years ago

When I run API queries against a graph built from the WMATA GTFS file, which contains correct agency definitions, the returned data refers to the incorrect default agency ID ("TriMet").

I've modified my graph-builder.xml file to include:

<property name="url" value="http://www.gtfs-data-exchange.com/agency/wmata/latest.zip" />
<property name="defaultAgencyId" value="TriMet" />

Because the WMATA GTFS file actually contains data for two different agencies ("METRO" and "DC Circulator"), there isn't one appropriate default agency ID, so I've just left it as the default. I had assumed that because agency IDs are properly used in the GTFS dataset that this "TriMet" default agency ID would never be used.

Unfortunately, running a simple query shows that this isn't the case:

http://path.to.server:8080/opentripplanner-api-webapp/ws/plan? \
_dc=1321458174889&arriveBy=false&date=11%2F16%2F2011&time=10%3A42%20am&mode=TRANSIT%2CWALK \
&optimize=QUICK&maxWalkDistance=840&routerId=&toPlace=38.904201946181%2C-77.021589041702 \
&fromPlace=38.891376495065%2C-77.094373465526&intermediatePlaces=

The above query, run against the graph built from the WMATA GTFS, returns, in part:

<leg routeLongName="Metrorail Orange Line" routeShortName="Orange" agencyId="2" headsign="NEW CARROLLTON" route="Orange" mode="SUBWAY">
   <startTime>2011-11-16T11:16:18-05:00</startTime>
   <endTime>2011-11-16T11:33:00-05:00</endTime>
   <distance>8081.244773005155</distance>
   <from>
      <name>CLARENDON METRORAIL STATION</name>
      <stopId>
         <agencyId>TriMet</agencyId>
         <id>12648</id>
      </stopId>
      <lon>-77.094987</lon>
      <lat>38.887176</lat>
      <geometry>
         {"type": "Point", "coordinates": [-77.094987,38.887176]}
      </geometry>
   </from>
    ...

Here the leg has the correct agency ID of 2 ("METRO"), but the stop has a totally wrong agency ID! It should read "METRO" instead of "TriMet".

I've looked into the code and I have an idea for what could be causing this:

There appears to be an issue with the way that some of the models in the onebusaway-gtfs library refer to an agency ID where there isn't one specified in the GTFS specification. Model fields that should be defined to refer to a single ID are defined using the "AgencyAndId" data type when it seems like there isn't enough information to fill in that type.

Specifically, org.onebusaway.gtfs.model.Trip includes the following members:

public final class Trip extends IdentityBean<AgencyAndId> {
   ...
   private AgencyAndId id;
   private AgencyAndId serviceId;
   private AgencyAndId shapeId;
   ...
}

and org.onebusaway.gtfs.model.ServiceCalendar includes:

public final class ServiceCalendar extends IdentityBean<Integer> {
   ...
   private int id;
   private AgencyAndId serviceId;
   ...
}

and org.onebusaway.gtfs.model.Stop includes:

public final class Stop extends IdentityBean<AgencyAndId> {
   ...
   private AgencyAndId id;
   ...
}

According to the GTFS spec, none of these fields should refer to an agency, but should be a unique identifier of unspecified type. I'm not very familiar with the OneBusAway library, but it looks like the model loader is trying to be smart and associate an agency ID with each entity at load time, even though the files don't actually specify one. The only information it has to use at this point is the default agency ID, which is wrong.

The place where the agency ID actually gets set on the entities as they are loaded is the resolveAgencyId method in org.onebusaway.gtfs.serialization.mappings.DefaultAgencyIdFieldMappingFactory. This always fills in those agency IDs with the default ID.

One possible solution for this would be to go back through the entities after they are loaded and fill in the proper agency ID by looking through the relationships between agencies, trips, stops, etc. But what happens if multiple agencies service the same stop? It seems that a stop shouldn't necessarily be linked to one particular agency. Rather, the route leg should be the place where the agency is specified. Ideally, this could contain both the agency ID as well as its name.

novalis commented 12 years ago

The reason that defaultAgencyId works the way it does is to support multiple GTFS files from different agencies, some of which have the same stops with the same stopIds, and others of which have different stops with potentially overlapping stopIds. So, there could be agency A with stop ID 1 meaning Main & Elm, agency B with stop ID 1 meaning Main & Elm, and agency C with stop ID 1 meaning 4th and Pleasant. You would set the defaultAgencyId on B to A, and leave it out entirely on C, and you would be done.
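As a rough illustration of the namespacing described above, here is a minimal Java sketch. `ScopedId` and `StopNamespaceDemo` are hypothetical names made up for this example; they are not the onebusaway-gtfs classes, and the sketch only assumes that a stop is keyed by an (agency ID, stop ID) pair. Feed B reuses A's namespace, so its stop 1 merges with A's, while C's stop 1 stays distinct.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class StopNamespaceDemo {
    /** Registers a stop under the given namespace; an already registered key is left untouched. */
    static void load(Map<ScopedId, String> stops, String namespace, String stopId, String name) {
        stops.putIfAbsent(new ScopedId(namespace, stopId), name);
    }

    /** Loads stop "1" from three feeds, mirroring the A/B/C scenario. */
    static Map<ScopedId, String> buildStops() {
        Map<ScopedId, String> stops = new HashMap<>();
        load(stops, "A", "1", "Main & Elm");       // feed A, its own namespace
        load(stops, "A", "1", "Main & Elm");       // feed B, defaultAgencyId set to A
        load(stops, "C", "1", "4th and Pleasant"); // feed C, its own namespace
        return stops;
    }

    public static void main(String[] args) {
        System.out.println(buildStops().size() + " distinct stops");
    }
}

/** Hypothetical composite key: an agency ID acting as a namespace plus a feed-local ID. */
final class ScopedId {
    final String agencyId;
    final String id;

    ScopedId(String agencyId, String id) {
        this.agencyId = agencyId;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ScopedId)) {
            return false;
        }
        ScopedId other = (ScopedId) o;
        return agencyId.equals(other.agencyId) && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(agencyId, id);
    }

    @Override
    public String toString() {
        return agencyId + "_" + id;
    }
}
```

In this sketch the agency part of the key is only a namespace for the feed; it says nothing about which agency actually serves the stop.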

I think leaving the defaultAgencyId out will work for you, too; things that do not have an agency specified will have the agency of the first agency listed in the GTFS (I think), and things with an agency specified will have that agency.

bdferris commented 12 years ago

I wrote the GTFS library in question. You are correct that the GTFS spec doesn't officially define a linkage between agencies and stops. That said, we need some way to distinguish stops when you load multiple GTFS feeds into the same application, because there is no guarantee that two stops in different GTFS feeds won't reuse the same stop id. Thus, we associate an agency id with every stop to serve as a namespace of sorts. Which agency id should we use? In most feeds, there is only one agency and the decision is simple. However, as you've pointed out, things get more complicated when you have a multi-agency feed where it's not clear which agency owns a particular stop. To simplify things, we still just pick one default agency id for stops in a feed. You can control which agency id is used by setting the "defaultAgencyId" parameter. If you'd like a more nuanced agency id assignment mechanism, you will probably have to implement it yourself.

Thanks, Brian

chosak commented 12 years ago

Thanks for the quick responses.

@novalis:

I think leaving the defaultAgencyId out will work for you, too; things that do not have an agency specified will have the agency of the first agency listed in the GTFS (I think), and things with an agency specified will have that agency.

If you remove the defaultAgencyId property line from the graph-builder.xml file entirely, then it seems to assign all of the stops an <agencyId> of "1", even if that's not the proper ID for that stop. If you set the value to be blank, then all stops end up with a blank agency ID. At least this removes the problem of putting a random default ID into your output!

@bdferris:

Thanks for the explanation. I agree it's not an obvious problem since there's no easy way to tell which agencies go with which stop. I can imagine a post-processing step for the graph builder that looks at each stop, finds the trips that visit that stop, finds the routes for those trips, and finds the agencies for those routes. It could then tag the stop with the agency or agencies that serve it (but if there were multiple agencies you'd need a way to represent that).
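A minimal sketch of what such a post-processing step might look like, in plain Java. Everything here is an assumption for illustration: `StopAgencyTagger` is a made-up name, the inputs are simple maps standing in for stop_times.txt, trips.txt and routes.txt rather than the graph builder's real data structures, and the IDs in `demo()` are invented. Each stop is tagged with a set of agency IDs, which is one way to represent the multiple-agency case.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StopAgencyTagger {
    /**
     * Walks stop -> trip -> route -> agency and collects, per stop, every agency serving it.
     *
     * @param stopTimes     {tripId, stopId} pairs, as in stop_times.txt
     * @param tripToRoute   trip_id -> route_id, as in trips.txt
     * @param routeToAgency route_id -> agency_id, as in routes.txt
     */
    static Map<String, Set<String>> agenciesByStop(
            List<String[]> stopTimes,
            Map<String, String> tripToRoute,
            Map<String, String> routeToAgency) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String[] stopTime : stopTimes) {
            String tripId = stopTime[0];
            String stopId = stopTime[1];
            String routeId = tripToRoute.get(tripId);
            String agencyId = routeId == null ? null : routeToAgency.get(routeId);
            if (agencyId == null) {
                continue; // dangling reference: leave the stop untagged for this trip
            }
            result.computeIfAbsent(stopId, k -> new TreeSet<>()).add(agencyId);
        }
        return result;
    }

    /** Made-up data: stop S2 is visited by trips of two different agencies. */
    static Map<String, Set<String>> demo() {
        List<String[]> stopTimes = Arrays.asList(
                new String[] {"T1", "S1"},
                new String[] {"T1", "S2"},
                new String[] {"T2", "S2"});
        Map<String, String> tripToRoute = new HashMap<>();
        tripToRoute.put("T1", "R1");
        tripToRoute.put("T2", "R2");
        Map<String, String> routeToAgency = new HashMap<>();
        routeToAgency.put("R1", "X");
        routeToAgency.put("R2", "Y");
        return agenciesByStop(stopTimes, tripToRoute, routeToAgency);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the made-up data, S1 is tagged with agency X only, while S2 carries both X and Y, so a single agencyId field on the stop would not be enough to hold the result.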

novalis commented 12 years ago

"1" is the first agency id in that GTFS file.

I actually think it's a good idea to propose that agency_id be added to stops.txt (and stop_times.txt) on the GTFS-changes mailing list. This would allow much better cross-agency coordination.