transitland / transitland-processing-animation

Animating scheduled transit trips using the Transitland API and Processing
MIT License

Problems with the stops data #11

Open jaimeorrego opened 6 years ago

jaimeorrego commented 6 years ago

Hello Will,

Thank you for the API, it is very nice. I was testing it with different cities around the world. It worked very well in Portland, OR, but I found two issues in other cities. I tested in Lisbon, Portugal using:

python transitflow.py --name=lisbon --bbox=-9.276933,38.592729,-8.940477,38.803201 --clip_to_bbox

When downloading the transit operators, the API finds routes and stops but 0 schedule stop pairs. Here is an example from one of the largest operators:

o-eyck-carris 7 / 8
http://transit.land/api/v1/routes?per_page=10000&operated_by=o-eyck-carris
217 routes found.

http://transit.land/api/v1/stops?per_page=10000&served_by=o-eyck-carris
2093 stops found.

http://transit.land/api/v1/schedule_stop_pairs?date=2018-04-26&per_page=10000&sort_min_id=0&operator_onestop_id=o-eyck-carris
0 schedule stop pairs found.

Another test I did was in Santiago, Chile. Here the API has problems downloading the stops data.

python transitflow.py --name=Santiago --bbox=-70.673777,-33.460993,-70.595499,-33.394518 --clip_to_bbox

And it seems it cannot connect:

o-66jc-transantiago 2 / 2
http://transit.land/api/v1/routes?per_page=10000&operated_by=o-66jc-transantiago
383 routes found.

http://transit.land/api/v1/stops?per_page=10000&served_by=o-66jc-transantiago
retry 1 / 5: HTTP Error 504: Gateway Time-out
retry 2 / 5: HTTP Error 504: Gateway Time-out
retry 3 / 5: HTTP Error 504: Gateway Time-out
retry 4 / 5: HTTP Error 504: Gateway Time-out
retry 5 / 5: HTTP Error 504: Gateway Time-out
failed:
HTTP Error 504: Gateway Time-out
1 operators successfully downloaded.
1 operators failed.

I think that in the Lisbon case it may be a problem with the structure of the GTFS data, and in Santiago maybe the file is too large?

Do you have any clues?

Thanks!

willgeary commented 6 years ago

Thanks for noting these issues, @jaimeorrego.

I can confirm the same errors for Lisbon and Santiago. I believe this is happening because large bus systems have a lot of stop_times to download, and the API is stalling with so many big requests.

I tried decreasing the API request size from 10,000 items per page to 1,000 items per page, and this seemed to help things! There are 10x more API requests, but each is 10x smaller. I also increased the API retry limit from 5 to 20, just in case.
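A minimal sketch of the idea above (smaller pages, more retries), not the actual transitflow.py code. The `get_page` callable is a hypothetical stand-in for one API request, and the `offset` style of paging is an assumption made for illustration:

```python
import time


def fetch_all(get_page, per_page=1000, retry_limit=20, pause_seconds=0.0):
    """Collect every item from a paginated source.

    get_page(offset, per_page) is a hypothetical callable that returns a
    list of items and raises IOError when a request fails (for example on
    an HTTP 504).
    """
    items = []
    offset = 0
    while True:
        # Retry the same page until it succeeds or the limit is reached.
        for attempt in range(1, retry_limit + 1):
            try:
                page = get_page(offset, per_page)
                break
            except IOError as error:
                print("retry {} / {}: {}".format(attempt, retry_limit, error))
                time.sleep(pause_seconds)
        else:
            raise IOError("giving up after {} retries".format(retry_limit))
        items.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < per_page:
            return items
        offset += per_page
```

Each request stays small enough to finish before the gateway times out, and a failed page is retried on its own rather than restarting the whole download.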

Santiago looks better:

[Screenshot, 2018-04-27, 8:21:55 pm]

Strangely, for Lisbon, it fails for me on today's date, but if I try this past Wednesday's date, the stop times for o-eyck-carris do successfully download:

transitflow will$ python transitflow.py --name=lisbon --bbox=-9.276933,38.592729,-8.940477,38.803201 --clip_to_bbox --date=2018-04-23

[Screenshot, 2018-04-27, 8:36:00 pm]

I think I will add a new command line argument --per_page to allow for the user to determine the number of items per page of each API request, as well as --retrylimit.
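As a rough sketch, the two proposed arguments could be declared with `argparse` like this. The default values (1000 and 20) are assumptions taken from the numbers mentioned above, and the help text is illustrative only:

```python
import argparse


def build_parser():
    """Build a parser for the two proposed paging options only."""
    parser = argparse.ArgumentParser(description="Sketch of the paging options")
    # Assumed default: the smaller page size that helped for Santiago.
    parser.add_argument("--per_page", type=int, default=1000,
                        help="number of items per page of each API request")
    # Assumed default: the raised retry limit mentioned above.
    parser.add_argument("--retrylimit", type=int, default=20,
                        help="number of attempts per request before giving up")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--per_page=500"])
print(args.per_page, args.retrylimit)
```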

Does this sound good to you?

Best, Will

AnthonyLovesBikes commented 6 years ago

Thanks, this is very helpful. I have been having both of the issues above while working on the Toronto, Canada area. The TTC operator seems to be too large and fails for all dates I have tried, even with the API query set to 1000. Could you test this on your end? The error I get is "[Errno 34] Result too large". Thanks, I love this tool!

python transitflow.py --name=TTC --operator=o-dpz8-ttc

python transitflow.py --name=Toronto --bbox=-79.472351,43.597798,-79.280777,43.709083 --clip_to_bbox

willgeary commented 6 years ago

Thanks @AnthonyLovesBikes, I can confirm the same error for the Toronto area. Yes, the TTC operator seems to be too large. However, I have seen at least one example of somebody using this tool to visualize Toronto transit flows (they even wrote a program to convert transit frequency into audio!). See: https://rami-codes.github.io/2017/11/07/transitland-audiolizer/

Frankly, I am not sure if downloading massive schedules via the paginated transitland API is the best approach. It is much faster to download the raw GTFS zip file and process it locally with a python script. I would love to add a "drag and drop" capability to this tool, such that a user could decide to use the transitland API or to use a local GTFS zip file. Any thoughts on this functionality are welcome!
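For illustration, reading one table out of a local GTFS zip needs only the standard library. This is a sketch under assumptions, not part of the tool: `read_gtfs_table` is a hypothetical helper name, and it loads the whole table into memory, which may not suit very large feeds.

```python
import csv
import io
import zipfile


def read_gtfs_table(gtfs_zip, table_name):
    """Return the rows of one GTFS text file (e.g. "stop_times.txt") as dicts.

    gtfs_zip may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig drops the byte order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

A call such as `read_gtfs_table("feed.zip", "stop_times.txt")` (the file name is a placeholder) would give one dict per stop time, keyed by the column names in the file's header row.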

Best, Will

AnthonyLovesBikes commented 6 years ago

Thank you! Yes, I agree a manual GTFS adder would be ideal. If it can include multiple agencies, that would be best. I have messaged the other user to ask how they made the TTC viz work. I will let you know if I learn more. I am now having an issue with GO Transit, though that one worked for me before. Can you let me know if it works for you?


jaimeorrego commented 6 years ago

Thank you @willgeary! After changing the request size it works fine. I am just entering the world of GTFS data, and the drag and drop option would definitely be interesting. Maybe this API is not exactly the place for it, but it would also be nice to have a GTFS data processor that, after setting some variables (for example, the route number), lets you obtain an output.csv. The idea, of course, would be to use the data in other kinds of applications. Thanks!
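One possible shape for such a processor, as a sketch only: given rows from a feed's trips.txt and stop_times.txt (as lists of dicts), keep the stop times of a single route and write them out as CSV. The function name and the sample rows are hypothetical; the column names are the standard GTFS ones.

```python
import csv
import io


def write_route_stop_times(trips, stop_times, route_id, out_file):
    """Write the stop_times rows that belong to one route to out_file as CSV.

    Returns the number of rows written (not counting the header).
    """
    # trips.txt links each trip_id to the route_id it runs on.
    trip_ids = {row["trip_id"] for row in trips if row["route_id"] == route_id}
    selected = [row for row in stop_times if row["trip_id"] in trip_ids]
    fieldnames = ["trip_id", "arrival_time", "departure_time",
                  "stop_id", "stop_sequence"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(selected)
    return len(selected)


# Made-up sample rows, only to show the call; a real run would write to
# a file opened with open("output.csv", "w", newline="").
sample_trips = [
    {"route_id": "r1", "service_id": "wk", "trip_id": "t1"},
    {"route_id": "r2", "service_id": "wk", "trip_id": "t2"},
]
sample_stop_times = [
    {"trip_id": "t1", "arrival_time": "08:00:00", "departure_time": "08:00:00",
     "stop_id": "s1", "stop_sequence": "1"},
    {"trip_id": "t2", "arrival_time": "09:00:00", "departure_time": "09:00:00",
     "stop_id": "s9", "stop_sequence": "1"},
]
output = io.StringIO()
written = write_route_stop_times(sample_trips, sample_stop_times, "r1", output)
```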

willgeary commented 6 years ago

Great, glad to hear that things are working for you @jaimeorrego.

I agree that a GTFS data processor would be nice. Frankly, I am considering whether that should belong within this project or as a standalone project.

temospena commented 6 years ago

Hi, I have the same problem as @jaimeorrego with data for Lisbon. Strangely, I can only download the data successfully for weekends or national holidays, maybe because the frequency of the buses (Carris) is lower then. I tried 1st May, 25th April, and 1st April, and they were successful. I tried 23rd April, a regular day (as it seems @willgeary did, although the screenshot then shows 25th April), and it doesn't fetch the data, nor does it for other regular days in April. I changed the request size and retry limit as you suggested.

I agree that an option to run the data locally would be better.

Thanks for the API!