Closed by nalvarez508 2 years ago
Tested with curl -i -X POST -H 'Content-Type: application/json' -d <somedata> https://www.amtrak.com/v4/journey-solution-option
May have to try with cookies or through a browser.
One thought is to get cookies from a chromedriver session and pass them in the POST request.
This returned the default "couldn't process your request" error from Amtrak's site, same as when using an unmodified webdriver.
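For reference, the cookie hand-off could look roughly like the sketch below. It assumes the cookies arrive in the list-of-dicts shape that Selenium's driver.get_cookies() returns; cookie_header, build_search_request, and the placeholder payload are made-up names for illustration, not code from the repo. The request is only built here, not sent, and as noted above the real attempt still hit the error page, so this shows the approach rather than a fix.

```python
import json
import urllib.request

SEARCH_URL = "https://www.amtrak.com/v4/journey-solution-option"


def cookie_header(selenium_cookies):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_search_request(payload, selenium_cookies):
    """Build (but do not send) the POST request with browser cookies attached."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Cookie": cookie_header(selenium_cookies),
        },
    )


# In a live session these would come from driver.get_cookies();
# the values below are placeholders, not real Amtrak cookies.
example_cookies = [{"name": "a", "value": "1"}, {"name": "b", "value": "2"}]
req = build_search_request({"placeholder": "somedata"}, example_cookies)
```

Sending it would be a matter of passing req to urllib.request.urlopen, with the real search payload in place of the placeholder.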
Parsing session storage worked. Each train is stored in the following path:
journeySolutionOption : dict
  journeyLegs : list (of one dict)
    journeyLegOptions : list (of all results as dict)
      data stored in here as a mix of list of lists and dicts
This not only makes parsing a page faster, since we only need session storage, but also reveals a lot of hidden data such as fare types, train information for multiple segments, seating accommodations, etc. It would need a deep dive.
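A minimal sketch of walking that path, assuming the searchresults value in session storage is a JSON string with journeySolutionOption at its top level (I have not confirmed the exact nesting above that key). extract_leg_options is a hypothetical helper name, and the sample data is a stand-in, not real Amtrak output.

```python
import json


def extract_leg_options(search_results_json):
    """Return the list of result dicts under journeyLegOptions."""
    data = json.loads(search_results_json)
    legs = data["journeySolutionOption"]["journeyLegs"]
    # journeyLegs is a list holding a single dict, so take the first entry.
    return legs[0]["journeyLegOptions"]


# With a webdriver, the raw string could be fetched with something like:
#   driver.execute_script(
#       "return window.sessionStorage.getItem('searchresults');")
# Stand-in data with the same nesting, for illustration only:
sample = json.dumps(
    {
        "journeySolutionOption": {
            "journeyLegs": [
                {"journeyLegOptions": [{"placeholder": 1}, {"placeholder": 2}]}
            ]
        }
    }
)
options = extract_leg_options(sample)
```

Each entry in options would then need its own parsing, since the data inside is a mix of lists of lists and dicts.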
Useful keys from session storage:
searchresults is the meat of what we're after.
stationsData_stations returns a list of every location served by Amtrak, train or bus.
traincodes returns every train by number. Train website is /firstword-secondword-train.
The latter two can be retrieved just by going to amtrak.com
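The train page pattern noted above could be built with something like the sketch below. train_page_path is a hypothetical helper, and how traincodes actually stores train names is not shown here, so the input format (a plain space-separated name) is an assumption, as is the handling of names with more than two words.

```python
def train_page_path(train_name):
    """Turn a train name into a /firstword-secondword-train style path."""
    words = train_name.lower().split()
    return "/" + "-".join(words + ["train"])


path = train_page_path("Pacific Surfliner")
```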
After closer examination:
traincodes is useful, but for what, I am not sure in this application.
stationsData_stations is great and well formatted but includes bus stops, which seems out of scope at this point. However, I would not be opposed to including this feature later, perhaps as an "Include Bus Stops" option.
searchresults I really want to work, but it still requires that we use a search, and with a webdriver, what's the point?
JSON data only
Included in a (future) commit are a headers file and the resulting search results JSON data (stored in session storage). At the very least, the data could be parsed more quickly than by checking each browser element.