Closed iloveitaly closed 5 years ago
Awesome! I’ll look through this and merge it in by tonight.
This looks great. I have tried a previously working proxy setup (with both hostname and port) and one through illuminati.io, and I am getting the following errors, along with the price not updating:
Jun 11 13:41:15 swacheck2 app/scheduler.9385: > southwest-price-drop-bot@3.1.4 task:check /app
Jun 11 13:41:15 swacheck2 app/scheduler.9385: > node --trace-warnings tasks/check.js
Jun 11 13:41:16 swacheck2 app/scheduler.9385: (node:23) UnhandledPromiseRejectionWarning: Error: Invalid "proxyUrl" option: the URL must contain both hostname and port.
Jun 11 13:41:16 swacheck2 app/scheduler.9385: at Object.anonymizeProxy (/app/node_modules/proxy-chain/build/anonymize_proxy.js:32:15)
Jun 11 13:41:16 swacheck2 app/scheduler.9385: at module.exports (/app/lib/browser.js:10:39)
Jun 11 13:41:16 swacheck2 app/scheduler.9385: at /app/tasks/check.js:12:23
Jun 11 13:41:16 swacheck2 app/scheduler.9385: at Object.
I was able to get the proxy working by including http:// in front of the url. That being said, now it's having issues scraping. See logs:
Jun 11 16:12:28 swacheck2 app/scheduler.5302: mongo successfully connected!
Jun 11 16:12:29 swacheck2 app/scheduler.5302: found 1 alerts, checking...
Jun 11 16:12:29 swacheck2 app/scheduler.5302: lock has available permits: 5
Jun 11 16:12:29 swacheck2 app/scheduler.5302: Entered lock, available permits: 4
Jun 11 16:12:30 swacheck2 app/scheduler.5302: Retrieving URL: https://www.southwest.com/air/booking/select.html?originationAirportCode=LAX&destinationAirportCode=PVR&returnAirportCode=&departureDate=2019-08-22&departureTimeOfDay=ALL_DAY&returnDate=&returnTimeOfDay=ALL_DAY&adultPassengersCount=1&seniorPassengersCount=0&fareType=USD&passengerType=ADULT&tripType=oneway&promoCode=&reset=true&redirectToVision=true&int=HOMEQBOMAIR&leapfrogRequest=true
Jun 11 16:14:32 swacheck2 app/scheduler.5302: Unable to get flights - trying again
Jun 11 16:14:32 swacheck2 app/scheduler.5302:
Jun 11 16:14:32 swacheck2 app/scheduler.5302: {
Jun 11 16:14:32 swacheck2 app/scheduler.5302: status: '200',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'content-type': 'text/html; charset=UTF-8',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'x-ion-hop': '1',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: expires: '0',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'cache-control': 'no-cache, no-store, must-revalidate',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: pragma: 'no-cache',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'content-encoding': 'gzip',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: vary: 'Accept-Encoding',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'x-akamai-transformed': '9 64888 0 pmb=mNONE,1',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: date: 'Tue, 11 Jun 2019 23:12:30 GMT',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'content-length': '58822',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'set-cookie': 'akavpau_prod_fullsite=1560294780~id=ec6648d3009d2cd2e75488f337c82749; ' +
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'Path=/',
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 'strict-transport-security': 'max-age=600'
Jun 11 16:14:32 swacheck2 app/scheduler.5302: }
Jun 11 16:14:32 swacheck2 app/scheduler.5302: 200
Jun 11 16:16:32 swacheck2 app/scheduler.5302: Error: ERROR! Unknown error!
Unable to find flight information on page: https://www.southwest.com/air/booking/select.html?originationAirportCode=LAX&destinationAirportCode=PVR&returnAirportCode=&departureDate=2019-08-22&departureTimeOfDay=ALL_DAY&returnDate=&returnTimeOfDay=ALL_DAY&adultPassengersCount=1&seniorPassengersCount=0&fareType=USD&passengerType=ADULT&tripType=oneway&promoCode=&reset=true&redirectToVision=true&int=HOMEQBOMAIR&leapfrogRequest=true
Jun 11 16:16:32 swacheck2 app/scheduler.5302: html:
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at getPage (/app/lib/bot/get-price.js:212:17)
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at processTicksAndRejections (internal/process/task_queues.js:89:5)
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at async getFlights (/app/lib/bot/get-price.js:47:14)
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at async getPriceForFlight (/app/lib/bot/get-price.js:8:20)
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at async Alert.getLatestPrice (/app/lib/bot/alert.js:172:19)
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at async /app/tasks/check.js:33:9
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at async Promise.all (index 0)
Jun 11 16:16:32 swacheck2 app/scheduler.5302: at async /app/tasks/check.js:69:5
Jun 11 16:16:32 swacheck2 app/scheduler.5302: No flights found!
Jun 11 16:16:32 swacheck2 app/scheduler.5302: Min price: Infinity
Jun 11 16:16:32 swacheck2 app/scheduler.5302: Got price: 8/22/2019|LAX|PVR|110 { time: 1560294749217, price: Infinity }
Jun 11 16:16:32 swacheck2 app/scheduler.5302: 8/22/2019 #110 LAX → PVR not cheaper
Jun 11 16:16:32 swacheck2 heroku/scheduler.5302: State changed from up to complete
Jun 11 16:16:33 swacheck2 heroku/scheduler.5302: Process exited with status 0
@razzamatazm
...previously working proxy setup
How recently was this working? If you revert to your previous setup are you able to scrape successfully?
Since I posted this PR it looks like SW is blocking requests (from a proxy or my local connection). It looks like they've updated their bot detection system, and it's gotten much much better.
@iloveitaly It had been working before their bot detection was first implemented. I was able to get past the error from my first post by including "http://" in the proxy var, but now the app is having trouble scraping the price. I was initially searching an international flight booked with points, so to test I tried a US flight booked with cash and it's still having issues.
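The "http://" fix described above can be sketched as a small pre-check that runs before the proxy value is handed to anything else. This is only an illustration: `normalizeProxyUrl` is a hypothetical helper, not part of the bot or of proxy-chain, and filling in the default port for a scheme is an assumption of this sketch rather than documented proxy-chain behavior.

```javascript
// Hypothetical helper: not part of southwest-price-drop-bot or proxy-chain.
function normalizeProxyUrl(raw) {
  const trimmed = String(raw || '').trim();
  if (!trimmed) {
    throw new Error('Proxy URL is empty');
  }

  // Prepend http:// only when the value has no scheme of its own.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const withScheme = hasScheme ? trimmed : `http://${trimmed}`;

  // The WHATWG URL parser throws a TypeError on malformed input.
  const parsed = new URL(withScheme);

  // URL.port is '' when the port equals the scheme default, so fill it in.
  // (Assumption of this sketch; a consumer may still want an explicit port.)
  const defaultPorts = { 'http:': '80', 'https:': '443' };
  const port = parsed.port || defaultPorts[parsed.protocol] || '';

  if (!parsed.hostname || !port) {
    throw new Error(`Proxy URL needs both hostname and port: ${raw}`);
  }
  return { url: withScheme, hostname: parsed.hostname, port };
}

// Example with a made-up proxy address.
console.log(normalizeProxyUrl('proxy.example.com:8080'));
```

Failing early with a clear message like this would surface a missing scheme at startup instead of as an unhandled promise rejection inside the scheduled task.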
I'm seeing the same thing - looks like an Akamai block.
@samyun and @iloveitaly - I set up a proxy server at my homelab and still run into the issues, even though I have no problems accessing the southwest site through a browser. Not sure if it's Akamai in this case.
https://github.com/pyro2927/SouthwestCheckin/ <-- This is working as of now. I wonder if we can pull some of the techniques used. It uses the mobile api.
@razzamatazm ah, interesting! I didn't realize there was a mobile API. Looks like the flight cost endpoint hasn't been figured out yet. Any ideas on how to hit it?
@samyun I'm pretty sure it's not an Akamai block. Here's why:
curl https://www.southwest.com/air/booking/select.html
There's some analytics code, some obfuscated code, and then a snippet that hits a unique token on the root SW domain when the page has loaded and then reloads the page. swa-common is loaded on this page as well, but I'm not sure if it's a duplicate of the inline JS or not (my hunch is it is).
Object.create(null), find where it's actively used and add a debugger call next to it. You'll need to do some fiddling to find the right place. You can pull the code into a standalone HTML file to fiddle with it locally.
southwest.com/TOKEN URL specified in the initial page load. Fancy stuff!
I went ahead and did this one last time and realized the flags I had to disable the WebGL/GPU stuff were causing the issue. This is now working again!
Hmm, now it's not working for me. No idea why. Can you guys try HEAD and see if it works for you?
I'm getting build errors on Heroku
info fsevents@1.2.9: The platform "linux" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed
compatibility check. Excluding it from installation.
error fsevents@2.0.7: The platform "linux" is incompatible with
this module.
error Found incompatible module.
info Visit https://yarnpkg.com/en/docs/cli/install for
documentation about this command.
-----> Build failed
We're sorry this build is failing! You can troubleshoot
common issues here:
https://devcenter.heroku.com/articles/troubleshooting-node-deploys
Some possible problems:
- Dangerous semver range (>) in engines.node
https://devcenter.heroku.com/articles/nodejs-support#specifying-a-node-js-version
Love,
Heroku
! Push rejected, failed to compile Node.js app.
! Push failed
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/samyun/southwest-price-drop-bot/pull/49?email_source=notifications&email_token=AFOEFJUQTKMZNRVUJ5FI6RLP2JUQDA5CNFSM4HTMMMSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXUEYZA#issuecomment-501763172, or mute the thread https://github.com/notifications/unsubscribe-auth/AFOEFJRRCAGBTUPH6IMHL2TP2JUQDANCNFSM4HTMMMSA .
New, different, exciting errors :)
Jun 13 11:52:55 swacheck3 heroku/router: at=info method=GET path="/style.css" host=swacheck3.herokuapp.com request_id=1fad41d7-311b-41c0-9876-8bd743dc5526 fwd="67.53.122.46" dyno=web.1 connect=0ms service=6ms status=304 bytes=269 protocol=https
Jun 13 11:52:55 swacheck3 heroku/router: at=info method=GET path="/logo.png" host=swacheck3.herokuapp.com request_id=3af111c9-21b6-4717-b8be-3e059f01326a fwd="67.53.122.46" dyno=web.1 connect=0ms service=10ms status=304 bytes=271 protocol=https
Jun 13 11:52:55 swacheck3 app/web.1: Retrieving URL: https://www.southwest.com/air/booking/select.html?originationAirportCode=LAX&destinationAirportCode=PHX&returnAirportCode=&departureDate=2019-08-22&departureTimeOfDay=ALL_DAY&returnDate=&returnTimeOfDay=ALL_DAY&adultPassengersCount=1&seniorPassengersCount=0&fareType=USD&passengerType=ADULT&tripType=oneway&promoCode=&reset=true&redirectToVision=true&int=HOMEQBOMAIR&leapfrogRequest=true
Jun 13 11:52:58 swacheck3 app/web.1: PAGE LOG: Failed to load resource: net::ERR_FAILED
Jun 13 11:52:58 swacheck3 app/web.1: PAGE LOG: Failed to load resource: the server responded with a status of 403 ()
Jun 13 11:54:57 swacheck3 app/web.1: Unable to get flights - trying again
Jun 13 11:54:57 swacheck3 app/web.1:
Jun 13 11:54:57 swacheck3 app/web.1: {
Jun 13 11:54:57 swacheck3 app/web.1: status: '200',
Jun 13 11:54:57 swacheck3 app/web.1: 'content-type': 'text/html; charset=UTF-8',
Jun 13 11:54:57 swacheck3 app/web.1: 'x-ion-hop': '1',
Jun 13 11:54:57 swacheck3 app/web.1: expires: '0',
Jun 13 11:54:57 swacheck3 app/web.1: 'cache-control': 'no-cache, no-store, must-revalidate',
Jun 13 11:54:57 swacheck3 app/web.1: pragma: 'no-cache',
Jun 13 11:54:57 swacheck3 app/web.1: 'content-encoding': 'gzip',
Jun 13 11:54:57 swacheck3 app/web.1: vary: 'Accept-Encoding',
Jun 13 11:54:57 swacheck3 app/web.1: 'x-akamai-transformed': '9 - 0 pmb=mNONE,1',
Jun 13 11:54:57 swacheck3 app/web.1: date: 'Thu, 13 Jun 2019 18:52:56 GMT',
Jun 13 11:54:57 swacheck3 app/web.1: 'content-length': '58937',
Jun 13 11:54:57 swacheck3 app/web.1: 'set-cookie': 'akavpau_prod_fullsite=1560452006~id=bf704764f98270f44819cda28444db01; ' +
Jun 13 11:54:57 swacheck3 app/web.1: 'Path=/',
Jun 13 11:54:57 swacheck3 app/web.1: 'strict-transport-security': 'max-age=600'
Jun 13 11:54:57 swacheck3 app/web.1: }
Jun 13 11:54:57 swacheck3 app/web.1: 200
Jun 13 11:54:58 swacheck3 app/web.1: PAGE LOG: Failed to load resource: the server responded with a status of 403 ()
Jun 13 11:56:58 swacheck3 app/web.1: Error: ERROR! Unknown error!
Unable to find flight information on page: https://www.southwest.com/air/booking/select.html?originationAirportCode=LAX&destinationAirportCode=PHX&returnAirportCode=&departureDate=2019-08-22&departureTimeOfDay=ALL_DAY&returnDate=&returnTimeOfDay=ALL_DAY&adultPassengersCount=1&seniorPassengersCount=0&fareType=USD&passengerType=ADULT&tripType=oneway&promoCode=&reset=true&redirectToVision=true&int=HOMEQBOMAIR&leapfrogRequest=true
Jun 13 11:56:58 swacheck3 app/web.1: html:
Jun 13 11:56:58 swacheck3 app/web.1: at getPage (/app/lib/bot/get-price.js:240:17)
Jun 13 11:56:58 swacheck3 app/web.1: at processTicksAndRejections (internal/process/task_queues.js:89:5)
Jun 13 11:56:58 swacheck3 app/web.1: at async getFlights (/app/lib/bot/get-price.js:51:14)
Jun 13 11:56:58 swacheck3 app/web.1: at async getPriceForFlight (/app/lib/bot/get-price.js:8:20)
Jun 13 11:56:58 swacheck3 app/web.1: at async Alert.getLatestPrice (/app/lib/bot/alert.js:172:19)
Jun 13 11:56:58 swacheck3 app/web.1: at async /app/lib/apps/app.js:72:3
Jun 13 11:56:58 swacheck3 app/web.1: No flights found!
Jun 13 11:56:58 swacheck3 app/web.1: Min price: Infinity
Jun 13 11:56:58 swacheck3 app/web.1: Got price: 8/22/2019|LAX|PHX|1121 { time: 1560451974957, price: Infinity }
@razzamatazm yup, the 403 is SW blocking us. No idea how to get around this. I think it has something to do with the IP used, but I can't be sure.
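A side note on the `Min price: Infinity` and `price: Infinity` lines in the logs above: one plausible reading, assuming the bot takes a minimum over the list of scraped fares, is that the list was empty, since `Math.min()` with no arguments returns `Infinity`. The sketch below uses hypothetical function names (`minPrice`, `isCheaper`) and made-up fares; the bot's real price logic may differ.

```javascript
// Hypothetical sketch; the bot's real price logic may differ.
function minPrice(prices) {
  // Math.min() with no arguments returns Infinity, so an empty
  // list of fares yields Infinity instead of throwing.
  return Math.min(...prices);
}

function isCheaper(latest, previous) {
  // Treat a non-finite price as "no data" rather than as a real fare.
  return Number.isFinite(latest) && latest < previous;
}

// Made-up fares for illustration.
console.log(minPrice([]));
console.log(isCheaper(minPrice([]), 150));
console.log(isCheaper(minPrice([99, 120]), 150));
```

Under that assumption, a failed scrape reports "not cheaper" instead of raising an alert, which matches the behavior in the logs but also makes a blocked request look like an ordinary run.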
I can reach the site using chrome at my home, via the same proxy. So strange.
@razzamatazm is that using this repo, or by manually accessing it via standard chrome?
I think what's going on is SW is associating a browser fingerprint with an IP and then blocking that IP. I know somewhere in the SW code they are checking the __webdriver_script_fn var which is not hidden using the evasions currently implemented.
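To make the `__webdriver_script_fn` point concrete, here is a minimal sketch of the kind of property probe a page script could run. It is not Southwest's code: the marker list contains only the one name mentioned in this thread, `findAutomationMarkers` is a hypothetical helper, and the assumption that the marker lives on a document-like object is mine.

```javascript
// Illustrative only: this is not Southwest's code. The marker list is an
// assumption, seeded with the one name mentioned in the thread.
const AUTOMATION_MARKERS = ['__webdriver_script_fn'];

// Returns the markers that are present on a document-like object.
function findAutomationMarkers(doc, markers = AUTOMATION_MARKERS) {
  // Object(doc) turns null/undefined into {} so the `in` check cannot throw.
  return markers.filter((name) => name in Object(doc));
}

// Plain objects stand in for `document` so the sketch runs under Node.
const cleanDoc = {};
const automatedDoc = { __webdriver_script_fn: () => {} };

console.log(findAutomationMarkers(cleanDoc));
console.log(findAutomationMarkers(automatedDoc));
```

Running the same check inside the automated browser session would show whether the current evasions leave that property visible.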
I think the best option is to use the mobile API, but it doesn't look like the price check endpoint has been figured out yet (and I don't have the time to tinker with it).
In any case, this is a huge improvement over what was there, although it doesn't actually work :(
I went ahead and merged this in - I found some other evasion repos I'm going to try to work in. Thanks for your help!
@samyun awesome! It's worth noting that this is now working locally again. I think there is some sort of IP block triggered by repeated requests for the same flight (or something along those lines... just guessing really). Keep us posted on what you find!
Lots of improvements!