headrun / SWIFT


Air France / KLM FlyingBlue Scraper #3

Closed malla794 closed 4 years ago

malla794 commented 4 years ago

We need to develop a new scraper to fetch the miles information from the website.

MohanaVedingadu commented 4 years ago

@SreenivasDega

Please find the updates below:

1. When we try to log in to this site, it presents Google reCAPTCHA v3, which we need to handle.
2. This is very different from reCAPTCHA v1 and v2: it automatically detects headless browsers based on the traffic.
3. First I searched for the reCAPTCHA key in all the APIs but did not find it. Then I tried Selenium and Puppeteer, but these two headless browsers are also not allowed.

MohanaVedingadu commented 4 years ago

https://anti-captcha.com/apidoc/recaptcha

Screenshot from 2020-04-15 06-27-26

To solve this reCAPTCHA v3 we need to pass all the required keys. I am working on collecting these keys.
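For reference, the keys in question can be sketched as a request body for a solving service. The block below is only an illustration: the field names follow my reading of the anti-captcha documentation linked above and should be checked against it, and the function name `build_recaptcha_v3_task` and every value passed to it are hypothetical placeholders. It builds the JSON body and does not send any request.

```python
import json


def build_recaptcha_v3_task(client_key, website_url, website_key,
                            page_action, min_score=0.3):
    """Build the JSON body for a reCAPTCHA v3 solving task.

    Field names are assumed from the linked anti-captcha docs;
    verify them there before relying on this.
    """
    return {
        "clientKey": client_key,          # account API key (placeholder)
        "task": {
            "type": "RecaptchaV3TaskProxyless",
            "websiteURL": website_url,    # page that shows the captcha
            "websiteKey": website_key,    # site key found in the page source
            "minScore": min_score,        # requested minimum v3 score
            "pageAction": page_action,    # action name used by the page
        },
    }


# Example with placeholder values only.
body = build_recaptcha_v3_task(
    client_key="YOUR_API_KEY",
    website_url="https://example.com/login",
    website_key="SITE_KEY_FROM_PAGE",
    page_action="login",
)
print(json.dumps(body, indent=2))
```

The site key and page action have to be read from the target page itself, which is the part still being worked on here.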

MohanaVedingadu commented 4 years ago

As Madhav suggested, I am going through the DBC documentation and need to collect the complete request body to send to the DBC API.

MohanaVedingadu commented 4 years ago

@SreenivasDega

Please find the updates below:

I checked all the keys present on the site and sent them in the DBC API request, but each time I got an empty response. I also tried a different API that does not go through login, but that API requires cookies whose keys are only available after login, so it is not working either.

I would like suggestions on reCAPTCHA v3. @malla794, please advise.

MohanaVedingadu commented 4 years ago

@SreenivasDega, @malla794 Please find the updates below. Using puppeteer-extra, we are able to bypass reCAPTCHA v3. I completed the login and the flight search with puppeteer-extra and then passed the page source into Python. Pending: the data still needs to be processed into the required format.
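The Python side of that hand-off could look roughly like the sketch below. It is only an illustration using the standard library: the class name `flight-row` and the sample markup are hypothetical, since the real page structure is not shown in this thread, and `extract` is a made-up helper name.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextByClass(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.results = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.css_class in classes:
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse whitespace so each match is one clean line.
            self.results.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def extract(html, css_class):
    """Return the text of each element with the given class."""
    parser = TextByClass(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical page source as it might be handed over from the browser.
sample = '<div class="flight-row"><span>AF 123</span> <b>LAX</b><br></div>'
print(extract(sample, "flight-row"))
```

In practice the page source would come from a file or standard input written by the browser script rather than from an inline string.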

Note: As discussed earlier, a reCAPTCHA image sometimes appears, and we thought it depended on the browser. Today I sent many requests (more than 100), and as of now I am getting the reCAPTCHA image on every request.

Please find the screenshots below for your reference: Screenshot from 2020-04-26 01-27-55, Screenshot from 2020-04-26 01-44-24

MohanaVedingadu commented 4 years ago

@SreenivasDega,

This spider is completed. Note: proxy rotation is not working for this site, so I added user rotation.
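As a rough idea of what user rotation can mean here, the sketch below cycles through a pool of login accounts in round-robin order. It is an assumption about the approach, not the spider's actual code: `AccountRotator` and the account fields are hypothetical names.

```python
from itertools import cycle


class AccountRotator:
    """Hand out login accounts in round-robin order."""

    def __init__(self, accounts):
        if not accounts:
            raise ValueError("at least one account is required")
        # Copy the list so later changes by the caller do not affect rotation.
        self._pool = cycle(list(accounts))

    def next_account(self):
        """Return the next account, wrapping around at the end of the pool."""
        return next(self._pool)


# Placeholder credentials for illustration only.
rotator = AccountRotator([
    {"user": "user_a", "password": "***"},
    {"user": "user_b", "password": "***"},
])
for _ in range(3):
    print(rotator.next_account()["user"])
```

Each new scraping session would call `next_account()` before logging in, so consecutive requests do not all come from the same user.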

nigitha1995 commented 4 years ago

Hi Mohana, please find the issues below.

1) "airport": "Los Angeles, International Airport": there is no need to take the complete airport name; take it as LAX (as per the doc) for direct flights.
2) Some flights have no manufacturer, but the model is being taken as the manufacturer.
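Both points could be handled in a small normalization step along these lines. This is a sketch under assumptions: the lookup table `AIRPORT_CODES`, the helper names, and the idea of leaving the manufacturer empty when the site does not provide one are illustrative, and the expected output format should be taken from the doc mentioned above.

```python
import re

# Hypothetical lookup from scraped airport names to three-letter codes.
AIRPORT_CODES = {
    "los angeles, international airport": "LAX",
}


def airport_code(value):
    """Return the three-letter code for a scraped airport value, or None."""
    value = value.strip()
    if re.fullmatch(r"[A-Z]{3}", value):
        return value                          # already a code
    return AIRPORT_CODES.get(value.lower())   # None if the name is unknown


def split_aircraft(manufacturer, model):
    """Return (manufacturer, model) without copying the model into manufacturer."""
    manufacturer = (manufacturer or "").strip() or None
    model = (model or "").strip() or None
    if manufacturer == model:
        # The site gave only a model; keep the manufacturer empty.
        manufacturer = None
    return manufacturer, model


print(airport_code("Los Angeles, International Airport"))
print(split_aircraft("A320", "A320"))
```

Returning `None` for unknown airports makes missing mappings easy to spot in the output instead of silently passing the full name through.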