Open ghost opened 3 years ago
I got the same server error. It seems that Amazon detects that you are trying to scrape their website. I opened the page in my browser, checked the cookies in the request, and then used those cookies in the request that is made in app.py, something like this:

    cookies = {"aws_lang": "YOUR_AWS_LANG", "i18n-prefs": "YOUR_i18N_PREFS", "regStatus": "YOUR_REGSTATUS", "session-id": "YOUR_SESSION_ID", "session-id-time": "YOUR_SESSION_ID_TIME", "session-token": "YOUR_SESSION_TOKEN", "skin": "YOUR_SKIN", "sp-cdn": "YOUR_SP_CDN", "ubid-main": "YOUR_UBID"}
    ...
    r = requests.get(url, headers=headers, cookies=cookies)
    ...

(and I also had to change this line:)

    data['number_of_reviews'] = int(data['number_of_reviews'].split(' global rating')[0].replace(',', ''))
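For anyone hitting the same parsing problem, here is a minimal sketch of a more forgiving way to pull the number out of the ratings text. It assumes the scraped string looks like "1,234 global ratings"; the helper name `parse_review_count` is hypothetical and not part of app.py, and it swaps the `split` call above for a regular expression.

```python
import re


def parse_review_count(text):
    """Return the first integer found in text such as '1,234 global ratings'.

    Hypothetical helper: it assumes the count is the first number in the string.
    """
    # Match a run of digits that may contain thousands separators.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no review count found in {text!r}")
    # Drop the commas before converting, as the original line does.
    return int(match.group().replace(",", ""))
```

It could then be used as `data['number_of_reviews'] = parse_review_count(data['number_of_reviews'])`, which handles both "global rating" and "global ratings".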
@cberrioa thank you so much for your helpful comments! By following your directions, I was finally able to get something other than "internal server error" out. I made a big mistake assuming that this API would work out of the box... Would you by any chance have any idea how to automate the extraction of the cookies in Flask? Also, did the API just return the first page of review results for you? That's all that I have been able to get out so far, though I suppose the cookies are the real show-stopper here.
@schabertrobbinger sorry, I haven't tried to extract these cookies automatically, and right now I don't really know how to do that. I have the same issue: the API returns only the first page of review results for me. It seems that the other reviews are loaded dynamically.
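This is not full automation, but a small sketch that might save some typing: instead of copying each cookie by hand, copy the whole `Cookie:` request header from the browser's developer tools and split it into the dict that `requests.get(..., cookies=...)` expects. The function name `cookie_header_to_dict` is hypothetical, and whether Amazon keeps accepting the copied cookies is an assumption that I have not verified.

```python
def cookie_header_to_dict(header_value):
    """Turn a raw Cookie header value ('a=1; b=2') into a dict.

    Hypothetical helper: it expects the text after 'Cookie:' as copied
    from the browser's network tab.
    """
    cookies = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        name, _, value = part.partition("=")
        cookies[name] = value
    return cookies
```

The result can be passed straight to the call shown earlier in the thread, for example `requests.get(url, headers=headers, cookies=cookie_header_to_dict(raw_header))`, where `raw_header` is the copied string.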
It says {"error":"URL to scrape is not provided"} every time I run the flask application, and I wonder why this happens.
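One possible cause, offered as a guess rather than a confirmed diagnosis: the message suggests the app did not receive the address of the page to scrape. The sketch below assumes the app reads it from a query parameter named `url` and that Flask is listening on its default `http://localhost:5000/`; both are assumptions, so check app.py for the actual parameter name. The helper `build_api_url` is hypothetical and only shows how to URL-encode the target so that its own `?` and `&` characters do not get cut off.

```python
from urllib.parse import urlencode


def build_api_url(api_base, target_url):
    """Build a request URL that passes target_url as the 'url' query parameter.

    Assumption: the Flask app looks for a parameter called 'url'.
    """
    # urlencode percent-encodes the target so it survives as a single value.
    return f"{api_base}?{urlencode({'url': target_url})}"


# EXAMPLE is a placeholder, not a real product identifier.
print(build_api_url("http://localhost:5000/",
                    "https://www.amazon.com/product-reviews/EXAMPLE"))
```

Opening the printed address in a browser, instead of the bare `http://localhost:5000/`, would show whether the missing parameter is the problem.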
I get an internal server error on running this