leoncvlt / blinkist-scraper

📚 Python tool to download book summaries and audio from Blinkist.com, and generate some pretty output
191 stars 35 forks

Captcha taking longer than expected #42

Open albertlaudia opened 3 years ago

albertlaudia commented 3 years ago

I am experiencing "This is taking longer than expected; please reload the page."

image

Not sure what's wrong. It happened on the third captcha.

rocketinventor commented 3 years ago

Hi @albertlaudia

At what part of the script is this happening? At the sign-in, or somewhere else? What happens if you manually change the URL after seeing this page?

The way the script is set up right now, it is meant to block the captcha from loading and just skip to the next page. If you want to load the captcha, you can open up uBlock and switch hcaptcha.com from block to allow.

However, I don't think that will help you much, because if you are getting the captcha page already, then you'll probably just get it again (even if you solve it).

albertlaudia commented 3 years ago

I am actually not 100% sure how the captcha works. It just seems that the script gets stuck on the captcha and then throws an error saying there is no internet connection.

image

FirstClassCitizenFCC commented 3 years ago

Assuming you get the captcha after the login check, try to change the URL manually to https://www.blinkist.com/{language} when the captcha occurs.
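For anyone scripting the redirect rather than typing it, here is a minimal sketch of how that URL could be built. It assumes the language code is a short string such as `en`; the helper name `landing_url` is hypothetical and is not part of the scraper.

```python
BASE_URL = "https://www.blinkist.com"


def landing_url(language: str) -> str:
    """Build the language landing page URL (hypothetical helper, not in the scraper)."""
    # Normalise input such as " EN " or "/en/" to a bare lowercase code.
    code = language.strip().strip("/").lower()
    if not code:
        raise ValueError("language code must not be empty")
    return f"{BASE_URL}/{code}"


if __name__ == "__main__":
    print(landing_url("en"))
```

The resulting URL can be pasted into the address bar of the browser window the script opened once the captcha page appears.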

obsessivelearner commented 3 years ago

Not OP but I have a similar issue and it started a day before this issue was opened. The script ran perfectly for about 2 weeks prior to this. I don't have a premium account, I just scrape the daily book at midnight every day.

That out of the way, I tried what @FirstClassCitizenFCC suggested. Changing the URL manually first redirects me to https://www.blinkist.com/en/nc/library, followed immediately by the same captcha page. Interestingly, even though the terminal initially says "logged into blinkist", the final error was "Failed to log in to Blinkist", so I am not sure whether the captcha comes before or after the login check. I'm assuming after, because it does load my account's library for a split second before getting stuck on the captcha.

The first image is my terminal output on a regular run, the second image is the output I get when I manually change URL after I'm stuck on the captcha.

image

image

johndoe-dev00 commented 3 years ago

I had the same problem with the captcha not loading correctly. Disabling uBlock did the trick for me. Once you have successfully logged in (the cookie file has been created), you can activate it again.

I also did a few other workarounds for the login process. You can check my fork.

obsessivelearner commented 3 years ago

I had no clue how to disable uBlock in the script because I'm very new to coding, but disabling uBlock in my Chrome instance after scraping started let me solve a captcha, and then it scraped the books as normal once I accepted cookies.

@johndoe-dev00 I see you have a docker build for this project! That is something I had been searching for like a madman. I'll definitely check out your fork and docker. I hope to run this project on my Synology NAS via docker :)

I realize the issue isn't solved but having found the inelegant solution that we have, I realize the issue may be closed and I just wanted to thank everybody who's worked on the project and I hope to pay it forward in the near future.

Riviss commented 3 years ago

What ended up being successful for me was disabling uBlock, then clicking on the captcha area quickly when the page first loads. The captcha would then actually pop up to be completed and everything would work. (This may work without first disabling uBlock; I had already disabled it when I tried this.)

If I just left the page to load without clicking quickly, it would go to the page with the screenshot @albertlaudia posted.

obsessivelearner commented 3 years ago

Disabling uBlock manually doesn't work anymore. Redirects to the following work of art:

image

The Title of the daily book is "The Internet of Us: Knowing More and Understanding Less in the Age of Big Data" and I'm not even mad.

Terminal Output looks like this:

image

rocketinventor commented 3 years ago

@obsessivelearner The issue that you are having has nothing to do with the script. The site is just broken right now...

Try navigating to https://www.blinkist.com/en/nc/daily/reader/the-internet-of-us-en manually in your web browser; you should see the same issue.

leoncvlt commented 3 years ago

Also, I think that link appears broken simply because "The Internet of Us" is no longer the free daily book; it probably worked on the day it was. https://www.blinkist.com/en/nc/daily should dynamically resolve to the free daily book, but reading the book from that link doesn't send you to the book's generic reader page. Instead it sends you to a special https://www.blinkist.com/en/nc/daily/reader/{book-slug} URL, which obviously works for one day only.
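To make the distinction concrete, here is a rough sketch that tells the day-specific reader URL apart from the stable daily URL. The path prefix is copied from the URLs quoted in this thread; the function names are made up for illustration and are not taken from the scraper.

```python
from typing import Optional
from urllib.parse import urlparse

# Path prefix as it appears in the URLs quoted in this thread.
DAILY_READER_PREFIX = "/en/nc/daily/reader/"


def daily_reader_slug(url: str) -> Optional[str]:
    """Return the book slug if `url` is a daily reader URL, otherwise None."""
    path = urlparse(url).path
    if not path.startswith(DAILY_READER_PREFIX):
        return None
    slug = path[len(DAILY_READER_PREFIX):].strip("/")
    return slug or None


def is_day_specific(url: str) -> bool:
    """True for URLs that are expected to stop working once the daily book changes."""
    return daily_reader_slug(url) is not None
```

A saved link for which `is_day_specific` returns True should be treated as stale after the day it was captured; starting again from https://www.blinkist.com/en/nc/daily is the safer entry point.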

rocketinventor commented 3 years ago

@leoncvlt Well, that was the book five days ago, but the Blinkist site was actually broken.

jonaschn commented 3 years ago

Using --no-ublock worked for me. Also manually using the privacy-pass extension makes scraping audio possible again.
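For readers wondering what a switch like `--no-ublock` typically does under the hood, here is an illustrative sketch using `argparse`. Only the flag name comes from this thread; the parser, the function names, and the way the result is used are assumptions, not the project's actual code.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; only the --no-ublock flag name comes from this thread."""
    parser = argparse.ArgumentParser(description="illustrative option parser")
    parser.add_argument(
        "--no-ublock",
        action="store_true",
        help="start the browser without the uBlock extension",
    )
    return parser


def should_load_ublock(argv: Optional[List[str]] = None) -> bool:
    """Return True unless --no-ublock was passed on the command line."""
    args = build_parser().parse_args(argv)
    # argparse exposes "--no-ublock" as the attribute "no_ublock".
    return not args.no_ublock
```

With the blocker off, hcaptcha.com is no longer blocked, so the captcha can load and be solved by hand, which matches what others reported above.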