niqdev / packtpub-crawler

Download your daily free Packt Publishing eBook https://www.packtpub.com/packt/offers/free-learning
MIT License

Captcha and DOM issues with new account page #59

Open juzim opened 7 years ago

juzim commented 7 years ago

Packtpub apparently reworked their account page, which broke a few things (it sometimes works, so it might be an A/B test). I'm not sure I can find the time to fix everything right now.

niqdev commented 7 years ago

I'll keep an eye on this as well, thanks.

niqdev commented 7 years ago

Actually there is this error:

So sorry! Free Learning is temporarily unavailable.
Our main server has fallen over and our backup server can't quite take the strain.
We've had to (temporarily) take down Free Learning until everything's fixed.
We'll have it back up as soon as possible!
Check on @packtpub for updates.

https://twitter.com/PacktPub/status/838780539020722176

niqdev commented 7 years ago

@juzim it seems to work now. Can you verify whether the credential problem was related to the outage? By the way, the newsletter is not working now:

[*] fetching url... 200 | https://www.packtpub.com/packt/free-ebook/what-you-need-know-about-python
[-] <type 'exceptions.IndexError'> list index out of range | spider.py@125
Traceback (most recent call last):
  File "script/spider.py", line 125, in main
    packtpub.runNewsletter(currentNewsletterUrl)
  File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 169, in runNewsletter
    self.__parseNewsletterBookInfo(soup)
  File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 101, in __parseNewsletterBookInfo
    urlWithTitle = div_target.select('div.promo-landing-book-picture a')[0]['href']
IndexError: list index out of range
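
The traceback shows that `select('div.promo-landing-book-picture a')` returned an empty list, so indexing `[0]` raised the `IndexError`. The newsletter page markup has probably changed as well. A minimal sketch of a defensive lookup follows. The helper name `first_href` is hypothetical and not part of packtpub-crawler. It only assumes the node exposes a `select()` method and that matches support `.get()`, as BeautifulSoup tags do:

```python
def first_href(node, selector):
    """Return the href of the first element matching `selector`, or None.

    `node` is anything with a select() method returning a list of matches
    (for example a BeautifulSoup tag). Hypothetical helper for illustration.
    """
    matches = node.select(selector)
    if not matches:
        # Selector matched nothing: the page layout likely changed.
        return None
    # .get() returns None instead of raising when the attribute is missing.
    return matches[0].get('href')
```

In `__parseNewsletterBookInfo`, this could be called as `first_href(div_target, 'div.promo-landing-book-picture a')`. A `None` result could then be logged as a clear "newsletter layout not recognized" message instead of crashing.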