rejoiceinhope / scrapy-proxy-pool


Response content isn't text #6

Open m1ngle opened 4 years ago

m1ngle commented 4 years ago

When I add the following to my settings.py file:

    PROXY_POOL_ENABLED = True

    DOWNLOADER_MIDDLEWARES = {
        # ...
        'scrapy_proxy_pool.middlewares.ProxyPoolMiddleware': 610,
        'scrapy_proxy_pool.middlewares.BanDetectionMiddleware': 620,
        # ...
    }

I encounter the following error:

    AttributeError: Response content isn't text

I tried this on the website I wanted to scrape and also on the demo site http://quotes.toscrape.com/, and I get the same error each time. I don't think I'm trying to scrape non-text content from either site. Have you ever encountered this?
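For context, the message itself comes from Scrapy: the base `scrapy.http.Response` class defines `text` as a property that raises `AttributeError("Response content isn't text")`, and only `TextResponse` subclasses (such as `HtmlResponse`) return the decoded body. So the error means some code called `response.text` on a response that Scrapy did not build as a `TextResponse`. The sketch below mimics that behavior with hypothetical stand-in classes (`BaseResponse`, `TextLikeResponse`, not Scrapy's own) so it runs without Scrapy installed, and shows two ways to guard the access.

```python
class BaseResponse:
    """Hypothetical stand-in for scrapy.http.Response: .text is unavailable."""

    def __init__(self, body: bytes):
        self.body = body

    @property
    def text(self) -> str:
        # Mirrors the message raised by Scrapy's base Response.text
        raise AttributeError("Response content isn't text")


class TextLikeResponse(BaseResponse):
    """Hypothetical stand-in for scrapy.http.TextResponse: decodes the body."""

    def __init__(self, body: bytes, encoding: str = "utf-8"):
        super().__init__(body)
        self.encoding = encoding

    @property
    def text(self) -> str:
        return self.body.decode(self.encoding)


def text_or_none(response):
    """Return the decoded body, or None if the response is not text.

    getattr() with a default also swallows an AttributeError raised
    inside a property getter, so no explicit try/except is needed.
    """
    return getattr(response, "text", None)


def text_or_none_explicit(response):
    """Same check via a type test; with real Scrapy you would test
    isinstance(response, scrapy.http.TextResponse) instead."""
    if isinstance(response, TextLikeResponse):
        return response.text
    return None


if __name__ == "__main__":
    print(text_or_none(TextLikeResponse(b"<html>ok</html>")))  # <html>ok</html>
    print(text_or_none(BaseResponse(b"")))  # None
```

The `getattr` form works on any object, while the `isinstance` form makes the intent explicit; both avoid the unguarded `response.text` access that produces this traceback.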

MahmudulHassan5809 commented 4 years ago

I'm facing the same problem. Did you solve it?

m1ngle commented 4 years ago

No, frustratingly I still get it. The project is on hold until I can get back to it.

tiansengkear commented 4 years ago

Disclaimer: I'm very new to Scrapy.

I tried scraping Amazon.com (following a YouTube video) and encountered the same error you're all seeing. However, when I tried it on Quotes to Scrape, it worked fine. Could it be that Amazon is what triggers the error?

You might want to try another website to check whether the package works there or gives the same error.

MahmudulHassan5809 commented 4 years ago

For me too, it works perfectly on other sites but not on Amazon.

m1ngle commented 4 years ago

I noticed Amazon now includes an epoch timestamp in the URL of its pages, indicating when the page was requested. I'm not sure whether this is related to the error.

Zuiluj commented 4 years ago

The latest commit, which was supposed to "upgrade" policy.py, is what's causing the error. Some sites work and some don't. Since the error is intermittent (or, more likely, caused by something I don't understand), reverting to version 0.1.7 makes it work again, because that version does not validate response.text: https://github.com/hyan15/scrapy-proxy-pool/blob/master/scrapy_proxy_pool/policy.py#L15

That worked for me. My guess is that the exception is raised because the response is empty or something similar. As far as I can tell, there's nothing I can do about it on my end, so I just reverted to the older version. Hope it gets fixed!
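If downgrading is not an option, one possible workaround is a ban-detection policy that never calls `.text` on a non-text response. This is a minimal sketch under assumptions: it assumes the policy interface mirrors the one in scrapy-rotating-proxies (`response_is_ban(request, response)` and `exception_is_ban(request, exception)`), so check the `policy.py` of your installed scrapy-proxy-pool version for the actual method names and for how a custom policy is registered. The class name `SafeBanDetectionPolicy`, the status set, and `BAN_MARKERS` are hypothetical illustrations, not the library's defaults. The usage example uses `SimpleNamespace` objects as stand-in responses so it runs without Scrapy.

```python
from types import SimpleNamespace


class SafeBanDetectionPolicy:
    """Hypothetical ban policy that handles non-text responses defensively.

    Assumes the middleware calls response_is_ban(request, response) and
    exception_is_ban(request, exception), as in scrapy-rotating-proxies.
    """

    # Assumption: statuses that should not count as a ban.
    NOT_BAN_STATUSES = {200, 301, 302}
    # Assumption: illustrative strings suggesting a captcha/ban page.
    BAN_MARKERS = ("captcha", "robot check")

    def response_is_ban(self, request, response):
        if response.status not in self.NOT_BAN_STATUSES:
            return True
        if response.status == 200 and not response.body:
            # Assumption: an empty 200 body is treated as a ban.
            return True
        # getattr() returns None instead of propagating
        # AttributeError("Response content isn't text").
        text = getattr(response, "text", None)
        if text is None:
            # Non-text content (e.g. an image): not a ban by itself.
            return False
        lowered = text.lower()
        return any(marker in lowered for marker in self.BAN_MARKERS)

    def exception_is_ban(self, request, exception):
        # Assumption: treat every download exception as a ban; the
        # library's default may exempt some exception types.
        return True


if __name__ == "__main__":
    policy = SafeBanDetectionPolicy()
    html = SimpleNamespace(status=200, body=b"<p>hi</p>", text="<p>hi</p>")
    image = SimpleNamespace(status=200, body=b"\x89PNG")  # no .text attribute
    captcha = SimpleNamespace(status=200, body=b"x", text="Enter the CAPTCHA")
    print(policy.response_is_ban(None, html))     # False
    print(policy.response_is_ban(None, image))    # False
    print(policy.response_is_ban(None, captcha))  # True
```

Treating a non-text body as "not banned" roughly matches the older behavior described above, which did not inspect response.text; the trade-off is that a ban page served without a text content type would go undetected.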

chotosite commented 3 years ago

I am also having the same issue. I am trying to scrape chotosite, and it fails with "AttributeError: Response content isn't text". I have asked a detailed question on Stack Overflow here.