scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost #3103

Closed ghost closed 6 years ago

ghost commented 6 years ago

# -*- coding: utf-8 -*-

import scrapy

class FarmersSpider(scrapy.Spider):
    name = 'farmers'
    allowed_domains = ['www.farmerscompress.com']
    login_url = 'https://www.farmerscompress.com/ProcessUser.aspx'
    start_urls = [login_url]

    def parse(self, response):
        yield scrapy.FormRequest(url=self.login_url,
            formdata={'T1': 'user', 'T2': 'pass', 'B2': 'Login'},
            callback=self.after_login)

    def after_login(self, response):
        tes = response.css('#Label1').extract()
        yield {'tes': tes}

2018-01-31 22:50:30 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-01-31 22:50:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.farmerscompress.com/robots.txt> (failed 1 times): []
2018-01-31 22:50:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.farmerscompress.com/robots.txt> (failed 2 times): []
2018-01-31 22:50:32 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.farmerscompress.com/robots.txt> (failed 3 times): []
2018-01-31 22:50:32 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET https://www.farmerscompress.com/robots.txt>: [] ResponseNeverReceived: []
2018-01-31 22:50:32 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.farmerscompress.com/ProcessUser.aspx> (failed 1 times): []
2018-01-31 22:50:33 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.farmerscompress.com/ProcessUser.aspx> (failed 2 times): []
2018-01-31 22:50:33 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.farmerscompress.com/ProcessUser.aspx> (failed 3 times): []
2018-01-31 22:50:34 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.farmerscompress.com/ProcessUser.aspx>: []
2018-01-31 22:50:34 [scrapy.core.engine] INFO: Closing spider (finished)

cathalgarvey commented 6 years ago

Seems to be the same or a similar cipher issue as #3065. The problem is that the upstream libraries supplying us with encryption support for SSL and TLS are configured to only support modern, safe ciphers, but some servers are configured to use old, unsafe cipher suites.

In your case, you're 'lucky' insofar as the server will allow you to connect over bare HTTP. So you can possibly sidestep this issue by just removing the 's' from your URLs: http://www.farmerscompress.com/ProcessUser.aspx seems to load in Scrapy shell for me. However, the server throws a server-side error at that URL for me, both in a browser and in Scrapy.

I'll close this as it appears to duplicate another issue, but feel free to chip in there or continue discussion here if you think there's more to this. Thanks!
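If you do need to talk TLS to such a server, a possible workaround (assuming Scrapy 1.8 or later, where this setting was added) is to relax the client cipher string via the `DOWNLOADER_CLIENT_TLS_CIPHERS` setting. This is only a sketch; the cipher string below is an example value, not something from this thread, and should be tuned to what the target server actually offers:

```python
# settings.py — hypothetical workaround sketch.
# DOWNLOADER_CLIENT_TLS_CIPHERS (Scrapy 1.8+) is passed to OpenSSL
# as a cipher string. 'DEFAULT:!DH' is just an example value;
# adjust it to match the server you are crawling.
DOWNLOADER_CLIENT_TLS_CIPHERS = 'DEFAULT:!DH'
```

Loosening ciphers weakens transport security, so scope it to the one project that needs it rather than setting it globally.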

MaximRudenko commented 5 years ago

In my case I want to use Scrapy + Splash + a proxy. I can get index.html from the website, but I can't get anything else :( (and I need everything else for Splash to work). I'm at a dead end.

niquepa commented 5 years ago

You need to set a user-agent string. It seems some websites block requests when the user agent is not a browser's.

Open settings.py and add the following user agent:

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36'
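Besides `USER_AGENT`, Scrapy's standard `DEFAULT_REQUEST_HEADERS` setting lets you send other browser-like headers on every request, which some picky servers also check. A minimal sketch (the values below are Scrapy's own documented defaults, which you can extend):

```python
# settings.py — DEFAULT_REQUEST_HEADERS is a standard Scrapy setting;
# these are Scrapy's default values, shown here so they can be
# extended with extra browser-like headers for servers that reject
# bare clients.
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}
```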

HPudge commented 5 years ago

I added the UA in my settings, but it still doesn't work. Then I used a random UA, but that doesn't work either. So sad.

zhengzhen0512 commented 5 years ago

I added the UA in my settings, but it still doesn't work. Then I used a random UA, but that doesn't work either. So sad.

Me too. Have you solved it yet?

zhangshouye0505 commented 4 years ago

Yes, I also have the same problem.

aerilxx commented 4 years ago

I cannot even open the site in Scrapy shell. I have changed the user agent but still no luck, and tried a virtualenv with 'cryptography<2' (suggested here: https://github.com/scrapy/scrapy/issues/2311#issuecomment-325804964), no luck either. Has anybody fixed this problem yet?

Ostapp commented 4 years ago

Have the same issue here; the proposed solutions do not work.

vkamma commented 4 years ago

Have same issue. Any solutions?

guntutur commented 4 years ago

Have the same issue. How do I know which cipher/SSL is being used by my machine? I tried to run Scrapy from my MacBook and it's doing fine, but running from a hosted VM I get this non-clean-fashion error.
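One way to answer the "which cipher/SSL" question is to ask the Python build that Scrapy runs on; a stdlib-only sketch:

```python
# Inspect the OpenSSL build and the cipher suites Python enables by
# default. Differences here between machines (e.g. a MacBook vs. a
# hosted VM) are a common cause of ConnectionLost-style failures.
import ssl

print(ssl.OPENSSL_VERSION)  # OpenSSL version string for this Python

ctx = ssl.create_default_context()
names = [c['name'] for c in ctx.get_ciphers()]
print(len(names), 'cipher suites enabled, e.g.', names[0])
```

Run this on both machines and compare the OpenSSL versions and cipher lists.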

SachitNayak commented 4 years ago

I cannot even open the site in Scrapy shell. I have changed the user agent but still no luck, and tried a virtualenv with 'cryptography<2' (suggested here: #2311 (comment)), no luck either. Has anybody fixed this problem yet?

I'm facing the same issue. Tried removing the 's' in 'https' to switch to plain 'http'; no luck! Any solutions?

SachitNayak commented 4 years ago

I cannot even open the site in Scrapy shell. I have changed the user agent but still no luck, and tried a virtualenv with 'cryptography<2' (suggested here: #2311 (comment)), no luck either. Has anybody fixed this problem yet?

I'm facing the same issue. Tried removing the 's' in 'https' to switch to plain 'http'; no luck! Any solutions?

Solved by changing the user agent. Also, I did not downgrade cryptography to <2; it works well without doing that. See https://github.com/scrapy/scrapy/issues/2916#issuecomment-589357121

nciefeiniu commented 4 years ago

Have the same issue. How do I know which cipher/SSL is being used by my machine? I tried to run Scrapy from my MacBook and it's doing fine, but running from a hosted VM I get this non-clean-fashion error.

I met the same problem.

ozansumen1 commented 3 years ago

I have the same problem. Any suggestions?

maazullah96 commented 3 years ago

I also met the same problem

mujahidalkausari commented 3 years ago

I'm facing the same irritating issue.

shartoo commented 3 years ago

Is there no solution? Come on, be a powerful man!

legion-support commented 3 years ago

Same problem

kalpeshtawde commented 3 years ago

Same problem, no luck yet.

Shuaiwei-dash commented 2 years ago

You can try not adding headers in the yield scrapy.Request() params.

angelbi commented 2 years ago

Solution:

    meta = {'proxy': 'http://127.0.0.1:1080'}
    yield scrapy.Request(url=urls[0], callback=self.parse, meta=meta)

triangle959 commented 2 years ago

After modifying the user agent and switching proxies, this error still pops up, but not very often.

anvaari commented 2 years ago

Same problem with OpenSSL 1.1.1 (11 Sep 2018) on Ubuntu 18.04, but it works fine with OpenSSL 1.1.1m (14 Dec 2021) on Ubuntu 20.04.

triangle959 commented 2 years ago

Received.

MaJuks commented 2 weeks ago

I fixed this issue by removing the Scrapy request from the yield statement, like this:

Before:

    yield scrapy.Request(url=url, callback=some_function)

And then using the requests library directly (TextResponse comes from scrapy.http):

    import requests
    from scrapy.http import TextResponse

    r = requests.get(url)
    response = TextResponse(r.url, body=r.text, encoding='utf-8')

If you are using FormRequest, you can test it like this:

    r = requests.request('GET', url, headers=headers, data=payload)  # or json=payload
    response = TextResponse(r.url, body=r.text, headers=headers, encoding='utf-8')

The problem with this solution is that you can't pass the response along and use yield, but it solved my problem.