scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

Python 3.7 support #3143

Closed lopuhin closed 5 years ago

lopuhin commented 6 years ago

The goal is to add Python 3.7 to Travis and pass all tests; the first 3.7 beta was already released at the end of January.

lopuhin commented 6 years ago

One issue was reported at https://stackoverflow.com/questions/48861287/why-am-i-getting-this-error-in-scrapy-python3-7-invalid-syntax

2018-02-26 19:11:02 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-02-26 19:11:02 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.3, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.9.0, Python 3.7.0b1 (default, Feb 26 2018, 19:04:22) - [GCC 5.4.0 20160609], pyOpenSSL 17.5.0 (OpenSSL 1.0.2g  1 Mar 2016), cryptography 2.1.4, Platform Linux-4.4.0-116-generic-x86_64-with-debian-stretch-sid
2018-02-26 19:11:02 [scrapy.crawler] INFO: Overridden settings: {'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'EDITOR': 'vim', 'LOGSTATS_INTERVAL': 0, 'TELNETCONSOLE_ENABLED': '0'}
Traceback (most recent call last):
  File "/home/kostia/tmp/py37/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/commands/shell.py", line 65, in run
    crawler = self.crawler_process._create_crawler(spidercls)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/crawler.py", line 203, in _create_crawler
    return Crawler(spidercls, self.settings)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/crawler.py", line 55, in __init__
    self.extensions = ExtensionManager.from_crawler(self)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/home/kostia/.pyenv/versions/3.7.0b1/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 723, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
    from twisted.conch import manhole, telnet
  File "/home/kostia/tmp/py37/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
    def write(self, data, async=False):
                              ^
SyntaxError: invalid syntax

This also happens if -s TELNETCONSOLE_ENABLED=0 is passed, because the module is still imported.
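
For context on why the setting does not help: on Python 3.7 the error is raised while the module is being compiled during import, before any Scrapy setting is consulted. A minimal sketch of that behaviour (the legacy_manhole module name is made up for illustration, this is not Scrapy code):

import importlib
import sys

# "legacy_manhole" is a made-up module whose source uses `async` as a
# parameter name, which is a reserved keyword starting with Python 3.7.
with open("legacy_manhole.py", "w") as f:
    f.write("def write(self, data, async=False):\n    pass\n")

sys.path.insert(0, ".")          # make the current directory importable
importlib.invalidate_caches()    # pick up the freshly written file

try:
    # Scrapy's load_object() ends up calling import_module() much like this,
    # so the SyntaxError fires even when the telnet extension is disabled.
    importlib.import_module("legacy_manhole")
except SyntaxError as exc:
    print("SyntaxError raised at import time:", exc)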

Twisted issue: https://twistedmatrix.com/trac/ticket/9384#ticket and PR https://github.com/twisted/twisted/pull/966/

patiences commented 6 years ago

Is this an issue that needs someone to work on? I would love to give it a shot. :-) Looks like the PR to fix the above issue should be merged soon.

lopuhin commented 6 years ago

@patiences yes, I'm not aware of anyone working on it, so it would be awesome if you gave it a shot! And feel free to submit a work-in-progress PR. I think the first step would be adding Python 3.7 to tox and Travis. There might be other issues besides this syntax error in Twisted.

patiences commented 6 years ago

@lopuhin Great!! The WIP PR is here: https://github.com/scrapy/scrapy/pull/3150 :-)

qianyinghuanmie commented 5 years ago

I once thought it supported Python 3.7. In other words, does it only support Python 3.6 and below?

grammy-jiang commented 5 years ago

@qianyinghuanmie

Scrapy supports Python 3.6, and Travis has related tests.

illgitthat commented 5 years ago

It seems like this issue depends on https://github.com/twisted/twisted/pull/966 being merged first.

jstnms123 commented 5 years ago

I have reopened this issue. The problem persists.

jA$ scrapy shell
2018-07-08 14:42:44 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-07-08 14:42:44 [scrapy.utils.log] INFO: Versions: lxml 4.2.3.0, libxml2 2.9.2, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 18.4.0, Python 3.7.0 (default, Jun 29 2018, 20:13:53) - [Clang 8.0.0 (clang-800.0.42.1)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.2.2, Platform Darwin-15.6.0-x86_64-i386-64bit
2018-07-08 14:42:44 [scrapy.crawler] INFO: Overridden settings: {'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'LOGSTATS_INTERVAL': 0}
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/commands/shell.py", line 65, in run
    crawler = self.crawler_process._create_crawler(spidercls)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 203, in _create_crawler
    return Crawler(spidercls, self.settings)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 55, in __init__
    self.extensions = ExtensionManager.from_crawler(self)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
    from twisted.conch import manhole, telnet
  File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
    def write(self, data, async=False):
                              ^
SyntaxError: invalid syntax

illgitthat commented 5 years ago

@jstnms123 I don't think this issue was ever closed, just the referenced issue since it was a duplicate.

@lopuhin do you know of a current workaround to install the branch you created in twisted so scrapy can run with 3.7? (https://github.com/twisted/twisted/pull/966)

lopuhin commented 5 years ago

@illgitthat you can install the branch with pip install git+https://github.com/lopuhin/twisted.git@9384-remove-async-param. I hope we'll also be able to provide a workaround in 1.6, and then hopefully finish the Twisted fix (it affects modules that are very rarely used, but Scrapy imports them unconditionally).

illgitthat commented 5 years ago

Thank you @lopuhin! I googled around but I must have been just off on the syntax.

jstnms123 commented 5 years ago

@lopuhin -- I used the workaround to good effect. Thanks for the info.


threezhang commented 5 years ago

thanks @lopuhin

pip install git+https://github.com/lopuhin/twisted.git@9384-remove-async-param

illgitthat commented 5 years ago

pip install git+https://github.com/lopuhin/twisted.git@9384-remove-async-param fails for me on Windows but worked on Linux. I think this is related to a Twisted issue, not Scrapy; just leaving this here in case anyone else on Windows is stuck trying to run Scrapy on 3.7.

lopuhin commented 5 years ago

FWIW, thanks to awesome Twisted maintainers, the fix in https://github.com/twisted/twisted/pull/966 was finished and merged into Twisted, so it should be in the next Twisted release.

Also, a workaround for this issue was merged into Scrapy master (and will be in the 1.6 release), so another way to enable Python 3.7 support right now is to install Scrapy master and use Twisted from PyPI.

illgitthat commented 5 years ago

Using pip install git+https://github.com/scrapy/scrapy@master --no-dependencies --upgrade is working for me.

Thanks for the help.

johntiger1 commented 5 years ago

Yup @illgitthat, that works for me; I think @lopuhin's branch might be merged in now, so it's gone.

ghost commented 5 years ago

Ummm, just replace the word async with isAsync; the other is deprecated....

kmike commented 5 years ago

@joshspivey the async keyword is used in Twisted, not in Scrapy. @lopuhin worked with Twisted maintainers to fix it in Twisted, so Scrapy will work with Python 3.7 once Twisted releases a new version with the fix. Also, we've worked around it in Scrapy itself, so that Scrapy works with the current Twisted release (by disabling manhole); this will be available in the next Scrapy release. "Ummm just replace" is not how to fix this issue; such comments are not helpful.
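
For anyone curious what the Scrapy-side workaround amounts to, the idea is an import guard around the optional twisted.conch dependency. A rough sketch of the pattern (not the exact code merged into Scrapy):

import logging

logger = logging.getLogger(__name__)

try:
    # On Python 3.7 this import raises SyntaxError with old Twisted releases,
    # because manhole.py uses `async` as a parameter name.
    from twisted.conch import manhole, telnet
    TWISTED_CONCH_AVAILABLE = True
except (ImportError, SyntaxError):
    TWISTED_CONCH_AVAILABLE = False
    logger.warning("Telnet console disabled: required twisted modules failed to import")

The extension can then check TWISTED_CONCH_AVAILABLE and disable itself with a warning instead of crashing the whole crawl at startup.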

wertartem commented 5 years ago

Go to Python37\Lib\site-packages\twisted\conch, edit the manhole.py file, and replace the 'async' parameter with 'isAsync'.

georgiana-gligor commented 5 years ago

Because I was just trying this out in a new project, @illgitthat's solution didn't work for me on its own. I had to install all the dependencies first, and only then did pip install git+https://github.com/scrapy/scrapy@master --no-dependencies --upgrade do the trick.

cgironda commented 5 years ago

Thank you to all of you guys, the command pip install git+https://github.com/scrapy/scrapy@master --no-dependencies --upgrade worked for me too.

clockelliptic commented 5 years ago

This also worked for me. pip install git+https://github.com/scrapy/scrapy@master --no-dependencies --upgrade

Thank you very much.

ciehanski commented 5 years ago

Also adding in that the current fix for this is running pip install git+https://github.com/scrapy/scrapy@master --no-dependencies --upgrade

Thank you everyone!

AsthaSrivastava8 commented 5 years ago

2018-10-10 01:56:15 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-10-10 01:56:15 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i  14 Aug 2018), cryptography 2.3.1, Platform Windows-10-10.0.17134-SP0
2018-10-10 01:56:15 [scrapy.crawler] INFO: Overridden settings: {'SPIDER_LOADER_WARN_ONLY': True}
2018-10-10 01:56:15 [scrapy.middleware] WARNING: Disabled TelnetConsole: TELNETCONSOLE_ENABLED setting is True but required twisted modules failed to import:
Traceback (most recent call last):
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\extensions\telnet.py", line 13, in <module>
    from twisted.conch import manhole, telnet
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\twisted\conch\manhole.py", line 154
    def write(self, data, async=False):
                              ^
SyntaxError: invalid syntax

2018-10-10 01:56:16 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2018-10-10 01:56:16 [twisted] CRITICAL: Unhandled error in Deferred:

2018-10-10 01:56:16 [twisted] CRITICAL:
Traceback (most recent call last):
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\crawler.py", line 80, in crawl
    self.engine = self._create_engine()
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\crawler.py", line 105, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\core\downloader\__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
    mod = import_module(module)
  File "c:\users\astha\appdata\local\programs\python\python37\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\scrapy\downloadermiddlewares\retry.py", line 20, in <module>
    from twisted.web.client import ResponseFailed
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\twisted\web\client.py", line 41, in <module>
    from twisted.internet.endpoints import HostnameEndpoint, wrapClientTLS
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\twisted\internet\endpoints.py", line 41, in <module>
    from twisted.internet.stdio import StandardIO, PipeAddress
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\twisted\internet\stdio.py", line 30, in <module>
    from twisted.internet import _win32stdio
  File "c:\users\astha\appdata\local\programs\python\python37\lib\site-packages\twisted\internet\_win32stdio.py", line 9, in <module>
    import win32api
ModuleNotFoundError: No module named 'win32api'

Getting error on crawl

rogerlee6411 commented 5 years ago

Same issue as AsthaSrivastava8's post above. Any solution yet?

appcypher commented 5 years ago

Installed Scrapy via pipenv and it installed successfully, but whenever I try the scrapy crawl command I get a similar error to the one above.

Python = 3.7.0 Pipenv = 2018.7.1 Scrapy = 1.5.1

Is there a temporary workaround?

ghost commented 5 years ago

@appcypher As far as I understood the developers: install from the repository directly, because it seems they implemented a workaround which has not been released yet.

See above: https://github.com/scrapy/scrapy/issues/3143#issuecomment-422990661

wmorgue commented 5 years ago

@appcypher via pipenv:

pipenv install scrapy==1.5.1
pipenv shell
pip install git+https://github.com/scrapy/scrapy@master --no-dependencies --upgrade

xanderwang commented 5 years ago

pip install git+https://github.com/scrapy/scrapy@master --no-dependencies --upgrade

worked for me.

kennblvnp commented 5 years ago

just use python 2.7, problem solved

ruchi3086 commented 4 years ago

Hi @lopuhin, pip install git+https://github.com/lopuhin/twisted.git@9384-remove-async-param

gives me a path error. What should I do?

lopuhin commented 4 years ago

hi @ruchi3086 since this was merged and released, now you can do pip install scrapy or pip3 install scrapy. If this still gives you an error, please paste it here.

ruchi3086 commented 4 years ago

Hi There,

Great to have an immediate reply, which is what I was hoping for. The primary problem is:

[image: image.png]

How may I solve this please??

Thanks Ruchi

ruchi3086 commented 4 years ago

Traceback (most recent call last):
  File "C:/Users/rugupta/Documents/Project SL/import scrapy error.py", line 1, in <module>
    import scrapy
  File "C:/Users/rugupta/Documents/Project SL\scrapy.py", line 5, in <module>
    from scrapy.crawler import CrawlerProcess
ModuleNotFoundError: No module named 'scrapy.crawler'; 'scrapy' is not a package

lopuhin commented 4 years ago

@ruchi3086 most likely you need to rename your local scrapy.py file to some other name so that it does not conflict with the scrapy package.

Note though that this is no longer related to Python 3.7 support and is most likely not a Scrapy bug; please use Stack Overflow to ask this type of question. See Getting Help.
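
To illustrate the name-shadowing problem with a hypothetical layout (file names made up for the example):

# Project SL/
#   scrapy.py      <- a local module that shadows the installed scrapy package
#   run_crawl.py   <- rename scrapy.py (e.g. to my_spider.py) before running this
import scrapy
from scrapy.crawler import CrawlerProcess

# With the local scrapy.py renamed, this points into site-packages,
# not into the project directory:
print(scrapy.__file__)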

BHouwens commented 4 years ago

I'm still having the original issue. None of the proposed workarounds seem to make any difference.

Gallaecio commented 4 years ago

The original issue is Python 3.7 support. If you have a different issue, and you believe it is a Scrapy bug rather than an error on your end, please open a separate issue.

kallesamuelsson commented 4 years ago

The original issue is Python 3.7 support. If you have a different issue, and you believe it is a Scrapy bug rather than an error on your end, please open a separate issue.

I deleted my comment because I realized my mistake. Apologies for this.