Closed extesy closed 8 years ago
Scrapy uses Twisted at its core, so Python 3 support depends, at the very least, on Twisted's Python 3 support. The Twisted development team has a project to port Twisted to Python 3 and it is in progress, so I think that as soon as Twisted is ported to Python 3, Scrapy will have a good chance of being ported as well.
mark
we are waiting for http://www.python.org/dev/peps/pep-3156/
For Python 3, I am developing https://bitbucket.org/estin/pomp. It is like Scrapy, but very small, unstable, and without a hard Twisted dependency.
mark. The latest development branch, 0.17, does not support py3.
@nramirezuy there's a reference implementation for pep 3156 here: https://code.google.com/p/tulip/
Is there a list of which parts of Twisted are used? Twisted has a Python 3 migration plan here: http://twistedmatrix.com/trac/wiki/Plan/Python3. It might be worthwhile to investigate whether the parts of Twisted that Scrapy uses are already ported.
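One rough way to investigate this is to try importing each Twisted module a project relies on under the target interpreter. The sketch below is only an illustration: `check_importable` is a hypothetical helper, the module list is an assumed sample rather than Scrapy's real dependency list, and a successful import does not prove that a module is fully ported.

```python
import importlib


def check_importable(module_names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # ImportError for missing modules, SyntaxError for py2-only code, etc.
            results[name] = False
        else:
            results[name] = True
    return results


if __name__ == "__main__":
    # Assumed sample of module names; replace with the modules your code imports.
    sample = [
        "twisted.internet.defer",
        "twisted.web.client",
        "twisted.conch.telnet",
    ]
    for name, ok in check_importable(sample).items():
        print(f"{name}: {'importable' if ok else 'NOT importable'}")
```

Running this under both Python 2 and Python 3 and comparing the two reports would give a first, coarse picture of what is still missing.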
Can scrapy not be made to work with python 3, now that asyncio is available?
+1 for Python 3.4 support. After a year using Python 3 (mainly sklearn, numpy, Anaconda, matplotlib, networkx etc) this is the first blocker I've had forcing me to downgrade.
The only other Python2.7-only project that I'm lightly using is Apache Spark and 3.4+ support is scheduled for their next release. In their issue tracker I posted some stats for Python 3 adoption - roughly speaking it is ">40%" (accepting the self-selected group of survey participants): https://issues.apache.org/jira/browse/SPARK-4897?focusedCommentId=14303154&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14303154
@ianozsvald we are working on it, it is a priority :)
Scrapy is the worst kind of project to port to Python 3 - it depends on Twisted (which is not ported to Python 3 yet, though some subset of Twisted works), and it works at the boundary between the outside world and the Python world, so there are many questions about unicode. The "outer world" Scrapy works with is wild - there is no well-defined encoding we can decode/encode data from/to. Encoding rules are sometimes crazy - e.g. browsers (which Scrapy aims to emulate) can use different charsets for different parts of a single URL, e.g. cp1251 for /path and utf-8 for GET parameter values. I've ported a lot of code to Python 3 (including most of NLTK and tens of other Python packages), but I'm still getting porting details wrong for Scrapy (e.g. https://github.com/scrapy/scrapy/pull/837 is wrong).
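To make the mixed-charset URL case concrete, here is a minimal Python 3 sketch using only the standard library. It is not Scrapy code: `build_mixed_url` and `try_decode` are hypothetical helpers, and the host, path, and query value are made-up sample data. It percent-encodes the path as cp1251 and the query value as UTF-8, then shows that the path bytes cannot be decoded as UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes


def build_mixed_url(base, path, value, path_charset="cp1251", query_charset="utf-8"):
    """Percent-encode the path and the query value with different charsets."""
    encoded_path = quote(path, safe="/", encoding=path_charset)
    encoded_value = quote(value, safe="", encoding=query_charset)
    return f"{base}{encoded_path}?q={encoded_value}"


def try_decode(percent_encoded, charset):
    """Return the decoded text, or None if the bytes are invalid in that charset."""
    try:
        return unquote_to_bytes(percent_encoded).decode(charset)
    except UnicodeDecodeError:
        return None


# Sample data (assumed for illustration only).
base = "http://example.com"
url = build_mixed_url(base, "/путь", "значение")
path_part, query_part = url[len(base):].split("?q=")

print(url)
print(try_decode(path_part, "cp1251"))  # decodes with the charset used for the path
print(try_decode(path_part, "utf-8"))   # None: these bytes are not valid UTF-8
print(try_decode(query_part, "utf-8"))  # decodes with the charset used for the query
```

The point is that no single charset decodes the whole URL, which is why picking one text type for URLs is not straightforward.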
Some parts of Scrapy are already ported to Python 3. We're running tests for Python 3.3 on Travis to prevent regressions; ~240 tests pass in 3.3, out of ~1000. There is a GSoC project to port Scrapy to Python 3.x; I think we should make good progress this summer.
There is also the mitmproxy dependency (https://github.com/mitmproxy/mitmproxy), which doesn't have Python 3 support yet, but it is used only in tests.
@kmike Hey Mikhail! You are a man of many projects :-) Glad to hear it is being worked on, I didn't get that impression from the early parts of this thread and couldn't see any other porting docs. I quite agree that this project (just like Flask et al.) is going to be hard, dealing with the interface to the outside world is horrid. I certainly didn't know that URLs themselves could have mixed encodings :-( Given the continual migration to Python 3 for personal projects (50/50 according to the survey I linked vs Python 2.7) and >40% for work, the need for scrapy's Py3 support is only going to get stronger. Bonne chance!
+1 for Python 3 support! Thanks for the hard work you guys are putting into it, hope GSoC goes well.
:+1: as well, would really love to be able to use python 3 with scrapy! And many thanks for your effort!
You can use my patches with ported twisted.web.client.Agent and friends from my fork.
Are there still outside blockers for porting to python3? (twisted libs, etc.?) Would love to see a list of those, if one has been made.
Also, in the name of eventual portability (e.g. asyncio?) how do people feel about dropping dependencies on twisted for the web/downloader part? I recall there was a gsoc idea for this? Would be interesting to see if a downloader using pycurl bindings might work with twisted here. (Though pycurl has no cffi bindings at this time, so no pypy support.)
There's a comprehensive status of the twisted dependencies in Berker's proposal. @berkerpeksag, would you mind if we put it up on our wiki for reference?
Sure, but that list is a bit outdated. For example, twisted.web.static has already been ported to Python 3. You may want to check twisted/python/dist3.py first.
Will do, thanks!
Here's the updated list: https://github.com/scrapy/scrapy/wiki/PY3%3A-Twisted-Dependencies.
Thank you both!
That's great news, thanks for reporting! I just updated the wiki.
Hello all. I've got a Lightning Talk on Python3.5 at my next PyDataLondon meet (200+ data scientists in the room). Someone is bound to ask about scrapy/twisted on Python 3.4+, could someone comment on the current state? It isn't clear to me from the links above if enough of twisted has been ported for scrapy to run on Python 3 (or will soon)?
Hi @ianozsvald, glad to hear about the interest in python3 support!
Currently scrapy doesn't run in python3, not even a meaningful subset of it, but twisted support isn't the only issue holding us back. Most of the twisted modules used in scrapy are already ported, and in some cases the features that use them could be deactivated, like telnet or mail (well, extensions that use mail could be deactivated or changed to not use mail in python3 for instance). twisted.web.client.Agent is a problem anyhow, but this can be patched on our side.
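As a sketch of what deactivating a feature looks like from the user's side, the telnet console can be switched off in a project's settings module. This assumes the `TELNETCONSOLE_ENABLED` and `EXTENSIONS` settings and the Scrapy 1.x import path for the extension; verify the path against the Scrapy version in use. The mail-related extensions are left out here because which ones apply depends on the project.

```python
# settings.py of a Scrapy project (illustrative fragment)

# Turn off the telnet console, which relies on Twisted's telnet support.
TELNETCONSOLE_ENABLED = False

# An extension can also be disabled by mapping its import path to None.
# The path below is the Scrapy 1.x location (an assumption for other versions).
EXTENSIONS = {
    "scrapy.extensions.telnet.TelnetConsole": None,
}
```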
We stopped the python3 integration for some time because we couldn't agree on the type we should use to represent urls but thankfully that matter was resolved, though it hasn't been coded yet.
So, there aren't any big stoppers (not that I know of), just the time to get around it.
We haven't defined a deadline yet but it's something we want to see before the end of the year. On top of that, this weekend some scrapinghubbers will hold a sprint to accelerate the support, so maybe there'll be news sooner than expected :wink:
Hi @curita, thanks for the note. For my data science audience I think scrapy is the only non-python-3.4 package that matters, everything else that they (and I) use is already running with Python 3.4. I wish you all luck in the conversion, knowing the data science stack is almost fully 3.4 compliant really helps when planning larger-scale projects.
Is there any plan to replace the mitmproxy requirement for tests?
@darkrho I don't know; I was thinking about porting it, not replacing. Are there alternatives?
@ianozsvald I know your pain; Scrapy is the only reason I'm using Python 2 now :) At EuroPython, @dangra and I tried to unblock further porting - the bottleneck was in Request and Response objects, and they are ported now in https://github.com/scrapy/scrapy/pull/1384. It is still a long road to full Python 3 support, but we're in much better shape now - 507 tests are passing in Python 3, compared to 248 before the sprint. Working Request and Response objects open a gate for others' contributions, so I expect Python 3 Scrapy support to get more love soon.
@kmike hey, that's lovely to hear (and Graham Markall [of Continuum] told me about the sprint), we'll certainly note this when we talk next week. Cheers!
For anyone interested in contributing I've created a wiki page (https://github.com/scrapy/scrapy/wiki/Python-3-Porting) with some information & guidelines.
Apparently, twisted for python 3 is out ... https://twitter.com/hawkieowl/status/670885245328166912
I don't understand, I suppose examples are not rewritten for Python3:
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'example.com'
        allowed_domains = ['example.com']
        start_urls = [
            'http://www.example.com/1.html',
            'http://www.example.com/2.html',
            'http://www.example.com/3.html',
        ]

        def parse(self, response):
            self.logger.info('A response from %s just arrived!', response.url)
with this error
>>> ================================ RESTART ================================
>>>
Traceback (most recent call last):
  File "D:/WinPython/basedir34/buildBarebone/winpython-3.4.3/notebooks/scrapy.py", line 1, in <module>
    import scrapy
  File "D:/WinPython/basedir34/buildBarebone/winpython-3.4.3/notebooks\scrapy.py", line 4, in <module>
    class MySpider(scrapy.Spider):
AttributeError: 'module' object has no attribute 'Spider'
>>>
Any idea how it should be written in Python 3 ?
Hey @stonebig,
There are Scrapy parts which work in Python 3, but Scrapy as a framework is not usable for end users in Python 3 yet. Please wait or help us :)
Apart from that, Spyder has nothing to do with Scrapy, and you're trying to import from your scrapy.py module, not from scrapy. There are other channels to get support - we're using http://stackoverflow.com (ask a question with Scrapy tag); there is also scrapy-users google group.
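For reference, the import problem is ordinary name shadowing: Python puts the script's own directory first on the import path, so a file named scrapy.py is found before the installed scrapy package. A minimal sketch of a check for this, where `shadows_package` is a hypothetical helper and the paths are shortened, made-up examples:

```python
from pathlib import PurePath


def shadows_package(script_path, package_name):
    """Return True if a script's file name would shadow the named package."""
    # PurePath.stem drops the directory and the ".py" suffix.
    return PurePath(script_path).stem == package_name


print(shadows_package("notebooks/scrapy.py", "scrapy"))     # True
print(shadows_package("notebooks/my_spider.py", "scrapy"))  # False
```

Renaming the script to something other than scrapy.py removes the shadowing, although it does not by itself make Scrapy usable on Python 3.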
ok. I'll go to the user group. Sorry for the noise.
and updated in the wiki
Basic support is planned for v1.1, and we plan to make it more robust for v1.2.
:+1:
Great ! Is there a document that estimates the rough timeline of these two milestones ? spring 2016 and summer 2016 ?
@stonebig, we plan on releasing Scrapy 1.1 officially by the end of February 2016 (with at least a release candidate in the next few days). Scrapy 1.2 would follow a couple of months after that (we hope).
thanks a lot for this information, @redapple !
It has gone and past six days, @redapple!
:)
@KeremTubluk, we're not quite there yet: https://github.com/scrapy/scrapy/milestones/v1.1
Aaand it's official now: http://doc.scrapy.org/en/stable/news.html#id1
it seems that the twisted already supports py3.3+
@ABSmiLT Yeah, AFAICT Twisted only recently supported 3 well enough for Scrapy. Hence all the discussion above and in the docs.
@ABSmiLT we've released scrapy 1.1rc1 with alpha-level Python 3 support about a month ago. 1.1rc2 will be released soon; it fixes several Python 3 compatibility issues we've found while testing 1.1rc1.
thanks for informing, @kmike @d0ugal looking forward to the new stable version compatible with py3
Python 3 is several years old and most of packages now support it (even django!). It would be really nice to support it in scrapy as well.