Open Gallaecio opened 4 years ago
I'd like to take this, and as this is going to take some time, I'm open to pairing with anyone else who wants to work on this.
Hey, I would like to work on this issue. Is there a guide for beginners?
I guess the first steps, in addition to getting familiar with Scrapy, would be to learn how to extend Jupyter Notebook, so that the proof-of-concept code from http://nbviewer.ipython.org/gist/kmike/9001574 makes some sense to you.
Hey, I have been trying to run this notebook on Binder as well as on my system, but I can't get past the errors.
On Binder, the error is as follows:
On my system, the error is as follows: I am unable to import any of the modules from scrapy. Any help would be much appreciated. Thanks in advance.
@BisariaUtkarsh would you like to work on this together?
@BisariaUtkarsh In “Binder” you seem to be missing lxml. I am not familiar with Binder, so I cannot tell you how, but you need to install lxml there. lxml depends on C libraries (libxml2 and libxslt), so it may not be trivial to install. See https://lxml.de/installation.html
On your system, you are simply suffering the effects of Scrapy having evolved since that proof-of-concept code was initially written. scrapy.project was removed in Scrapy 1.6.0 (see https://docs.scrapy.org/en/latest/news.html). In the case of project, I don’t see it being used in that code, so you can probably just remove project from the imports. But if you run into similar issues with code that is actually used, you might need to check the release notes I’ve just linked and other parts of the Scrapy documentation to find a replacement.
@joybhallaa I'm not sure if there can be two applicants for one GSoC idea. We may have to confirm this with the mentor concerned. @Gallaecio can we do this?
Hey @Gallaecio, thanks for sharing the documentation; it really helped me get past some other issues. However, I'm stuck with a "Broken Pipe Error" and couldn't find a workaround on Stack Overflow either. Any suggestions to tackle this?
I don’t think 2 students can work on the same idea for GSoC. I don’t know if @joybhallaa is planning to join GSoC this year, though.
@BisariaUtkarsh regarding your current error, it is hard to tell where it comes from, since your screenshot does not contain the whole traceback. Could you share the whole traceback as text?
Moreover, unless you share the changes that you made to https://nbviewer.jupyter.org/gist/kmike/9001574 to make it work with the latest version of Scrapy, it could take me a while to figure out those changes myself in order to try to reproduce your issue.
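For anyone unsure how to get a whole traceback as text rather than as a screenshot, here is a minimal sketch using only the standard library. The helper name `traceback_text` is hypothetical and not part of Scrapy or Jupyter; it simply wraps `traceback.format_exc()`.

```python
import traceback


def traceback_text(func, *args, **kwargs):
    """Call func; return the full traceback as a string if it raises, else None.

    Hypothetical helper for illustration only.
    """
    try:
        func(*args, **kwargs)
    except Exception:
        # format_exc() returns the same text Python would print to stderr.
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    # Stand-in for the failing call; replace with the code that raises.
    print(traceback_text(int, "not a number"))
```

The printed text can then be pasted into a comment as-is, which makes it searchable and easier to quote than an image.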
I made several changes as per the documentation, such as:
Removed project from the imports
scrapy.spider was replaced by scrapy.spiders
BaseSpider was replaced by Spider
scrapy.xlib.pydispatch was replaced by pydispatch (from the PyDispatcher package)
Queue() was replaced by multiprocessing.Queue()
HtmlXPathSelector was replaced by Selector
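The simple one-to-one renames in that list can be sketched as a small text-rewriting helper. This is only an illustration under some assumptions: `RENAMES` and `apply_renames` are hypothetical names, `pydispatch` is assumed to be the importable module provided by the PyDispatcher package, and the `project` and `Queue()` changes are left out because they need manual edits rather than a plain rename.

```python
import re

# Old name -> new name, taken from the renames listed above.
# Assumption: "pydispatch" is the module installed by PyDispatcher.
RENAMES = [
    ("scrapy.xlib.pydispatch", "pydispatch"),
    ("scrapy.spider", "scrapy.spiders"),
    ("BaseSpider", "Spider"),
    ("HtmlXPathSelector", "Selector"),
]


def apply_renames(source: str) -> str:
    """Return source with each old name replaced by its new name."""
    for old, new in RENAMES:
        # Match the old name only as a whole identifier, so that
        # "scrapy.spiders" is not rewritten again on a second pass.
        pattern = r"(?<!\w)" + re.escape(old) + r"(?!\w)"
        source = re.sub(pattern, new, source)
    return source


if __name__ == "__main__":
    old_code = (
        "from scrapy.spider import BaseSpider\n"
        "from scrapy.xlib.pydispatch import dispatcher\n"
        "hxs = HtmlXPathSelector(response)\n"
    )
    print(apply_renames(old_code))
```

A rewrite like this only updates names; whether the renamed classes behave the same way still has to be checked against the Scrapy release notes.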
Here is the link to my notebook : https://github.com/BisariaUtkarsh/test_scrapy/blob/master/ipython-scrapy.ipynb
Error :
BrokenPipeError Traceback (most recent call last)
Have you tried searching the internet for both the exception class and the class raising it? “BrokenPipeError ForkingPickler”.
You might also want to try running the code as a regular Python script on your system, to see whether the issue can be reproduced that way as well or is specific to Jupyter Notebook.
@BisariaUtkarsh I inquired first to pick up this issue, I wanted someone to give me the nod as this was a gsoc issue and yes I am planning to join GSoC this year.
@BisariaUtkarsh at least reply.
@joybhallaa Hey, right now I'm not sure if I will go forward with this issue, so you may carry on with it.
Hey, @joybhallaa would you like to work with this issue? I would like to be of help.
@never2average I'm going to work on this issue, and as @Gallaecio said, 2 people can't work on this issue.
What changes have you made so far? I would still like to offer some suggestions.
@never2average I am currently setting up a development environment on my machine, as @BisariaUtkarsh told me that he will not be working on this issue anymore.
@never2average I will be open to suggestions once someone gives me approval. I prefer getting approval from members of the organization, as they are far more experienced, can let me know whether my ideas are efficient, and can help the project grow.
2 people cannot be selected for the same idea, but multiple students may submit proposals for the same idea. Anyone should feel free to work on a proposal for this or any other idea, regardless of other candidate students.
@joybhallaa I believe you are going in the right direction, yes :slightly_smiling_face:
@Gallaecio :+1:
@Gallaecio what deliverables would you like to see in a proposal and what features are a must have? I am going to submit my first proposal today, would really like some input.
Hey @Gallaecio @wRAR, whenever you're free, please take a look at my draft proposal. I would appreciate it.
@joybhallaa I’ve had a look at your proposal.
I see no mention of Twisted in your proposal. However, it was my impression that Scrapy being based on Twisted, and hence using a (non-restartable) Twisted reactor as an event loop, was one of the main issues that you face when using Scrapy within Jupyter Notebook. See this old proposal I’ve just found on the internet. Doesn’t that issue still exist? Will your proposal include work towards solving or easing that issue somehow?
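To make the restart problem concrete, here is a toy sketch that does not use Twisted at all. `OneShotLoop` is a hypothetical stand-in for an event loop that can run only once per process (a real Twisted reactor raises `ReactorNotRestartable` in that situation), and `run_in_fresh_process` shows one possible way around it: running each job in a new child process. The `"fork"` start method is an assumption that limits this sketch to Unix-like systems.

```python
import multiprocessing


class OneShotLoop:
    """Toy stand-in for an event loop that can run once per process."""

    _already_ran = False

    @classmethod
    def run(cls, job):
        if cls._already_ran:
            raise RuntimeError("loop cannot be restarted in this process")
        cls._already_ran = True
        return job()


def _child(conn, job):
    # Runs inside the child process and sends the outcome to the parent.
    try:
        conn.send(("ok", OneShotLoop.run(job)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_in_fresh_process(job):
    """Run job through the loop in a new process and return its result."""
    # Assumption: "fork" is available (Unix only). The child inherits the
    # parent's state, so the parent must not have run the loop itself.
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, job))
    proc.start()
    child_conn.close()  # the parent only reads
    status, value = parent_conn.recv()
    proc.join()
    if status == "error":
        raise RuntimeError(value)
    return value


if __name__ == "__main__":
    # Each call gets its own process, so the loop can "run" repeatedly.
    print(run_in_fresh_process(lambda: "first crawl"))
    print(run_in_fresh_process(lambda: "second crawl"))
```

The trade-off is that results must be picklable to cross the process boundary, and the notebook kernel itself never runs the loop.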
Also, @BisariaUtkarsh had quite some trouble working through that old snippet, http://nbviewer.ipython.org/gist/kmike/9001574. Did you have better luck?
@Gallaecio Thanks for taking a look at my proposal.
To be honest, I was not aware of the limitation of Twisted; I am looking it up now.
I was able to run this old snippet on Google Colab with no problems.
@Gallaecio I have a question regarding twisted:
@Gallaecio I've updated my proposal with proposed changes, feel free to look at it now :) Let me know if you have any questions. EDIT: I've uploaded my final proposal, you can also check that out and suggest changes/improvements.
See http://gsoc2015.scrapinghub.com/ideas/#iphyton-ide for more information.