soumalyaon6 / spynner

Automatically exported from code.google.com/p/spynner
GNU General Public License v3.0

Download function cannot continue to perform #28

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?
After downloading a file, the program exits and does not continue, unless a new
Browser object is created.

What version of the product are you using? On what operating system?
Master version, Windows XP.

Please provide any additional information below.
After downloading a file, the program exits and does not continue. My code
follows:

# coding=utf8
import pyquery
import spynner

browser = spynner.Browser()
#browser.debug_level = spynner.DEBUG
browser.load("http://www.meinv86.com/meinv/xiaoyuanmeinvtupian/list_7_3.html")
#browser.wait_load()

d = pyquery.PyQuery(browser.html)
count = 0
for i in d('img'):
    count = count + 1
    print i.attrib["src"]
    try:
        browser.download(i.attrib["src"], open('head/' + str(count) + '.jpg', "wb"))
        browser = spynner.Browser()  # without creating a new object, the download no longer continues!
    except:
        print "fail to download.."
        continue

Original issue reported on code.google.com by jackhome...@sina.com on 9 Apr 2011 at 7:41

GoogleCodeExporter commented 8 years ago
I can reproduce this: when I run the script I get a segmentation fault on the
second image. I was able to temporarily fix it by commenting out this line in
browser.py: "manager.setCookieJar(self.manager.cookieJar())". I am not sure
whether I am doing something wrong or it is simply a PyQt bug.

However, I must say: spynner is not the right tool for the task you are doing.
Spynner *may* be useful for AJAX-intensive sites, but for simple scraping like
this it is overkill. I recommend urllib2 + pyquery; you don't need JavaScript
processing, do you?

import urllib2
import pyquery

html = urllib2.urlopen("http://www.meinv86.com/meinv/xiaoyuanmeinvtupian/list_7_3.html").read()
dom = pyquery.PyQuery(html)

and so on.
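For completeness, here is a sketch of the reporter's whole image-download loop done the urllib2 way tokland suggests. It is written in modern Python 3, where urllib2's functionality lives in urllib.request (the thread itself predates Python 3's dominance), and it substitutes the stdlib HTMLParser for pyquery's d('img') selector so the sketch is self-contained; the page URL and the head/<n>.jpg naming come from the original report.

```python
# A spynner-free sketch of the reporter's image-download loop.
# Modern Python 3 is used here: urllib2 became urllib.request, and the
# stdlib HTMLParser stands in for pyquery's d('img') selector.
import urllib.request
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)

def extract_img_srcs(html):
    """Return the src of every <img> in the given HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    return parser.srcs

def download_images(page_url, dest_dir="head"):
    """Fetch the page, then save each image as dest_dir/<n>.jpg."""
    html = urllib.request.urlopen(page_url).read().decode("utf-8", "replace")
    for count, src in enumerate(extract_img_srcs(html), start=1):
        try:
            data = urllib.request.urlopen(src).read()
            with open("%s/%d.jpg" % (dest_dir, count), "wb") as f:
                f.write(data)
        except Exception:
            print("failed to download %s" % src)

# Usage (hits the network):
# download_images("http://www.meinv86.com/meinv/xiaoyuanmeinvtupian/list_7_3.html")
```

Because every request is an independent urlopen call, there is no long-lived browser object to get into the broken state the issue describes.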

Original comment by tokland on 9 Apr 2011 at 9:29

GoogleCodeExporter commented 8 years ago
Thanks Arnau. For this example, using spynner might be overkill, but in most
cases I need to log in to a JavaScript-backed site to download pictures, music,
etc., so spynner seems to be a must. I also notice that spynner is very good at
downloading large files. Could you take a quick look at making the download
function continue to work, or do you have any suggestions for combining urllib2
with spynner for the downloading jobs? Recently I have been studying gevent;
do you have any ideas about integrating spynner into a multi-threaded model?
Thanks.

Original comment by jackhome...@sina.com on 10 Apr 2011 at 5:40

GoogleCodeExporter commented 8 years ago
For authenticated sites I'd still use urllib2, with cookies. Indeed, it is more
work because you have to figure out how to authenticate and send the cookies,
but in the long run the script is more robust.
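A minimal sketch of this cookies-with-urllib2 approach, again in modern Python 3 terms (urllib2 and cookielib became urllib.request and http.cookiejar). The login URL and form field names below are hypothetical placeholders; inspect the real site's login form to find the actual ones.

```python
# Sketch: a urllib opener that remembers cookies across requests, so a
# session cookie obtained at login is resent on every later download.
import urllib.request
from http.cookiejar import CookieJar

def make_cookie_opener():
    """Build an opener whose HTTPCookieProcessor stores cookies in a jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

opener, jar = make_cookie_opener()
# 1. Log in once (hypothetical URL and field names; hits the network):
# import urllib.parse
# form = urllib.parse.urlencode({"user": "me", "pass": "secret"}).encode()
# opener.open("http://example.com/login", form)
# 2. Later requests through the same opener resend the session cookie
#    automatically, so authenticated downloads work without spynner:
# data = opener.open("http://example.com/protected/1.jpg").read()
```

The jar is shared by reference, so after the login request it holds the session cookie and every subsequent opener.open() call attaches it.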

Original comment by tokland on 10 Apr 2011 at 8:30

GoogleCodeExporter commented 8 years ago
Hi tokland, I seem to be having a possibly related problem when using
Browser.download(): it does not seem to pass the cookies properly.

Original comment by hari...@gmail.com on 13 Jul 2011 at 2:57