scrapy / scrapely

A pure-python HTML screen-scraping library

how to use to_file method #2

goldalworming closed 13 years ago

goldalworming commented 13 years ago

I tested scrapely with your example, but I don't know how to store templates to a file (or a database). I tried:

from scrapely import Scraper
s = Scraper()
url1 = 'http://pypi.python.org/pypi/w3lib'
data = {'name': 'w3lib 1.0', 'author': 'Scrapy project', 'description': 'Library of web-related functions'}
s.train(url1, data)

s.tofile('testemplatefile')
Traceback (most recent call last):
  File "", line 1, in
  File "scrapely/__init__.py", line 28, in tofile
    json.dump({'templates': tpls}, file)
  File "/usr/lib/python2.7/json/__init__.py", line 182, in dump
    fp.write(chunk)
AttributeError: 'str' object has no attribute 'write'

So I tried:

s = Scraper('abc.json')
url1 = 'http://pypi.python.org/pypi/w3lib'
data = {'name': 'w3lib 1.0', 'author': 'Scrapy project', 'description': 'Library of web-related functions'}
s.train(url1, data)
Traceback (most recent call last):
  File "", line 1, in
  File "scrapely/__init__.py", line 41, in train
    self.templates.append(tm.get_template())
AttributeError: 'str' object has no attribute 'append'
s.tofile(url1)
Traceback (most recent call last):
  File "", line 1, in
  File "scrapely/__init__.py", line 27, in tofile
    tpls = [page_to_dict(x) for x in self.templates]
  File "scrapely/htmlpage.py", line 32, in page_to_dict
    'url': page.url,

What should I do to store a template to a file (or a database) and then use it again? Redis might be my database of choice.

pablohoffman commented 13 years ago

tofile() receives a file-like object, not a str, so you can do:

with open('templates.json', 'w') as f:
    s.tofile(f)

Also, the Scraper constructor takes a list of templates, not a file name, so you should use the alternative fromfile() constructor when loading from a file:

with open('templates.json') as f:
    s = Scraper.fromfile(f)
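
For the database part of the question: since tofile() and fromfile() only need a file-like object, one possible approach is to serialize into an in-memory buffer and store the resulting string as a value (for example under a Redis key). The following is only a sketch under assumptions: it assumes Python 3, the helper names dump_to_string and load_from_string are made up for illustration, and StandInScraper is a hypothetical stand-in that mimics the tofile()/fromfile() shape so the snippet runs without scrapely installed. With scrapely you would pass a real Scraper instead.

```python
import io
import json


class StandInScraper:
    """Hypothetical stand-in, NOT scrapely's Scraper.

    It only mimics the tofile()/fromfile() shape discussed in this thread.
    """

    def __init__(self, templates=None):
        self.templates = list(templates or [])

    def tofile(self, file):
        # Like the real method, this expects a file-like object, not a str.
        json.dump({'templates': self.templates}, file)

    @classmethod
    def fromfile(cls, file):
        return cls(json.load(file)['templates'])


def dump_to_string(scraper):
    """Serialize any object exposing tofile(file) into a str."""
    buf = io.StringIO()
    scraper.tofile(buf)
    return buf.getvalue()


def load_from_string(scraper_cls, text):
    """Rebuild an object through its fromfile(file) constructor."""
    return scraper_cls.fromfile(io.StringIO(text))


# Round trip: the string in `blob` is what you would store in a database.
s = StandInScraper([{'url': 'http://pypi.python.org/pypi/w3lib', 'body': '<html></html>'}])
blob = dump_to_string(s)
restored = load_from_string(StandInScraper, blob)
```

The point of going through io.StringIO is that the storage backend never needs to know about files; it only sees a string going in and coming out.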