scrapy / scrapely

A pure-python HTML screen-scraping library

How to use HTML data instead of direct URLs #39

Closed: mejo closed this issue 10 years ago

mejo commented 11 years ago

An older issue mentions the 'train_from_htmlpage' method, but it doesn't seem to work anymore. What I'm trying to do is give scrapely preprocessed HTML data (already converted to UTF-8 so that scrapely can handle it) instead of a URL.

tpeng commented 10 years ago

I think train_from_htmlpage should work. Could you check whether you are passing an HtmlPage or raw data? By the way, you can convert raw HTML data to an HtmlPage simply with HtmlPage(body=raw_body).

pablohoffman commented 10 years ago

@mejo, did you manage to solve your issue with @tpeng's suggestion? If so, could you close this ticket? Thanks.

mejo commented 10 years ago

I didn't verify it, since I'm pursuing other projects now. Closing anyway.