tether / roach

A very adaptable web crawler framework. Impossible to kill.

Crawlers should support parsing XML #10

Closed by ekryski 10 years ago

bredele commented 10 years ago

I found a module for streaming XML (https://github.com/assistunion/xml-stream), but I'm not really convinced by it. What do you think?

The crawler API is entirely based on streams, so we could expose 'static' helpers (json, xml, etc.) in utils.
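A minimal sketch of what such a stream-based helper might look like, using only Node's built-in `stream` module. The `utils.json()` name and its behavior (buffer the whole body, emit one parsed object) are assumptions for illustration, not roach's actual API; an XML helper could follow the same shape with a parser of your choice.

```javascript
const { Transform, Readable } = require('stream');

// Hypothetical `utils` namespace: each helper returns a fresh Transform
// stream, so it can be dropped into any `.pipe()` chain.
const utils = {
  // Buffers the incoming bytes and emits a single parsed object at the end.
  json() {
    const chunks = [];
    return new Transform({
      readableObjectMode: true, // bytes in, one JavaScript value out
      transform(chunk, encoding, callback) {
        chunks.push(chunk);
        callback();
      },
      flush(callback) {
        try {
          this.push(JSON.parse(Buffer.concat(chunks).toString('utf8')));
          callback();
        } catch (err) {
          callback(err); // surface malformed JSON as a stream error
        }
      }
    });
  }
};

// Small helper for the example: gather everything a stream emits.
async function collect(stream) {
  const items = [];
  for await (const item of stream) items.push(item);
  return items;
}

// Usage: the body arrives in two chunks, as it might from a crawler response.
const body = Readable.from([Buffer.from('{"items":'), Buffer.from('[1,2]}')]);
collect(body.pipe(utils.json())).then((docs) => console.log(docs));
```

Returning a new stream from each call keeps the helpers stateless from the caller's point of view, which is what makes them usable as 'static' utilities.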

bredele commented 10 years ago

@substack is a genius! I tested the HTML parser with the XML parser and it works. The main benefit is that we can use query selection on HTML.

ekryski commented 10 years ago

Yeah, I think so. I'm really liking how this stream stuff is coming together. Nice work! I think that is totally the way to do it. That's what gulpjs is doing: they pass around a virtual file through special transform streams. We might want to look at that in the future.
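For reference, the "virtual file through a stream" idea can be sketched with an object-mode Transform from Node's `stream` module. This is only an illustration under stated assumptions: the plain `{ path, contents }` object and the `tagXml()` step are hypothetical and are not gulp's or roach's actual API.

```javascript
const { Transform, Readable } = require('stream');

// Hypothetical pipeline step: receives "virtual file" objects of the form
// { path, contents } and passes each one on with an extra `isXml` flag.
function tagXml() {
  return new Transform({
    objectMode: true, // chunks are whole objects, not bytes
    transform(file, encoding, callback) {
      callback(null, {
        path: file.path,
        contents: file.contents,
        isXml: file.path.endsWith('.xml')
      });
    }
  });
}

// Small helper for the example: gather everything a stream emits.
async function collect(stream) {
  const items = [];
  for await (const item of stream) items.push(item);
  return items;
}

// Usage: two crawled documents flow through the step as objects.
const pages = Readable.from([
  { path: 'feed.xml', contents: '<rss></rss>' },
  { path: 'index.html', contents: '<p>hi</p>' }
]);
collect(pages.pipe(tagXml())).then((files) => console.log(files));
```

Because every step takes and returns the same object shape, steps like this can be chained with `.pipe()` in any order.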