tony-o / perl6-html-parser-xml

HTML -> XML::Document converter

Creating a common module for all HTML parsers? #14

Closed: Skarsnik closed this issue 7 years ago

Skarsnik commented 9 years ago

I don't really know how to phrase this, but I wrote a Gumbo binding (https://github.com/Skarsnik/perl6-gumbo). Gumbo is a robust HTML5 parsing library that handles tag errors as defined in the spec.

I realised the two modules do the same thing: you give them an HTML string and they give you an XML::Document.

So maybe we could create a common module, something like Service::Parse::HTML, that uses h:p:x by default (since it is pure Perl 6 code). If the user writes use Gumbo; first, Gumbo could then tell the common module to use its own implementation (or another module's).

I was thinking of this because I wanted a way to get modules like HTML::Restrict or HTML::Scraper, which use h:p:x, to use Gumbo instead without touching their code.
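A minimal sketch, in Perl 6 (Raku), of the common front-end module proposed above: it defaults to a pure Perl 6 backend, and another module can swap in its own implementation at load time. The class names CommonHTMLParser, PurePerlHTML and GumboHTML and the set-backend method are hypothetical, not part of HTML::Parser::XML or Gumbo. The backends return tagged strings instead of XML::Document objects so the example runs with no modules installed.

```raku
# Hypothetical stand-ins for the real backends; they return tagged strings
# instead of XML::Document objects so the sketch runs with no modules installed.
class PurePerlHTML {
    method parse-html(Str $html) { "h:p:x => $html" }
}
class GumboHTML {
    method parse-html(Str $html) { "gumbo => $html" }
}

# Hypothetical common front-end in the spirit of the proposed Service::Parse::HTML.
class CommonHTMLParser {
    # Lexical shared by all methods; starts out as the pure Perl 6 backend.
    my $backend = PurePerlHTML.new;

    # Assumption: a module such as Gumbo would call this when it is loaded.
    method set-backend($impl) { $backend = $impl }

    # Callers (HTML::Restrict, scrapers, ...) only ever call this method.
    method parse(Str $html) { $backend.parse-html($html) }
}

say CommonHTMLParser.parse('<p>hi</p>');   # h:p:x => <p>hi</p>
CommonHTMLParser.set-backend(GumboHTML.new);
say CommonHTMLParser.parse('<p>hi</p>');   # gumbo => <p>hi</p>
```

Because the selected backend lives in one shared lexical, every caller of parse switches at once without changes to its own code. The tradeoff is global state: all callers in the process share the same backend.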

tony-o commented 9 years ago

This handles tag errors just fine. It sounds like you want the supersedes directive in Perl 6. If you open this on HTML::Scraper, I'll make it interchangeable as long as Gumbo (or whatever) conforms to the same interface.

Skarsnik commented 9 years ago

Don't get me wrong. I don't want to replace h:p:x with Gumbo (since Gumbo requires an external library). I just want people and module writers who need to parse HTML to have a common place to look. The implementation could then be selected when a user has a specific need (in my case, faster parsing).

I'm not sure I understand how supersedes would work for what I want.

tony-o commented 9 years ago

@Skarsnik supersedes is NYI (not yet implemented), but you'd essentially write class Gumbo supersedes HTML::Parser::XML, and then wherever there is a use HTML::Parser::XML and Gumbo is installed, Gumbo would be used in H:P:X's place. Since that isn't available, I'll write a role for HTML -> XML parsing, and if you open this issue on Web::Scraper, I'll modify Web::Scraper to use that role instead (this is the right way to make this work).
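A minimal sketch, in Perl 6 (Raku), of the role-based approach described in the last comment: a shared role defines the HTML -> XML interface, each parser does the role, and a consumer such as a scraper depends only on the role. The names HTMLToXMLParser, PurePerlBackend, GumboBackend and Scraper are hypothetical and do not reflect the actual APIs of HTML::Parser::XML, Gumbo or Web::Scraper. The backends return tagged strings instead of XML::Document objects so the example is self-contained.

```raku
# Hypothetical role: the shared HTML -> XML interface.
role HTMLToXMLParser {
    # Stub method: any class doing this role must implement it,
    # otherwise role composition fails at compile time.
    method parse-html(Str $html) { ... }
}

# Hypothetical backends; real ones would return an XML::Document.
class PurePerlBackend does HTMLToXMLParser {
    method parse-html(Str $html) { "h:p:x => $html" }
}
class GumboBackend does HTMLToXMLParser {
    method parse-html(Str $html) { "gumbo => $html" }
}

# Hypothetical consumer standing in for a module like Web::Scraper:
# the attribute is typed by the role, so any conforming backend is accepted,
# and the pure Perl 6 backend is used when none is given.
class Scraper {
    has HTMLToXMLParser $.parser = PurePerlBackend.new;

    method scrape(Str $html) { $!parser.parse-html($html) }
}

say Scraper.new.scrape('<p>hi</p>');                               # h:p:x => <p>hi</p>
say Scraper.new(parser => GumboBackend.new).scrape('<p>hi</p>');   # gumbo => <p>hi</p>
```

Unlike a process-wide switch, each consumer picks its backend explicitly, and the role type constraint rejects anything that does not do the role: Scraper.new(parser => 42) dies with a type-check error.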