Open damies13 opened 2 weeks ago
Sounds good to me, especially if
Based on your example, the first point above is true at least with lxml. Do you know whether the standard ElementTree supports HTML? `Parse HTML` working only when lxml is available would be OK, I think; there's at least one such keyword already.
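On the ElementTree question: as far as I know, the standard library's `xml.etree.ElementTree` only ships an XML parser, so markup with unclosed void elements is rejected. A small check, where the sample markup and the helper name `parses_as_xml` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Illustrative HTML with unclosed <meta> and <img> elements.
HTML_SNIPPET = (
    "<html><head><meta name='viewport' content='width=device-width'></head>"
    "<body><img src='logo.png'><p>Hello</p></body></html>"
)


def parses_as_xml(text):
    """Return True if the standard library XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The HTML form is not well-formed XML, so the parser rejects it.
html_ok = parses_as_xml(HTML_SNIPPET)

# Self-closing the two void elements turns the same markup into valid XML.
xhtml_ok = parses_as_xml(HTML_SNIPPET.replace("'>", "'/>"))

print(html_ok, xhtml_ok)
```

So plain ElementTree would only cope with HTML that already happens to be well-formed XML.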
I'm slightly worried that this would open the door to future HTML-related requests, such as being able to use CSS selectors. I believe that would require a new dependency, and at that point it would be better to have an external library, either only for HTML or for both HTML and XML.
I'm not sure about CSS selectors, as I only used XPath for my test. I see your concern, though.
Perhaps it would be enough to put a note in the documentation for `Parse HTML` on which HTML features will and won't be supported?
I'll do some research to find out whether lxml supports CSS selectors and come back on that.
My point was that I don't consider CSS selector support that important in an XML library, and my worry was that people could want to turn it into an HTML library. That said, being able to parse HTML as XML would itself be convenient.
I also noticed that lxml has limited support for CSS selectors. I'm fine with it being exposed, especially if it works without any new dependency. The main benefit I see is working with classes, as something like `span.example` is very annoying to write properly as an XPath expression. Anyway, that would require a separate issue.
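To illustrate why class matching is awkward in XPath: `@class` holds a space-separated list, so a plain `contains()` also matches longer class names. A sketch of the usual idiom, assuming lxml is installed; `class_xpath` and the sample markup are hypothetical:

```python
from lxml import etree


def class_xpath(tag, css_class):
    """Build an XPath that matches `tag` elements carrying `css_class`.

    Pads the normalized class attribute with spaces so that only whole
    class tokens match (hypothetical helper, for illustration only).
    """
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {css_class} ')]"
    )


root = etree.fromstring(
    "<div>"
    "<span class='example big'>a</span>"
    "<span class='examples'>b</span>"
    "<span class='big example'>c</span>"
    "</div>"
)

# Token-aware match: only the spans whose class list contains 'example'.
matches = [el.text for el in root.xpath(class_xpath("span", "example"))]

# Naive substring match: also picks up class='examples'.
naive = [el.text for el in root.xpath("//span[contains(@class, 'example')]")]

print(matches, naive)
```

The token-aware expression is roughly what `span.example` means, which is why the CSS form is so much shorter.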
I would like to suggest adding a `Parse HTML` keyword to the XML library.

Why: With the `Parse XML` keyword I get errors because HTML elements are not valid XML. `<meta>` and `<img>` elements do not have closing elements and fail XML validation.

Alternatives: None really. I considered creating an HTML library that would basically be a copy of the XML library, but that seems like a big duplication of effort, as it would be using the same `lxml.etree` module anyway.

Workaround: Currently my workaround is to load the HTML as an element tree using the HTML parser, then pass the etree object to the XML library keywords, example below:
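A minimal sketch of this workaround in Python, assuming lxml is installed. The helper name `parse_html` and the sample markup are illustrative and not taken from the issue:

```python
from lxml import etree

# Illustrative HTML with unclosed <meta> and <img> elements.
HTML_TEXT = """<html><head><meta name="viewport" content="width=device-width">
<title>Demo</title></head>
<body><img src="logo.png"><p class="example">Hello</p></body></html>"""


def parse_html(text):
    """Parse HTML text with lxml's HTML parser and return the root element.

    Hypothetical helper: the HTML parser tolerates unclosed void elements
    that a strict XML parser would reject.
    """
    parser = etree.HTMLParser()
    return etree.fromstring(text, parser)


root = parse_html(HTML_TEXT)

# The result is an ordinary lxml element, so the usual lookups work.
title = root.findtext("head/title")
img_src = root.find("body/img").get("src")
paragraphs = [p.text for p in root.xpath("//p[@class='example']")]

print(root.tag, title, img_src, paragraphs)
```

The returned root is a regular `lxml.etree` element, which is the kind of object the workaround then hands to the XML library keywords.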
Would you like me to work on a PR for this?