kovidgoyal / html5-parser

Fast C based HTML 5 parsing for python
Apache License 2.0

Support for streaming/accessing only the `<head>` and skipping `<body>` parsing #31

jdb8 closed 8 months ago

jdb8 commented 8 months ago

Thank you for this library! For a page with a large DOM tree, is it possible to access only the `<head>` more efficiently, without parsing everything else?

My code currently parses a large page in full and then uses XPath to access only `.//head/*`. Can I skip the unnecessary work of parsing the `<body>` with html5-parser, or do I need a library that supports stream parsing instead (which I don't believe this one does)?

kovidgoyal commented 8 months ago

No, html5-parser does not support streaming.
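
For anyone landing here with the same question, one possible workaround outside html5-parser is to feed the page in chunks to the standard library's `html.parser.HTMLParser` and stop as soon as the head ends. This is only a minimal sketch, not something html5-parser provides: the names `HeadCollector`, `_HeadDone`, and `head_elements` are made up for illustration, and `html.parser` is not a spec-compliant HTML5 parser, so pages that omit the explicit `<head>` tag or contain badly broken markup may not be handled the way a browser would handle them.

```python
from html.parser import HTMLParser


class _HeadDone(Exception):
    """Raised internally to stop parsing once the head has ended."""


class HeadCollector(HTMLParser):
    """Collect (tag, attrs) pairs for start tags seen inside <head>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_head = False
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # The body has started, so nothing further is of interest.
            raise _HeadDone
        elif self.in_head:
            self.elements.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            raise _HeadDone


def head_elements(chunks):
    """Feed text chunks to the parser and stop at </head> or <body>.

    `chunks` is any iterable of str, for example pieces read from a
    file or a network response. Later chunks are never requested once
    the head has been seen in full.
    """
    parser = HeadCollector()
    try:
        for chunk in chunks:
            parser.feed(chunk)
    except _HeadDone:
        pass
    return parser.elements


if __name__ == "__main__":
    page = [
        '<html><head><title>T</title><meta charset="utf-8">',
        '<link rel="stylesheet" href="a.css"></head><body><p>big</p>',
        "</body></html>",
    ]
    for tag, attrs in head_elements(page):
        print(tag, attrs)
```

Because `head_elements` takes an iterable, it can be driven by a generator that reads the page lazily, so the bulk of a large `<body>` is never read or tokenized. The trade-off is that you get a flat list of start tags rather than an lxml tree, so the `.//head/*` XPath query would need to be replaced by plain filtering on the collected tags.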