purarue closed this 1 year ago
Tried using lxml for this, haven't been able to figure it out yet.
If anyone else has libraries they'd recommend here, I'm very open to suggestions; none of my experiments have gone well so far.
Ended up just using an HTML tokenizer in Go.
This is all legacy anyway, so I don't know if anyone else is ever going to use this; it's more for my own usage.
Loading the whole HTML document into memory is pretty expensive memory-wise. Could either use a streaming HTML parser, or maybe split the file before loading it?
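For reference, a rough sketch of the streaming idea using only Python's standard library `html.parser`. This is not the Go tokenizer mentioned above and not what this project actually does. The names `LinkHandler` and `stream_links`, and the choice of pulling out `href` values, are made up for illustration. The point is just that `HTMLParser.feed()` can be called repeatedly with small chunks, so the whole file never has to sit in memory at once:

```python
import io
from html.parser import HTMLParser


class LinkHandler(HTMLParser):
    """Hypothetical handler: calls on_link(href) for each <a href=...> it sees."""

    def __init__(self, on_link):
        super().__init__()
        self.on_link = on_link

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.on_link(value)


def stream_links(fileobj, on_link, chunk_size=64 * 1024):
    """Feed a text-mode file object to the parser chunk by chunk."""
    parser = LinkHandler(on_link)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # feed() buffers any incomplete tag until the next chunk arrives
        parser.feed(chunk)
    parser.close()


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large file; a deliberately small
    # chunk size forces tags to be split across chunk boundaries.
    doc = io.StringIO('<html><a href="a.html">x</a><a href="b.html">y</a></html>')
    stream_links(doc, print, chunk_size=7)
```

With a real file this would be something like `with open(path, encoding="utf-8") as f: stream_links(f, handle)`. Memory stays bounded only as long as the callback processes each item as it arrives instead of collecting everything into a list.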