Open kiwijam opened 4 years ago
I don't think I can control it, since Modest performs some preprocessing, but I could be wrong.
@kiwijam @rushter
In Modest we have buffer positions for attributes in tokens. You can use these to get the raw data.
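To illustrate the idea only: if a tokenizer records where each token starts in the input buffer and how long it is, the raw data is just a slice of that buffer. The sketch below is plain Python, not Modest's C API. `RawPosition` and `raw_slice` are hypothetical names made up for this example, and the offsets are counted by hand for the sample input.

```python
from dataclasses import dataclass


@dataclass
class RawPosition:
    """Hypothetical stand-in for a (begin, length) buffer position."""
    begin: int
    length: int


def raw_slice(buffer: bytes, pos: RawPosition) -> bytes:
    # The raw token is the untouched byte range of the original input.
    return buffer[pos.begin:pos.begin + pos.length]


source = b'<div>&#x3C;test&#x3E;</div>'
# '<div>' is 5 bytes long, and the text token that follows is 16 bytes long.
text_pos = RawPosition(begin=5, length=16)
raw_text_token = raw_slice(source, text_pos)
```

Because the slice is taken from the original bytes, any entity encoding in the input survives unchanged.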
Added limited support for this in 0.2.7.
>>> from selectolax.parser import HTMLParser
>>> html_parser = HTMLParser('<div>&#x3C;test&#x3E;</div>')
>>> selector = html_parser.css_first('div')
>>> selector.child.html
'&lt;test&gt;'
>>> selector.child.raw_value
b'&#x3C;test&#x3E;'
This is limited to text nodes only for now.
Thanks for the work you've done. How can I help maintain the library? I'd like to contribute so that more features can be added.
Well, it's open source. You are welcome to propose new features or improve existing ones.
You can improve the new raw_value feature to support arbitrary nodes.
That's a pretty easy task, but you will need to be familiar with the C language and the Modest library.
As far as I can tell, there's no easy way to extract text but preserve HTML entity encoding at the moment.
Having that option would be handy!
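In the meantime, one possible workaround, sketched with the standard library `html.parser` rather than selectolax: with `convert_charrefs=False`, the stdlib parser reports character references separately from text, so they can be written back out as they appeared. `RawTextExtractor` and `raw_text` are names made up for this example. One caveat I'm aware of: a reference written without its trailing `;` would be re-emitted with one.

```python
from html.parser import HTMLParser as StdHTMLParser


class RawTextExtractor(StdHTMLParser):
    """Collects text content while keeping character references encoded."""

    def __init__(self):
        # convert_charrefs=False keeps references out of handle_data().
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # Named reference, e.g. name == 'amp' for '&amp;'.
        self.parts.append(f'&{name};')

    def handle_charref(self, name):
        # Numeric reference, e.g. name == 'x3C' for '&#x3C;'.
        self.parts.append(f'&#{name};')


def raw_text(markup: str) -> str:
    parser = RawTextExtractor()
    parser.feed(markup)
    parser.close()
    return ''.join(parser.parts)


result = raw_text('<div>&#x3C;test&#x3E; &amp; more</div>')
```

For this input, `result` is `'&#x3C;test&#x3E; &amp; more'`: the tags are dropped and the references keep their original spelling. It is much slower than selectolax, so it is probably only reasonable for small fragments.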