Closed umuro closed 4 years ago
Attribute contents crash Floki when they contain an & followed later by a ;, for example "&adxnnl=1;". The example tag below is from https://nytimes.com
Steps to reproduce the behavior:
"<meta data-rh=\"true\" name=\"msapplication-task\" content=\"name=Homepage;action-uri=https://www.nytimes.com?src=iepin&adxnnl=1;icon-uri=https://static01.nyt.com/images/icons/homepage.ico\"/>" |> Floki.parse
Floki should not treat every & as the start of a special character (an HTML entity). On some sites, query strings are used as attribute contents, and query strings contain a lot of &'s.
Before calling Floki.parse, replace the offending pattern with a placeholder:
~r/&(?=[[:alnum:]]+=.+;)/ |> Regex.replace(string, "+*+*")
The replacement is easy to revert after parsing, but the extra passes over the document cost a lot of CPU time.
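For anyone who wants to try the workaround, here is a minimal sketch of the mask-and-revert round trip using only the standard library. The `AmpMask` module and its function names are hypothetical (they are not part of Floki), and the sketch assumes the placeholder "+*+*" never occurs in the original document.

```elixir
defmodule AmpMask do
  # Hypothetical helper for illustration only; not part of Floki.
  # Assumption: the placeholder never appears in the original HTML.
  @placeholder "+*+*"

  # Matches an & that is followed by alphanumerics, an "=", and a later ";",
  # i.e. the query-string shape from the report.
  defp pattern, do: ~r/&(?=[[:alnum:]]+=.+;)/

  # Replace every matching & with the placeholder before parsing.
  def mask(html), do: Regex.replace(pattern(), html, @placeholder)

  # Put the & back in a string taken from the parsed result.
  def unmask(text), do: String.replace(text, @placeholder, "&")
end

# Intended use (sketch): html |> AmpMask.mask() |> Floki.parse(),
# then AmpMask.unmask/1 on each attribute value read from the result.
```

Both steps scan the full text, which is where the extra CPU time goes.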
@umuro Thank you for opening the issue! It was fixed in version 0.23.1. Can you try again with that version?
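To try that version, the dependency entry can be bumped in mix.exs. This is a sketch that assumes a standard Mix project; the rest of the project file is omitted.

```elixir
# mix.exs (fragment): "~> 0.23.1" allows 0.23.1 and later 0.23.x releases
defp deps do
  [
    {:floki, "~> 0.23.1"}
  ]
end
```

After editing, `mix deps.update floki` fetches the newer release.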