Closed pdebruic closed 2 years ago
I have no idea but if it works for you. :) let us write more tests and see.
This bug is currently causing two failing tests: #testFindTag
and #testFindTagIfPresent
.
The fix makes those two pass, but the following assertion in #testQuotedAttributeValues
fail:
self
assertSoupFrom: '<a href="http://foo.com/<a> and blah and blah'
printsAs: '<a href=''"http://foo.com/''></a><a> and blah and blah</a>'.
IMHO this is "better" because an unescaped, unmatched, and unquoted double-quote in an attribute seems like an outlier and the bug described above a much more common case.
BTW where did these brain*k regexes come from? Did we inherit from Beautiful Soup? It could be nice to reimplement in something readable like PP2, but who has the time ha ha?!
In the SoupParser>>#initialize method the attrFind instvar regex does not properly handle attributes that run right up against a self closing tag. e.g.
The
ry
attribute is parsed at'"15.3"/'
and not'15.3'
The "broken" regex is
changing it to
by removing the last forward slash fixes my issue. But I don't know what else it might break. Certainly parsing trailing slashes in HTML element attributes. I assume they're allowed in the spec. But don't know.