AntouanK opened this issue 4 years ago
In HTML5, certain elements can omit their end tags and still be valid markup. The HTML5 spec's section on optional tags lists the following elements as having optional end tags:
html, head, body, li, dt, dd, p, rt, rp, optgroup, option, colgroup, caption, thead, tbody, tr, td, th, tfoot
The spec gives the rules for when each element is implicitly closed, which makes it a good starting point for working out how feasible this is with the current parser's design.
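To make the implicit-closing rules concrete, here is a minimal sketch in Python. The package itself is written in Elm, so this only illustrates the idea and not its API. It tokenizes with the standard library's html.parser.HTMLParser and keeps a stack of open elements. Before opening a new element, it pops any open element that the new start tag implicitly closes. CLOSED_BY, LenientTreeBuilder and normalize are hypothetical names. The rule table covers only a subset of the spec's optional-tag rules, and only the top of the stack is checked. It is therefore not the spec's full tree-construction algorithm: for example, it does not insert the implied tbody that browsers add, and it drops attributes.

```python
from html import escape
from html.parser import HTMLParser

# Start tags that implicitly close an open <p> (subset of the spec's p rule).
P_CLOSERS = {
    "address", "article", "aside", "blockquote", "details", "div", "dl",
    "fieldset", "figcaption", "figure", "footer", "form", "h1", "h2", "h3",
    "h4", "h5", "h6", "header", "hgroup", "hr", "main", "menu", "nav", "ol",
    "p", "pre", "section", "table", "ul",
}
# Assumed, simplified rule table: open element -> start tags that close it.
# td/th/tr also list tags that start a new row or table section, because the
# cell or row then has "no more content in the parent element".
CLOSED_BY = {
    "p": P_CLOSERS,
    "li": {"li"},
    "dt": {"dt", "dd"},
    "dd": {"dt", "dd"},
    "rt": {"rt", "rp"},
    "rp": {"rt", "rp"},
    "option": {"option", "optgroup"},
    "optgroup": {"optgroup"},
    "tr": {"tr", "tbody", "tfoot"},
    "td": {"td", "th", "tr", "tbody", "tfoot"},
    "th": {"td", "th", "tr", "tbody", "tfoot"},
    "thead": {"tbody", "tfoot"},
    "tbody": {"tbody", "tfoot"},
}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LenientTreeBuilder(HTMLParser):
    """Builds a tree of [tag, children] lists; text nodes are plain str."""

    def __init__(self):
        super().__init__()
        self.root = ["#root", []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):  # attrs are ignored in this sketch
        # Pop open elements off the top of the stack while this start tag
        # implicitly closes them (only the top is ever inspected).
        while tag in CLOSED_BY.get(self.stack[-1][0], ()):
            self.stack.pop()
        node = [tag, []]
        self.stack[-1][1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything above it;
        # an end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def serialize(node):
    """Re-emit the tree with every non-void end tag written explicitly."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, children = node
    inner = "".join(serialize(child) for child in children)
    if tag == "#root":
        return inner
    if tag in VOID:
        return f"<{tag}>"
    return f"<{tag}>{inner}</{tag}>"


def normalize(html):
    builder = LenientTreeBuilder()
    builder.feed(html)
    builder.close()  # flush buffered text; elements still open end at EOF
    return serialize(builder.root)
```

For an illustrative input such as "First paragraph<p>Second paragraph" (not real HN data), normalize returns "First paragraph<p>Second paragraph</p>". Nested cases such as <li>a<p>b<li>c would need the deeper stack search that the spec's tree-construction algorithm performs.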
I ended up using cheerio in my back end to "sanitize" all the HTML that comes from the HN API, as a quick fix, because it was blocking my UI entirely.
Still, since this package is basically the only way to render an HTML string in Elm, I think we should consider this ticket.
@hecrj, let us know what you think.
The parser should definitely support the whole spec, so this is missing functionality.
However, I don't think I will be able to invest time in this soon. If anyone else is willing to give it a shot, by all means go for it! I will gladly review any efforts.
Before starting on the implementation, I think a good first step would be to set up an exhaustive test suite. We seem to be dealing with a bunch of different cases, each with its own rules:
html, head, body, li, dt, dd, p, rt, rp, optgroup, option, colgroup, caption, thead, tbody, tr, td, th, tfoot
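One way to organize the exhaustive test suite suggested above is sketched here in Python rather than Elm, assuming the parser's output can be compared with ==. Each sample is written once with every end tag explicit. The variant with one element's end tags deleted is then derived from it, and the parser is required to produce the same result for both. EXPLICIT_SAMPLES, omit_end_tags, build_cases and run_suite are hypothetical names. Each sample must be one where the spec really permits omitting that end tag; for instance, a p end tag cannot be dropped when its parent is an a element.

```python
import re

# Hypothetical sample table: fully explicit markup per element whose end
# tag is optional. Only contexts where omission is allowed belong here.
EXPLICIT_SAMPLES = {
    "li": ["<ul><li>a</li><li>b</li></ul>"],
    "dt": ["<dl><dt>a</dt><dd>b</dd></dl>"],
    "dd": ["<dl><dt>a</dt><dd>b</dd><dt>c</dt><dd>d</dd></dl>"],
    "p": ["<p>a</p><p>b</p>", "<div><p>a</p></div>",
          "<p>a</p><ul><li>b</li></ul>"],
    "option": ["<select><option>a</option><option>b</option></select>"],
    "td": ["<table><tbody><tr><td>a</td><td>b</td></tr></tbody></table>"],
    "tr": ["<table><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table>"],
}


def omit_end_tags(html, tag):
    """Delete every </tag> (case-insensitive) from html."""
    return re.sub(rf"</{tag}\s*>", "", html, flags=re.IGNORECASE)


def build_cases(samples):
    """Yield (name, markup_with_omissions, explicit_markup) triples."""
    for tag, htmls in samples.items():
        for i, explicit in enumerate(htmls):
            yield f"{tag}-{i}", omit_end_tags(explicit, tag), explicit


def run_suite(parse, samples=EXPLICIT_SAMPLES):
    """Return names of cases where parse() differs between the omitted and
    the explicit markup; an empty list means every case passed."""
    return [name for name, omitted, explicit in build_cases(samples)
            if parse(omitted) != parse(explicit)]
```

Against a real parser, an empty result means every case passed. The samples are deliberately small and would need to grow to cover all the elements listed above.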
In addition, I was having problems with unclosed <a> tags.
I'm having an issue using the HN API.
The HTML it sends does not close some tags (<p>, for example). I assume this is not "valid HTML", but all browsers support it. Could we have an option to accept it in the parser as well? Example:
and the value is under text: