mochi / mochiweb

MochiWeb is an Erlang library for building lightweight HTTP servers.

Optionally remove trailing slashes for some tags? #255

Closed sardaukar closed 1 year ago

sardaukar commented 1 year ago

When I parse HTML with MochiWeb and then convert it back to HTML, like this (Elixir code):

iex(8)> html = "<root_node><img src='https://goo.gl'></root_node>"
"<root_node><img src='https://goo.gl'></root_node>"

iex(9)> {"root_node", [], parsed} = :mochiweb_html.parse(html)
{"root_node", [], [{"img", [{"src", "https://goo.gl"}], []}]}

iex(10)> {"root_node", [], List.wrap(parsed)} |> :mochiweb_html.to_html
[
  ["<", "root_node", [], ">"],
  ["<", "img", [[" ", "src", "=\"", "https://goo.gl", "\""]], " />"],
  ["</", "root_node", ">"]
]

The serialized output for <img src='https://goo.gl'> is <img src="https://goo.gl" /> (with an added trailing slash). According to the W3C validator, trailing slashes are not needed on void elements.

Is there a way to remove them?

etrepum commented 1 year ago

The reason it's not done this way is simply that the early versions of this code predate the HTML5 standard and normalize to XHTML instead. The module was also used at the time to parse and generate XML in addition to HTML, and it doesn't really distinguish between the two except internally for parsing. It also always quotes attributes and puts a space before the closing slash, so the linter warning is not really relevant to what mochiweb_html will produce.

That said, you could make your own copy of this module and change it accordingly. This is the code starting on line 187 of mochiweb_html:

to_html([{start_tag, Tag, Attrs, Singleton} | Rest], Acc) ->
    Open = [<<"<">>,
            Tag,
            attrs_to_html(Attrs, []),
            case Singleton of
                true -> <<" />">>;
                false -> <<">">>
            end],
    to_html(Rest, [Open | Acc]);

If modified in this way, it should behave in the manner that you want (untested):

to_html([{start_tag, Tag, Attrs, Singleton} | Rest], Acc) ->
    Open = [<<"<">>,
            Tag,
            attrs_to_html(Attrs, []),
            case Singleton and not is_singleton(Tag) of
                true -> <<" />">>;
                false -> <<">">>
            end],
    to_html(Rest, [Open | Acc]);
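
For anyone who wants to try the idea outside of a patched copy of the module, here is a minimal standalone sketch of the same decision. It does not use mochiweb at all: the module name void_tag_sketch, the functions open_tag/3 and is_void/1, and the simplified attribute rendering are all assumptions made for illustration, and is_void/1 is only a stand-in for the module's own is_singleton/1. Like the snippet above, treat it as a sketch rather than a tested patch.

```erlang
-module(void_tag_sketch).
-export([open_tag/3, is_void/1]).

%% Hypothetical stand-in for mochiweb_html's is_singleton/1.
%% Lists the void elements of the HTML Living Standard as binaries.
is_void(Tag) ->
    lists:member(Tag, [<<"area">>, <<"base">>, <<"br">>, <<"col">>,
                       <<"embed">>, <<"hr">>, <<"img">>, <<"input">>,
                       <<"link">>, <<"meta">>, <<"source">>, <<"track">>,
                       <<"wbr">>]).

%% Build an opening tag as an iolist.
%% A singleton tag keeps the XHTML-style " />" unless it is an HTML
%% void element, in which case it is closed with a plain ">".
open_tag(Tag, Attrs, Singleton) ->
    Close = case Singleton andalso not is_void(Tag) of
                true -> <<" />">>;
                false -> <<">">>
            end,
    [<<"<">>, Tag, attrs(Attrs), Close].

%% Simplified attribute rendering: always double-quoted.
%% Unlike a real serializer, this sketch does NOT escape values.
attrs(Attrs) ->
    [[<<" ">>, K, <<"=\"">>, V, <<"\"">>] || {K, V} <- Attrs].
```

With this sketch, iolist_to_binary(void_tag_sketch:open_tag(<<"img">>, [{<<"src">>, <<"https://goo.gl">>}], true)) gives <<"<img src=\"https://goo.gl\">">>, while a non-void singleton such as <<"item">> still comes out as <<"<item />">>.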

sardaukar commented 1 year ago

Thanks for the reply and potential solution!