philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License
2.05k stars 155 forks source link

Loosing information about single/double quotes in the attribute value #181

Closed grych closed 6 years ago

grych commented 6 years ago

When parsing, Floki loses the information if the attribute was introduced as single or double quoted text. This is generating an issue when rendering attributes with double quotes inside.

Observe the example:

iex> html = "<tag attr='one \"two\" three'>"
"<tag attr='one \"two\" three'>"
iex> Floki.attribute(html, "attr")
["one \"two\" three"]

Attribute is retrieved correctly, as a proper Elixir string. But let's try to re-render the parsed HTML:

iex> rerendered = html |> Floki.parse() |> Floki.raw_html()
"<tag attr=\"one \"two\" three\"></tag>"

This is a different html than it was in the beginning. And the attribute value has changed:

iex> Floki.attribute(rerendered, "attr")
["one "]

This behaviour causes this issue: grych/drab#117. Notice that using quotes inside the attribute value is correct.

I understand that Floki does not store the information about single/double quotes used in the attribute, but maybe the simple workaround exists, like if there is a double quote inside the attribute value, render it with single quote?

mischov commented 6 years ago

A change here would be a quick fix.

defp build_attrs({attr, value}, attrs), do: ~s(#{attrs} #{attr}="#{value}")

could become

defp build_attrs({attr, value}, attrs), do: ~s(#{attrs} #{attr}="#{String.replace(value, "\"", "'")}")
grych commented 6 years ago

Nope, it will change the attribute value, if you have one "two" three it will become one 'two' three, which is a different string.

I was thinking rather about something like:

if String.contains?(value, "\"") do
  ~s(#{attrs} #{attr}='#{value}')
else
  ~s(#{attrs} #{attr}="#{value}")
end
mischov commented 6 years ago

Good point. I can't think of a better solution.

grych commented 6 years ago

It should do the trick, as no other escaping single or double quotes is permitted in the attribute value, as well as the value containing both single and double quote in the same time is not allowed.

philss commented 6 years ago

@grych good catch! I think your solution should work good. Please go ahead with a PR. Thank you!

grych commented 6 years ago

@philss will do!

grych commented 6 years ago

Fixed in #182. Please let me know when it comes into the release, as I need to update deps for Drab. Thanks!