philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License
2.05k stars 155 forks source link

Allow option to parse attributes as maps #463

Closed chrismccord closed 1 year ago

chrismccord commented 1 year ago

With the OTP 26 map optimizations, it has become clear that I was matching on incidentally generated html attributes in specific order. This is a problem to test against, because Floki parses the attributes in the order they appear into a list, so if Phoenix or LiveView starts spitting out attributes in different order, tests fail even tho we are relying on HTML parsing to avoid matching directly on strings. Parsing attributes into a map will allow equality matching on the markup AST in tests and prevents ordering issues. What do you think? Thanks!

philss commented 1 year ago

@chrismccord So the idea is to add an option to Floki.parse_document/2 and Floki.parse_fragment!/2, right? I think this is tough because it would require the change of some functions, but I think it's possible, and it's a good addition!

PS: I think we will not have this problem once we implement https://github.com/philss/floki/issues/457, but that would be a bigger change. cc @wojtekmach

bennelsonweiss commented 1 year ago

Correct me if I'm misunderstanding your proposal, but parsing attributes into a map would lose data (the attribute ordering).

The loss of this data might manifest in ways such as the output of rendering parsed HTML being different than the originally parsed HTML. Is the benefit of equality matching AST really worth losing potentially useful data in the process?

In regards to changes to the order of rendered attributes in Phoenix or LiveView, shouldn't attributes be rendered in a consistent fashion to avoid continuously altering the produced HTML (helping cacheability, etc)?

chrismccord commented 1 year ago

Yes exactly. And yes, it will be problematic for a number of areas, such as Floki.raw_html and Floki.text needing to become map aware (which shouldn't be a massive lift). I'm sure there are other implications as well, but I think it's essential going forward to test HTML output without relying on attribute ordering.

chrismccord commented 1 year ago

Correct me if I'm misunderstanding your proposal, but parsing attributes into a map would lose data (the attribute ordering).

The loss of this data might manifest in ways such as the output of rendering parsed HTML being different than the originally parsed HTML. Is the benefit of equality matching AST really worth losing potentially useful data in the process?

In regards to changes to the order of rendered attributes in Phoenix or LiveView, shouldn't attributes be rendered in a consistent fashion to avoid continuously altering the produced HTML (helping cacheability, etc)?

Confirm, this would lose attribute ordering, but if it's an option would be opt-in. HTML wise, I can say that the order of attributes rarely (if ever?) matters. They can technically be duped as well, but are ignored by browser engines afaik.

As far as caching, I'm not convinced caching HTML fragments is really desirable, and we have dynamic attribute generation in LiveView, so we'd need to explicitly sort attributes every time, which wouldn't be ideal.

chrismccord commented 1 year ago

@bennelsonweiss if you want to do some measurements it could be helpful. The LiveView heex engine could maintain a user defined order + sorted order of dynamic attributes (say alphabetically), but I would need to know the cost. We use a lot of dynamic attributes, like in the <.form> helper and all over the place really, so sorting would be happening for quite a few tags in each template in a standard app I'd guess. Maybe it's negligible, but one of the issues even with deterministically sorted attrs is that if I want to test a given attribute exists in a node in the document as part of a match, I still need to express that with the entire attribute. #457 solves this kind of stuff in a really nice way tho, so maybe none of this matters if #457 lands.

bennelsonweiss commented 1 year ago

Opt-in seems good- I could see it causing some confusion if it was the default behavior but if somebody needs to turn it on then hopefully they'll understand the implications.

That said, if there is some alternate solution that provides the testing behavior you want without changing the format of Floki - provided by #457 or through some other helper / dsl - I could see that being even better.

philss commented 1 year ago

@chrismccord I implemented the bases for this feature in https://github.com/philss/floki/pull/467. Please let me know what you think.

After merging this, I should implement the feature in https://github.com/rusterlium/html5ever_elixir

cc @josevalim

josevalim commented 1 year ago

To clarify, the "issue" is that Phoenix generates HTML attributes in random order, which means assertions on Phoenix apps want to be order independent too. :)

philss commented 1 year ago

This feature has landed on main - you can pass the :attributes_as_maps option to parse_document/2 and parse_fragment/2. I should release a new version soon.

Thanks everyone! :)