Open wojtekmach opened 1 year ago
@philss I remember we talked a little bit about it but I don't remember much. :) I think the main concern was we obviously cannot return this from Floki.parse* functions as it would be a major breaking change. I think we solve this with a separate module.
If we go with the struct, I'm curious whether Floki.attr and Floki.attribute functions would work on it or we should have equivalents on the struct module.
Btw, is the distinction between document and fragment such that the former always contains exactly one root element? If so the struct could have attributes field which would make accessing these super convenient. But then again I'd guess working with fragments is more common. So maybe we have two different structs after all?
Hey maybe I do remember parts of our earlier conversations. :)
I'd like to add a Floki.Doc struct and a Floki.Doc.parse!/1 function.
I think the main concern was we obviously cannot return this from Floki.parse* functions as it would be a major breaking change.
@wojtekmach yeah, I think it's aligned with what we discussed. We wanted to avoid this breaking change, but I think in the future this "Doc.parse" could be the main API. I'm not sure if we discussed what would be the struct, but I imagine it would be the tree representation, like we have in Floki.HTMLTree. Is this what you are thinking?
If we go with the struct, I'm curious whether Floki.attr and Floki.attribute functions would work on it or we should have equivalents on the struct module.
We would probably want to add support for the new struct on these functions.
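To make the "equivalents on the struct module" option concrete, here is a minimal sketch. It assumes the struct simply wraps the tuple nodes that the Floki.parse* functions return today, in the shape {tag, attributes, children}. The DocSketch module and its new/1 and attribute/2 functions are hypothetical names for illustration only, not part of Floki.

```elixir
defmodule DocSketch do
  # Hypothetical wrapper, not part of Floki. It holds nodes in the tuple
  # shape returned by the Floki.parse* functions: {tag, attributes, children}.
  defstruct nodes: []

  # Wraps an already-parsed list of nodes.
  def new(nodes) when is_list(nodes), do: %__MODULE__{nodes: nodes}

  # Struct-side attribute lookup: collects the values of `name` from the
  # top-level element nodes only. Text and comment nodes are skipped
  # because they do not match the three-element tuple pattern.
  def attribute(%__MODULE__{nodes: nodes}, name) do
    for {_tag, attrs, _children} <- nodes,
        {^name, value} <- attrs,
        do: value
  end
end

doc =
  DocSketch.new([
    {"p", [{"class", "p1"}], ["foo"]},
    {"p", [{"class", "p2"}], ["bar"]}
  ])

classes = DocSketch.attribute(doc, "class")
```

The other option, teaching the existing functions about the struct, would amount to adding a function clause that pattern matches on the struct and delegates to the current implementation.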
Btw, is the distinction between document and fragment such that the former always contains exactly one root element?
Structurally speaking, yes. Semantically, though, a document is something whose root element is <html>, and the specs say that we need a <!doctype html> as well (we are just ignoring this part today). Fragments don't have this restriction, but I'm not sure if we should have another struct for them.
Something that can help us if we go for two structs is the specs (they are too complex, so we shouldn't worry that much):
Hey maybe I do remember parts of our earlier conversations. :)
:D
Sorry, I wasn’t aware of HTMLTree struct. I didn’t really look into internals at all. 😅
In case this gets implemented, I would suggest the name Floki.Document instead of Floki.Doc, since I read this issue and thought it was something documentation-related.
If, per https://github.com/philss/floki/issues/463, we have maps as attributes and we add an ~HTML sigil (as a macro) we'd get these map match semantics for free:
html = ~HTML"""
<p class="p1">foo</p>
<p class="p2">bar</p>
"""
# these two are equivalent
assert ~HTML[<p class="p2">bar</p>] = html[".p2"]
assert ~HTML[<p>bar</p>] = html[".p2"]
assert html[".p2"] == ~HTML[<p class="p2">bar</p>]
which is potentially very interesting for testing.
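For anyone wondering why the two match assertions would behave the same while the == one would not, this is just how map patterns work in plain Elixir. A small sketch, assuming attributes are stored as a map with string keys (the "id" attribute is made up for the example):

```elixir
# Attributes as a map, the shape proposed in issue 463 (assumed here).
attrs = %{"class" => "p2", "id" => "second"}

# A map pattern matches when its keys are a subset of the value's keys,
# so extra attributes on the right-hand side are ignored.
%{"class" => "p2"} = attrs

# An empty map pattern matches any map, which is why a pattern without
# the class attribute would still match.
%{} = attrs

# Equality compares the whole map, so a missing key makes it false.
subset_equal? = %{"class" => "p2"} == attrs
```

So the pattern side of a match can list only the attributes the test cares about, while == stays strict.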
@wojtekmach This is pretty similar to how Meeseeks already works. https://github.com/mischov/meeseeks/blob/8ac9b48b6f8b1daae18f9b0773882cf83c094777/lib/meeseeks/document.ex#L26-L50
Similar how?
FWIW EasyHTML mentioned at the beginning uses the "floki ast", the one returned from Floki.parse* functions. The querying-optimised one in Meeseeks is very interesting. I guess the point is if we use a struct we can consider the ast as an implementation detail and pick either!
Similar in that it already implements the output of both parsing and selection in terms of structs (and provides a nice toolkit for working with those structs), meaning the building blocks are in place for something like EasyHTML.
Ah, makes sense!
It also goes beyond a single Node struct and has a top level Document struct, as well as Comment, Data, Doctype, Element, ProcessingInstruction, and Text structs, which is something else to consider.
Hi!
I maintain a tiny Floki wrapper called EasyHTML which adds a struct around nodes and thus we can implement protocols and behaviours. Here's an example:
I'd like to add a Floki.Doc struct and a Floki.Doc.parse!/1 function. Feedback appreciated!