ndmitchell / tagsoup

Haskell library for parsing and extracting information from (possibly malformed) HTML/XML documents
Other
231 stars 37 forks source link

Idea: Make TagTree a Monad! #67

Closed ChristopherKing42 closed 6 years ago

ChristopherKing42 commented 6 years ago

I know it sounds crazy, but hear me out!

We define a TagTreeM str a as

data TagTreeM str a
    = -- | A 'TagOpen'/'TagClose' pair with the 'Tag' values in between.
      TagBranch str [Attribute str] [TagTree str]
    | -- | Any leaf node
      TagLeaf (Tag str)
    | -- | A hole in the document
      Hole a
                   deriving (Eq,Ord,Show)

A we can define TagTree str as type TagTree str = TagTreeM str Data.Void.Void. [TagTreeM str] is now a Monad! (We probably want to make a newtype for [TagTreeM str], but through out this post I'll just use it directly.

What does this monad due, exactly? It is a document with holes in it to be filled in it (sort of like the Continuation Monad.) This makes it feasible to construct documents using TagSoup instead of just parsing them!

For example, we can define an operator

tag s = [TagBranch s [] [Hole ()]]

Now if we do

do
    tag "Html"
    tag "Body"
    [TagLeaf (TagText ("Hello World"))]

We get

<html>
<body>
Hello World
</body>
</html>

We can also make a list of things. For example, if we do TagBranch "ul" [] [TagBranch "li" [] [Hole 1], TagBranch "li" [] [Hole 2], TagBranch "li" [] [Hole 3]] We get a [TagTreeM str Int], which represents an html document with three holes, labeled with integers. The holes are in an unordered list. You could also imagine algebraic data types being used, representing different parts of the document.

Another useful operator would be

i :: [TagTree str] -> [TagTreeM str ()]
i t = vacuous t ++ return ()

i inserts a [TagTree str] into the document, leaving a hole beneath it for the rest of the document. It allows you to sequentially combine documents.

Now, also this is pretty radical, since tagsoup is a parsing library, not an html rendering one. It still is an interesting thing to consider, though.

Edit: It is also possible to add a Hole constructor to Tag str instead. That way both [TagTreeM str] and [Tag str] would be monads. tag s = [TagOpen s [], Hole (), TagClose s] instead then.

ndmitchell commented 6 years ago

Interesting idea. Isn't it more general than just Tag and TagTree though? Doesn't it apply to all constructors in any ADT? Is this perhaps related to open fixed points of ADTs, which would allow you to insert the extra method? https://hackage.haskell.org/package/recursion-schemes-5.0.2/docs/Data-Functor-Foldable.html#t:Fix

I think there's an interesting idea here, but not totally sure what it is. For tagsoup, the TagTree thing is very much on the periphery - an experiment that went somewhere mild but nowhere significant.

ChristopherKing42 commented 6 years ago

Actually, the general way to do it is via the Free Monad, although that would require a slightly more drastic change to the TagTree type.

As it is though, we generally can't just make [TagTree] a monad without changing TagTree's (or Tag's) definiton, since it is impossible to put arbitrary values in it.

Although this applies to ADTs, I think it could be particularly useful to TagTree, since it would essentially create an HTML templating system.

That's the idea anyways. It may be outside the scope of this package.

ChristopherKing42 commented 6 years ago

(If we want to make [Tag] a monad, we actually do not need to redefine Tag though. We can just use newtype Template str a = [Either (Tag Str) a].)

ndmitchell commented 6 years ago

I'm not convinced, and I'm not actively involved enough with this package for it to be something I'll ever investigate. If you want to go that way I suggest experimenting there - there may be value, but I'd need to see it to believe it.