ndmitchell / tagsoup

Haskell library for parsing and extracting information from (possibly malformed) HTML/XML documents
Other
231 stars 37 forks source link

Made `renderTagsOptions` use foldr, so that it can fuse. #45

Closed ChristopherKing42 closed 8 years ago

ChristopherKing42 commented 8 years ago

This makes renderTagsOptions use foldr, so it can fuse.

For example, if you get a tagsoup from a file, do some list operations (such as any list monad operations or comprehension operations (or flattenTree :smiley:)), it will not build the final list of tags, but rather place tagAcc' where cons would go, and (\_ _ -> []) where [] would go. (If we implement parseTags in terms of build at some point, we could make it such that no list of tags is ever generated.)

The two extra arguments are to make sure minimize and escape work correctly. This illustrates this technique. (It allows a right fold to pass arguments to the left, like magic. (I'm not sure, but I think that's how the foldl in terms of foldr trick works.)

ChristopherKing42 commented 8 years ago

I guess those two erroneous commits can give us to new sources of tests.

ChristopherKing42 commented 8 years ago

AppVeyor failed to download stack or something.

ndmitchell commented 8 years ago

Given this is a performance orientated change, have you run any performance tests? It seems with the function technique you are probably replacing analysis of : with creation and execution of function closures. I'm a little concerned that if foldr/build kicks if you'll have saved the cons, but have to pay functions instead and it will be flat - and if foldr/build doesn't kick in you'll lose. But that's only intuition, some quick numbers would overcome that.

Appveyor did seem to be an amazon/stack wobble, a rerun works.

ChristopherKing42 commented 8 years ago

Okay, I'll take a look (foldr and build are optimized away if they don't fuse, I think).

ndmitchell commented 8 years ago

foldr can't be optimised away since it's recursive, but it might get inlined using the static argument trick. build is replaced by the :. My worry is that in the non-fused case you'll end up with the cost of the cons we had before, plus the code of the function closures that were required to massage the code into foldr, not that foldr/build is inherently more expensive.

ChristopherKing42 commented 8 years ago

Yeah. I'll try some tests when I find the time.

ndmitchell commented 8 years ago

Btw, don't feel you need to go overboard with criterion based microbenchmarks over a variety of situations etc. I'm quite happy with "I tried one which did fuse and one which didn't, and they show a perf difference of N%".

ChristopherKing42 commented 8 years ago

Well, I tried it with

import Control.Monad
import Data.Char
import Text.HTML.TagSoup
import Text.HTML.TagSoup.Tree

main = replicateM_ 100 $ do
    file <- readFile "tagsoup.htm"
    print $ length $ renderTree $ transformTree f $ parseTree file
        where
            f (TagBranch name atts inner) = [TagBranch (map toUpper name) atts inner]
            f x = [x]

and it was only about 0.02 seconds faster (out of about 1.3 seconds). When renderTree wasn't inlining, it was about three times slower.

ndmitchell commented 8 years ago

That makes me think this patch isn't worth it. It's more complex, has only a tiny saving when it does fire, and a big penalty when it doesn't. Do you agree?

ChristopherKing42 commented 8 years ago

Yeah, that's what I was thinking. (Also, sorry for the late reply. We had some snow again so I had time.)

ndmitchell commented 8 years ago

OK, rejected, but certainly worth a try!

ChristopherKing42 commented 8 years ago

Yep, thanks for the guidance on code optimization.