haskell / network-uri

URI manipulation facilities
Other
24 stars 33 forks source link

Optimise isReserved #46

Closed mpickering closed 4 years ago

mpickering commented 4 years ago

This function is called many thousands of times by ghcide and was a hotspot in profiling.

This patch reduced time taken on one benchmark which called hover 1000 times from 259 seconds to 170 seconds.

See https://github.com/digital-asset/ghcide/issues/101

fendor commented 4 years ago

Maybe it makes sense to add this information as a comment? So that feature devs understand why this optimisation has been done?

AndreasPK commented 4 years ago

mpickering showed this patch off to me so I look at the code a bit.

isReserved is really the meat of what this patches optimizes.

Before this patch it compiled to this core:

  = \ (c1_abyn :: Char) ->
      case GHC.List.elem
             @ Char
             ghc-prim-0.5.3:GHC.Classes.$fEqChar
             c1_abyn
             Network.URI.isAllowedInURI5
      of {
        False ->
          GHC.List.elem
            @ Char
            ghc-prim-0.5.3:GHC.Classes.$fEqChar
            c1_abyn
            Network.URI.isAllowedInURI3;
        True -> ghc-prim-0.5.3:GHC.Types.True
      }

essentially elem c foo || elem c bar

With this patch isReserved compiles to this:

Network.URI.$wisReserved
  = \ (ww_sq86 :: ghc-prim-0.5.3:GHC.Prim.Char#) ->
      case ww_sq86 of {
        __DEFAULT -> ghc-prim-0.5.3:GHC.Types.False;
        '!'# -> ghc-prim-0.5.3:GHC.Types.True;
        '#'# -> ghc-prim-0.5.3:GHC.Types.True;
        '$'# -> ghc-prim-0.5.3:GHC.Types.True;
        '&'# -> ghc-prim-0.5.3:GHC.Types.True;
        '\''# -> ghc-prim-0.5.3:GHC.Types.True;
        '('# -> ghc-prim-0.5.3:GHC.Types.True;
        ')'# -> ghc-prim-0.5.3:GHC.Types.True;
        '*'# -> ghc-prim-0.5.3:GHC.Types.True;
        '+'# -> ghc-prim-0.5.3:GHC.Types.True;
        ','# -> ghc-prim-0.5.3:GHC.Types.True;
        '/'# -> ghc-prim-0.5.3:GHC.Types.True;
        ':'# -> ghc-prim-0.5.3:GHC.Types.True;
        ';'# -> ghc-prim-0.5.3:GHC.Types.True;
        '='# -> ghc-prim-0.5.3:GHC.Types.True;
        '?'# -> ghc-prim-0.5.3:GHC.Types.True;
        '@'# -> ghc-prim-0.5.3:GHC.Types.True;
        '['# -> ghc-prim-0.5.3:GHC.Types.True;
        ']'# -> ghc-prim-0.5.3:GHC.Types.True
      }

The former has overhead of two function calls, and the actual checking loops.

The later has no overhead + results in a binary search for the pattern match.

As a result the later is about 10 times as fast in a simple benchmark like this:

      env (return "http://ex.it/foo/bar/baz/bap") $ \u1 ->
        bench "u1" $ nf (map (isReserved)) u1,

Currently processing one character like this takes somewhere between 100 and 150ns.

After this patch it takes 10ns and 15ns.
That's only about 60 cycles and includes traversing one element of the list so not bad.

osa1 commented 4 years ago

Amazing.

Maybe it makes sense to add this information as a comment?

Agreed, someone in a few months will look at this code and say "why not use elem?" and revert this change.

osa1 commented 4 years ago

This may be a bug in elem/build rule in GHC, see https://gitlab.haskell.org/ghc/ghc/issues/17752#note_250120

ezrakilty commented 4 years ago

It would be great if the compiler did the right thing for the elem version. In the meantime, though, I don't mind this being written out as a case match. I'll add a comment explaining why we've done so and linking back to here.