knrafto / language-bash

Parse and pretty-print Bash shell scripts
BSD 3-Clause "New" or "Revised" License
35 stars 9 forks source link

Performance #40

Open hasufell opened 4 years ago

hasufell commented 4 years ago

I use language-bash in ghcup to parse variable assignments from files like /etc/os-release: https://gitlab.haskell.org/haskell/ghcup-hs/-/blob/master/lib/GHCup/Utils/Bash.hs

This works relatively well, but when I profile ghcup on the simplest command ghcup list, I get this:

    Thu Jun 25 21:56 2020 Time and Allocation Profiling Report  (Final)

       ghcup +RTS -p -RTS list

    total time  =        0.05 secs   (52 ticks @ 1000 us, 1 processor)
    total alloc =  56,666,312 bytes  (excludes profiling overheads)

COST CENTRE       MODULE                            SRC                                                      %time %alloc

word              Language.Bash.Parse.Word          src/Language/Bash/Parse/Word.hs:300:1-43                  15.4   40.2
jstring           Data.Aeson.Parser.Internal        Data/Aeson/Parser/Internal.hs:316:1-32                    11.5    5.5
try               Control.Monad.Catch               src/Control/Monad/Catch.hs:789:1-47                        9.6    9.4
skipSpace         Language.Bash.Parse.Word          src/Language/Bash/Parse/Word.hs:(81,1)-(86,50)             5.8    1.5
<?>               Data.Aeson.Types.Internal         Data/Aeson/Types/Internal.hs:532:1-74                      3.8    0.4
unfoldDirContents Streamly.External.Posix.DirStream src/Streamly/External/Posix/DirStream.hs:(50,1)-(59,69)    3.8    1.1
dirContentsStream Streamly.External.Posix.DirStream src/Streamly/External/Posix/DirStream.hs:(70,1)-(72,83)    3.8    2.0
lexeme            Text.Megaparsec.Lexer             Text/Megaparsec/Lexer.hs:78:1-23                           3.8    3.2
pathParser'       URI.ByteString.Internal           src/URI/ByteString/Internal.hs:(567,1)-(569,82)            3.8    1.6
decimal           Data.Attoparsec.ByteString.Char8  Data/Attoparsec/ByteString/Char8.hs:(447,1)-(448,49)       1.9    0.1
copy              Data.HashMap.Array                Data/HashMap/Array.hs:(328,1)-(333,30)                     1.9    0.0
object_'          Data.Aeson.Parser.Internal        Data/Aeson/Parser/Internal.hs:(135,49)-(137,22)            1.9    0.5
catchAll          Control.Monad.Catch               src/Control/Monad/Catch.hs:744:1-16                        1.9    1.5
mapExcepts        Haskus.Utils.Variant.Excepts      src/lib/Haskus/Utils/Variant/Excepts.hs:102:1-33           1.9    0.0
liftE             Haskus.Utils.Variant.Excepts      src/lib/Haskus/Utils/Variant/Excepts.hs:110:1-38           1.9    0.0
mapVariantAt      Haskus.Utils.Variant              src/lib/Haskus/Utils/Variant.hs:(413,1)-(416,47)           1.9    0.0
uncons            Data.ByteString.Lazy.UTF8         Data/ByteString/Lazy/UTF8.hs:(228,1)-(229,38)              1.9    0.6
parseAbs          HPath                             src/HPath.hs:(142,1)-(147,38)                              1.9    0.0
word              Language.Bash.Parse.Internal      src/Language/Bash/Parse/Internal.hs:175:1-66               1.9    0.1
ioDesc            Language.Bash.Parse.Internal      src/Language/Bash/Parse/Internal.hs:188:1-46               1.9    0.3
assign            Language.Bash.Parse.Internal      src/Language/Bash/Parse/Internal.hs:217:1-43               1.9    0.2
anyOperator       Language.Bash.Parse.Internal      src/Language/Bash/Parse/Internal.hs:209:1-51               1.9    3.0
select            Language.Bash.Operator            src/Language/Bash/Operator.hs:19:1-43                      1.9    1.8
runParsecT        Text.Megaparsec.Internal          Text/Megaparsec/Internal.hs:(591,1)-(596,56)               1.9    0.1
getOptic          Optics.Internal.Optic             src/Optics/Internal/Optic.hs:70:5-12                       1.9    0.0
wrapCompile       Text.Regex.Posix.Wrap             src/Text/Regex/Posix/Wrap.hsc:(372,1)-(384,42)             1.9    0.0
regNameParser     URI.ByteString.Internal           src/URI/ByteString/Internal.hs:(533,1)-(535,59)            1.9    0.3
getDownloadsF     GHCup.Download                    lib/GHCup/Download.hs:(105,1)-(127,42)                     1.9    0.0
printListResult   Main                              app/ghcup/Main.hs:(1332,1)-(1377,18)                       1.9    0.2
satisfying        Language.Bash.Parse.Internal      src/Language/Bash/Parse/Internal.hs:(130,1)-(132,49)       0.0    1.3
pack              Language.Bash.Parse.Internal      src/Language/Bash/Parse/Internal.hs:(99,1)-(122,14)        0.0    2.2
anyWord           Language.Bash.Parse.Internal      src/Language/Bash/Parse/Internal.hs:171:1-39               0.0    2.4
name              Language.Bash.Parse.Word          src/Language/Bash/Parse/Word.hs:(313,1)-(316,38)           0.0    1.8
functionName      Language.Bash.Parse.Word          src/Language/Bash/Parse/Word.hs:(320,1)-(324,31)           0.0    1.5
getDownloads      GHCup.Download                    lib/GHCup/Download.hs:(143,1)-(261,42)                     0.0    2.2

This is on a 7 LOC bash file:

NAME="Exherbo"
PRETTY_NAME="Exherbo Linux"
ID="exherbo"
ANSI_COLOR="0;32"
HOME_URL="https://www.exherbo.org/"
SUPPORT_URL="irc://irc.freenode.net/#exherbo"
BUG_REPORT_URL="https://bugs.exherbo.org/"

I'm wondering if there's a way to reduce the allocations.

knrafto commented 4 years ago

Hmm, unfortunately I don't know too much about haskell performance, so I can't really say off the top of my head what's causing all the allocations. The parser certainly wasn't written with performance in mind.

If you know the files you're parsing are going to be in this nice list-of-variables format, maybe it would be better to use a custom parser?

knrafto commented 4 years ago

I also notice you're parsing each file twice: https://github.com/haskell/ghcup-hs/blob/fafff9dadd60e7f85d1fdea00b324bae7b40164c/lib/GHCup/Platform.hs#L153

hasufell commented 4 years ago

If you know the files you're parsing are going to be in this nice list-of-variables format, maybe it would be better to use a custom parser?

I started that already, but I don't think it's a good idea. There are so many assumptions when doing a naive parser and I won't have an easy way of knowing whether a valid user file was incorrectly parsed, because these are silent failures.

On the other hand it seems this is only a shell subset: https://www.freedesktop.org/software/systemd/man/os-release.html

So maybe feasible after all.

I also notice you're parsing each file twice: https://github.com/haskell/ghcup-hs/blob/fafff9dadd60e7f85d1fdea00b324bae7b40164c/lib/GHCup/Platform.hs#L153

good catch