intern tags - Githubissues

Interning is not a win. There is a substantial amount of additional processing and allocation just for managing the pointers that let us hash the memory without copying it.
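
To make the bookkeeping concrete, here is a minimal sketch of an intern table. It is not the code from the branch and not the Data.Interned implementation: the names ITag, InternTable, newInternTable and internTag are hypothetical, and it keys the table on a String rather than hashing the C string's memory in place. It only illustrates that every lookup, hit or miss, goes through the shared table.

```haskell
import qualified Data.Map.Strict as Map
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Hypothetical interned tag: a small integer id plus the original text.
data ITag = ITag { tagId :: !Int, tagText :: String }

-- Equality and ordering look at the id only; that is the benefit of interning.
instance Eq ITag where
  a == b = tagId a == tagId b

instance Ord ITag where
  compare a b = compare (tagId a) (tagId b)

-- The table maps each distinct string to its single shared ITag.
newtype InternTable = InternTable (IORef (Map.Map String ITag))

newInternTable :: IO InternTable
newInternTable = InternTable <$> newIORef Map.empty

-- Look the string up; build a new ITag only the first time it is seen.
-- The lookup itself is paid on every call, including cache hits.
internTag :: InternTable -> String -> IO ITag
internTag (InternTable ref) s = atomicModifyIORef' ref $ \m ->
  case Map.lookup s m of
    Just t  -> (m, t)
    Nothing -> let t = ITag (Map.size m) s
               in (Map.insert s t m, t)
```

With a handful of distinct tags the table stays tiny, but it is still consulted once per tag occurrence, which is where the extra processing comes from.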

The hs-notmuch-tag-count program was used for testing, on a database with more than 160,000 messages and more than 180,000 tag occurrences (7 distinct tags):

[T470s:~/dev/hs-notmuch] [ tag-fu● ] ftweedal% cabal run hs-notmuch-tag-count -- ~/mail.bak '*' +RTS -p
Preprocessing library notmuch-0.1.0.0...
Preprocessing executable 'hs-notmuch-tag-count' for notmuch-0.1.0.0...
Running hs-notmuch-tag-count...
fromList [("unread",58),("replied",2537),("attachment",11203),("encrypted",1),("flagged",102),("signed",8004),("inbox",161474)]
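
For readers unfamiliar with the tool, the counting it performs can be sketched roughly as below. This is an assumption about its shape, not the contents of tools/TagCount.hs: countTags and accum are hypothetical names, tags are plain Strings, and Data.Map from containers stands in for the HashMap that the profiles show the real program using.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Count how often each tag occurs across all messages.
-- Each inner list holds the tags of one message.
countTags :: [[String]] -> Map.Map String Int
countTags = foldl' (foldl' accum) Map.empty
  where
    -- Bump the counter for one tag occurrence, starting at 1 if unseen.
    accum m t = Map.insertWith (+) t 1 m

-- Example: countTags [["inbox", "unread"], ["inbox"]]
--   evaluates to fromList [("inbox",2),("unread",1)]
```

Every tag occurrence becomes one map update, so the representation of the tag key (and the cost of hashing or comparing it) is paid once per occurrence.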

Prior to interning (using a custom Tag representation) the profile summary is:

        total time  =        8.34 secs   (8335 ticks @ 1000 us, 1 processor)
        total alloc = 244,189,264 bytes  (excludes profiling overheads)

COST CENTRE       MODULE            SRC                                       %time %alloc

withMessageHandle Notmuch.Binding   src/Notmuch/Binding.chs:74:1-68            50.9    5.8
tagsToList        Notmuch.Binding   src/Notmuch/Binding.chs:(571,1)-(576,16)   26.4    3.9
withMessages      Notmuch.Binding   src/Notmuch/Binding.chs:73:1-58            16.1    2.6
detachPtr         Notmuch.Talloc    src/Notmuch/Talloc.chs:(29,1)-(30,61)       4.2    0.0
go                Main              tools/TagCount.hs:(43,1)-(49,54)            0.5   19.6
accumTag          Main              tools/TagCount.hs:52:1-69                   0.5   35.8
tagFromCString    Notmuch.Tag       src/Notmuch/Tag.hs:(81,1)-(83,35)           0.2    6.6
messagesToList    Notmuch.Binding   src/Notmuch/Binding.chs:(587,1)-(592,80)    0.2   21.7
message_get_tags  Notmuch.Binding   src/Notmuch/Binding.chs:(438,1)-(439,46)    0.0    2.6
hash              Data.HashMap.Base Data/HashMap/Base.hs:143:1-28               0.0    1.2

After interning it is:

        total time  =        8.37 secs   (8372 ticks @ 1000 us, 1 processor)
        total alloc = 304,709,328 bytes  (excludes profiling overheads)

COST CENTRE          MODULE                 SRC                                       %time %alloc

withMessageHandle    Notmuch.Binding        src/Notmuch/Binding.chs:74:1-68            51.3    4.7
tagsToList           Notmuch.Binding        src/Notmuch/Binding.chs:(564,1)-(569,31)   25.4    4.6
withMessages         Notmuch.Binding        src/Notmuch/Binding.chs:73:1-58            16.6    2.1
detachPtr            Notmuch.Talloc         src/Notmuch/Talloc.chs:(29,1)-(30,61)       4.0    0.0
intern               Data.Interned.Internal Data/Interned/Internal.hs:(78,1)-(87,33)    0.5   16.9
go                   Main                   tools/TagCount.hs:(43,1)-(49,54)            0.4   15.7
accumTag             Main                   tools/TagCount.hs:52:1-69                   0.4   28.7
internTagFromCString Notmuch.Tag            src/Notmuch/Tag.hs:(145,1)-(147,45)         0.2    4.8
messagesToList       Notmuch.Binding        src/Notmuch/Binding.chs:(580,1)-(585,80)    0.2   17.4
message_get_tags     Notmuch.Binding        src/Notmuch/Binding.chs:(431,1)-(432,46)    0.0    2.1
hashPtrWithSalt      Data.Hashable.Class    Data/Hashable/Class.hs:(714,1)-(716,23)     0.0    1.9

tagFromCString is where the copying to pinned memory occurs. In the detailed cost centre report (columns after SRC: cost centre number, entries, individual %time and %alloc, inherited %time and %alloc) its line goes from:

tagFromCString Notmuch.Tag src/Notmuch/Tag.hs:(81,1)-(83,35)   236     183379    0.2    6.6     0.2    6.6

to:

tagFromCString Notmuch.Tag src/Notmuch/Tag.hs:(140,1)-(142,35) 249          7    0.0    0.0     0.0    0.0

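Which is exactly what we expect: 183,379 entries down to 7, one for each distinct tag, so the copy into pinned memory now happens only 7 times. But the bookkeeping costs more than the copying it saves: the intern cost centre alone accounts for 16.9% of allocation, total allocation rises from 244,189,264 to 304,709,328 bytes (about 25% more), and total time is essentially unchanged (8.34 s versus 8.37 s). Interning is not a performance win, and it is an abject complexity loss.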
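
The figures in the profiles and the program output can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers already quoted in this issue; the names (allocBefore, allocAfter, tagCounts and so on) are made up for the illustration.

```haskell
-- Total allocation from the two profile summaries, in bytes.
allocBefore, allocAfter :: Integer
allocBefore = 244189264
allocAfter  = 304709328

-- Per-tag counts printed by hs-notmuch-tag-count.
tagCounts :: [Integer]
tagCounts = [58, 2537, 11203, 1, 102, 8004, 161474]

-- Total tag occurrences; matches the 183379 entries of tagFromCString.
totalTagOccurrences :: Integer
totalTagOccurrences = sum tagCounts

-- Extra bytes allocated after interning: 60520064.
allocDelta :: Integer
allocDelta = allocAfter - allocBefore

-- Relative increase in total allocation, as a percentage (roughly 24.8).
allocIncreasePct :: Double
allocIncreasePct = 100 * fromIntegral allocDelta / fromIntegral allocBefore
```

So the per-tag counts sum to the number of tagFromCString entries before interning, and interning adds roughly 60 MB of allocation over the run.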

It was a fun experiment. The work is committed on the feature/intern-tags branch, https://github.com/purebred-mua/hs-notmuch/tree/feature/intern-tags (commit https://github.com/purebred-mua/hs-notmuch/commit/2fda75caa1d904ed1333b985963edf5cdebfa4c2), for the historical record.

purebred-mua / hs-notmuch

intern tags #14