His test involved running the mushcode of the game Eldritch and performing various operations: looking at a data dictionary object, viewing a character sheet, and other things that were very fast on vanilla but now lag noticeably.
My first suggestion was to recompile with TCmalloc, as one of my big-and-dangerous changes was to remove the fancy allocator that MUX normally uses. Using TCmalloc helped enormously, and it's what I've been using in development. Things were still slower than vanilla, though.
Using the gperftools CPU profiler, I prepared several call weight PDFs.
This shows that a great deal of the bottleneck may be in PCRE, but that could well be from allocation-heavy code. A massive amount of time is spent zeroing memory; I have likely been too eager to use calloc() where malloc() will do, and the larger LBUF size compounds the problem.
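A minimal sketch of the calloc()-versus-malloc() point, assuming the buffer is only ever used as a NUL-terminated string built up from the start. LBUF_SIZE and both function names are made up for illustration; this is not the actual MUX code, and the real LBUF size may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer size, chosen only for illustration. */
#define LBUF_SIZE 32768

/* Zeroes all LBUF_SIZE bytes on every allocation, even when the caller
 * only ever writes a short string into the buffer. */
char *lbuf_alloc_zeroed(void)
{
    return calloc(1, LBUF_SIZE);
}

/* Leaves the buffer uninitialized except for the first byte. That is
 * enough for callers that treat the buffer as a NUL-terminated string,
 * and it skips the cost of clearing the whole buffer. */
char *lbuf_alloc_string(void)
{
    char *buf = malloc(LBUF_SIZE);
    if (buf != NULL) {
        buf[0] = '\0';
    }
    return buf;
}
```

The second form is only safe where nothing reads past the terminator before writing there; any caller that relies on the whole buffer being zero still needs calloc() or an explicit memset().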
Most of this seems to be addressed by the allocz changes from cfe20eee. Issues beyond that could well be related to performance problems in the default libc allocator; consider using TCmalloc from gperftools.
The call weight PDFs: __select and __nss_hosts_lookup.