When a process uses RAM, the CPU marks it as used by that process. For efficiency, the CPU allocates RAM in chunks—4K bytes is the default value on many platforms. Those chunks are named pages. Pages can be swapped to disk, etc.
Since the process address space is virtual, the CPU and the operating system need to remember which pages belong to which process, and where each page is stored. The more pages you have, the more time it takes to find where memory is mapped. When a process uses 1GB of memory, that's 262144 entries to look up (1GB / 4K). If one page table entry consumes 8 bytes, that's 2MB (262144 * 8) to look up.
Most current CPU architectures support larger-than-default pages, which give the CPU/OS fewer entries to look up. Operating systems have different names for them (Huge pages on Linux, Super Pages on BSD, or Large Pages on Windows), but they are all the same thing.
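To make the arithmetic concrete, here is a small sketch that reproduces the numbers above and repeats them for huge pages. It follows the simplified flat-table view of the quoted text. The 2 MiB huge page size is an assumption (it is the common size on x86-64, other platforms differ), and entries_needed is just an illustrative helper name.

```cpp
#include <cstdint>

// Number of page-table entries needed to map `bytes` of memory
// using pages of `page_size` bytes (rounded up).
constexpr std::uint64_t entries_needed(std::uint64_t bytes, std::uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;
constexpr std::uint64_t ENTRY_BYTES = 8;  // per-entry size used in the text

// 4 KiB pages: the numbers from the text
static_assert(entries_needed(1 * GiB, 4 * KiB) == 262144, "entries with 4 KiB pages");
static_assert(entries_needed(1 * GiB, 4 * KiB) * ENTRY_BYTES == 2 * MiB, "table size with 4 KiB pages");

// 2 MiB huge pages (assumed size): 512 entries, 4 KiB of entries
static_assert(entries_needed(1 * GiB, 2 * MiB) == 512, "entries with 2 MiB pages");
static_assert(entries_needed(1 * GiB, 2 * MiB) * ENTRY_BYTES == 4 * KiB, "table size with 2 MiB pages");
```

Under these assumptions, going from 4 KiB to 2 MiB pages cuts the number of entries for 1GB by a factor of 512.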
I tried to see whether performance improves if huge pages are enabled for strobealign.
On my laptop, I can enable huge pages globally (as root) by setting /sys/kernel/mm/transparent_hugepage/enabled to always.
With this, any large enough chunk of memory that strobealign (or actually any process) allocates is backed by huge pages.
There does seem to be a small speedup on CHM13 (using 2 threads) on my AMD Ryzen:
Indexing time goes from 69 to 67 s (~3%)
Mapping time (100 bp single end reads) goes from 59µs to 58µs (~1.5%)
The transparent huge pages setting (/sys/kernel/mm/transparent_hugepage/enabled) is only writable by root. It can be set to always, never or madvise. On my laptop (Ubuntu), the default is madvise, which means that a process can explicitly ask for memory to be backed by huge pages using madvise(2). That is, we could call madvise in strobealign to get huge pages without being root.
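As a rough idea of what the madvise route could look like, here is a minimal sketch for Linux. The helpers alloc_huge_hint and free_huge_hint are hypothetical names and not part of strobealign. The sketch assumes the memory comes from an anonymous mmap, because madvise needs a page-aligned region.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper, not part of strobealign: allocate `size` bytes of
// anonymous memory and ask the kernel to back it with transparent huge pages.
// Returns nullptr if the allocation itself fails.
void* alloc_huge_hint(std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return nullptr;
    }
#ifdef MADV_HUGEPAGE
    // This is only a hint. It can fail on kernels built without THP support,
    // in which case the memory stays usable with regular pages.
    if (madvise(p, size, MADV_HUGEPAGE) != 0) {
        std::perror("madvise(MADV_HUGEPAGE)");
    }
#endif
    return p;
}

// Release memory obtained from alloc_huge_hint (same `size` as at allocation)
void free_huge_hint(void* p, std::size_t size) {
    if (p != nullptr) {
        munmap(p, size);
    }
}
```

Calling alloc_huge_hint(1 << 30) would request 1GB with the hint applied. Standard containers such as std::vector would need a custom allocator wrapped around something like this.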
However, here are the reasons why I’ll probably not pursue this further at this time:
The benefits are small
The code we need to add may be a bit complicated (maybe a custom memory allocator is needed)
The benefits probably depend a lot on the processor architecture
On rackham, transparent huge pages are disabled (setting is never), so even madvise(2) won’t help
On the KTH cluster dardel, the setting is always already, so we wouldn’t need to do anything anyway.
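Since the setting differs between machines, it can be handy to check it programmatically. The sketch below relies on the kernel listing all three values in that file and putting the active one in square brackets, for example "always [madvise] never". The function names are made up for illustration.

```cpp
#include <fstream>
#include <string>

// Extract the active value from a line such as "always [madvise] never".
// Returns an empty string if no bracketed value is found.
std::string active_thp_mode(const std::string& line) {
    const auto open = line.find('[');
    if (open == std::string::npos) return "";
    const auto close = line.find(']', open + 1);
    if (close == std::string::npos) return "";
    return line.substr(open + 1, close - open - 1);
}

// Read the current setting; returns an empty string if the file
// cannot be read (for example on a non-Linux system).
std::string read_thp_mode(
    const std::string& path = "/sys/kernel/mm/transparent_hugepage/enabled") {
    std::ifstream in(path);
    std::string line;
    if (!std::getline(in, line)) return "";
    return active_thp_mode(line);
}
```

A program could call read_thp_mode() at startup and only bother with madvise when the result is "madvise".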
TL;DR: We can speed up strobealign on some Linuxes by allocating memory in a different way
The background on pages at the top comes from https://wiki.debian.org/Hugepages.