ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Huge pages on Linux #406

Closed by marcelm 3 months ago

marcelm commented 3 months ago

TL;DR: We can speed up strobealign on some Linux systems by allocating memory differently, namely by backing it with huge pages.

Quoting https://wiki.debian.org/Hugepages:

When a process uses RAM, the CPU marks it as used by that process. For efficiency, the CPU allocates RAM in chunks—4K bytes is the default value on many platforms. Those chunks are named pages. Pages can be swapped to disk, etc.

Since the process address space is virtual, the CPU and the operating system need to remember which pages belong to which process, and where each page is stored. The more pages you have, the more time it takes to find where memory is mapped. When a process uses 1GB of memory, that's 262144 entries to look up (1GB / 4K). If one page table entry consumes 8 bytes, that's 2MB (262144 * 8) to look up.

Most current CPU architectures support larger-than-default pages, which give the CPU/OS fewer entries to look up. Operating systems have different names for them (Huge Pages on Linux, Super Pages on BSD, or Large Pages on Windows), but they are all the same thing.
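To make the arithmetic in the quote concrete, here is a small shell sketch comparing the page-table cost of mapping 1 GiB with 4 KiB pages and with 2 MiB huge pages. It assumes 8-byte entries, as the quote does, and a 2 MiB huge page size (the common size on x86-64). `page_table_cost` is a hypothetical helper, and the model ignores the multi-level structure of real page tables.

```shell
#!/bin/sh
# Hypothetical helper: number of page-table entries needed to map a region
# of $1 bytes with pages of $2 bytes, and the memory those entries occupy.
# Assumes 8-byte page table entries, as in the quoted example.
page_table_cost() {
    region_bytes=$1
    page_bytes=$2
    entries=$(( region_bytes / page_bytes ))
    echo "$entries entries, $(( entries * 8 )) bytes of page table"
}

GiB=$(( 1024 * 1024 * 1024 ))
page_table_cost "$GiB" 4096                    # 262144 entries, 2097152 bytes of page table
page_table_cost "$GiB" $(( 2 * 1024 * 1024 ))  # 512 entries, 4096 bytes of page table
```

With 2 MiB pages, the same gigabyte needs 512 times fewer entries.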

I tried to see whether performance improves if huge pages are enabled for strobealign.

On my laptop, I can enable huge pages globally by doing this (as root):

# echo always > /sys/kernel/mm/transparent_hugepage/enabled

With this, any sufficiently large chunk of memory that strobealign (or indeed any process) allocates can be backed by huge pages.
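Whether memory is actually backed by huge pages can be checked from /proc. A minimal sketch, assuming a Linux kernel with transparent huge page support: it sums the AnonHugePages fields (in kB) over all mappings listed in a process's /proc/<pid>/smaps. The helper name `anon_huge_kb` and the `pgrep` usage lines are illustrative, not part of strobealign.

```shell
#!/bin/sh
# Hypothetical helper: total kB of a process's anonymous memory that is
# backed by transparent huge pages, summed over all mappings in smaps.
anon_huge_kb() {
    awk '/^AnonHugePages:/ { sum += $2 } END { print sum + 0 }' "/proc/$1/smaps"
}

# Illustrative usage for a running strobealign process (same user or root):
#   pid=$(pgrep -n strobealign)
#   echo "$(anon_huge_kb "$pid") kB backed by huge pages"

# System-wide total of anonymous huge pages currently in use:
grep AnonHugePages /proc/meminfo || echo "AnonHugePages not reported by this kernel"
```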

There does indeed seem to be a small speedup on CHM13 (using 2 threads) on my AMD Ryzen:

The transparent huge pages setting (/sys/kernel/mm/transparent_hugepage/enabled) is writable only by root. It can be set to always, never, or madvise. On my laptop (Ubuntu), the default is madvise, which means that a process can explicitly ask for memory to be backed by huge pages using madvise(2). That is, strobealign could call madvise to get huge pages without requiring root.
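Reading the current mode does not require root. A small sketch, assuming the usual sysfs format in which all modes are listed and the active one is shown in brackets (for example `always [madvise] never`); `thp_mode` is a hypothetical helper name, and the messages summarize what each mode means for a program such as strobealign.

```shell
#!/bin/sh
# Hypothetical helper: extract the bracketed (active) mode from a line such
# as "always [madvise] never".
thp_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

f=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$f" ]; then
    mode=$(thp_mode "$(cat "$f")")
    case "$mode" in
        always)  echo "always: large allocations may get huge pages without any madvise call" ;;
        madvise) echo "madvise: a process must request huge pages via madvise(2) with MADV_HUGEPAGE" ;;
        never)   echo "never: transparent huge pages are disabled; MADV_HUGEPAGE hints have no effect" ;;
        *)       echo "unrecognized mode: $mode" ;;
    esac
else
    echo "$f not found (kernel without transparent huge page support?)"
fi
```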

However, here are the reasons why I’ll probably not pursue this further at this time: