ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

Performance differences between 0.2.2 and 0.3.0 #29

olekto closed this issue 1 year ago

olekto commented 1 year ago

Hi.

I have to admit I don't completely understand the technical aspects of holding the database in memory. I understand that it is much quicker, of course, but I'm not sure how to set it up properly.

For instance, we use a cluster where each job we submit can land on a different node, so setting up permanent shared memory is not suitable, and I don't think we could do it anyway since the nodes are shared between different users.

With 0.2.2 we set SHM_LOC= and it worked quite well; FCS-GX finished in minutes even for larger genomes. With 0.3.0-beta, this text is shown if I configure it similarly:

    Page-fault rate for accessing /app/db/gxdb/all.gxs is 106% (should be 0).
    This means that the in-memory GX database is either not on RAM-backed filesystem, or swapped-out.
    GX requires the database to be entirely in RAM; otherwise it will run extremely slow.
    Consider placing the database files in a non-swappable ramfs.
    Or `vmtouch -l -v -m 1000G /path/to/gxdb/all.gx{i,s}` to lock the database pages in RAM.

    Will prefetch (vmtouch) the database pages to have the OS cache them in main memory.
    export GX_PREFETCH=0 to turn off prefetching; =1 - auto(default); =2 - always-on.

What has changed compared to 0.2.2? This process seems to take much longer. For instance, FCS-GX finished in 3 minutes for a 180 Mbp insect genome with 0.2.2, while with 0.3.0-beta it has been running for more than 20 minutes for the same genome.

20-30 minutes is still quick, but much slower than before.

Thank you.

Ole

murphyte commented 1 year ago

Hi Ole -- could you elaborate on how you were running 0.2.2, and how you're trying to run 0.3.0?

Logistically, you have two choices for how to get the db into memory:

  1. copy it into a ramdisk space, either /dev/shm or tmpfs
  2. skip copying, and leave it to GX to prefetch the db into memory
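For option 1, a minimal sketch might look like the following; the paths are hypothetical, and the exact screening invocation depends on your FCS version:

    # Option 1 (sketch): stage the GX database on a RAM-backed filesystem.
    GXDB_SRC=/path/to/gxdb            # on-disk copy of the database (hypothetical path)
    GXDB_SHM=/dev/shm/gxdb            # RAM-backed destination; needs free RAM >= db size

    mkdir -p "$GXDB_SHM"
    cp "$GXDB_SRC"/all.* "$GXDB_SHM"/

    # run the screen with --gx-db "$GXDB_SHM/all", then clean up on shared nodes:
    # rm -rf "$GXDB_SHM"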

Option 2 involves setting SHM_LOC=<disk path> and passing the --gx-db "${SHM_LOC}/gxdb/all" parameter. You'll see that page-fault message from GX, at which point it will automatically run the vmtouch command to cache the database in RAM before screening. The copy or prefetch speed depends on your file system; for us it's either 8 or 35 minutes, depending on where we're reading from (newer or older tech).
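In shell terms, option 2 might look like this sketch; SHM_LOC, --gx-db and GX_PREFETCH come from this thread and the message quoted above, the rest is illustrative:

    # Option 2 (sketch): leave the db on disk and let GX prefetch it.
    export SHM_LOC=/path/to/fast/disk      # hypothetical disk path containing gxdb/
    export GX_PREFETCH=1                   # 1 = auto (default); 2 = always prefetch; 0 = off

    # pass --gx-db "${SHM_LOC}/gxdb/all" to the screening command;
    # GX prints the page-fault warning, then runs vmtouch to pull the db into memory.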

Are you saying you were previously getting fast runs (3 minutes for 180 Mbp, which is about 1/4th of what we get with 48 cores and the db already in /dev/shm) WITHOUT an explicit copy into ramdisk or waiting for prefetching? We have one other user whose HPC file system may have a sizable cache that can serve random access to the db much faster than SSD, though not quite as fast as having the db already in local memory. That only works if the db has been read recently (i.e. it is still in the file system cache). We're not positive that's the explanation for their results, but it seemed plausible.

Could that be the case for your setup? Try copying all the db files anywhere (cp all* /dev/null/ might do it?) and see if that speeds up your subsequent run; one way to do that warm-up is sketched below.
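A sketch of that cache-warming test; the paths are hypothetical, and it only helps if the file system cache is large enough to keep the db resident:

    # Sketch: read the db files once so the file system / page cache holds them.
    time cat /path/to/gxdb/all.gx{i,s} > /dev/null

    # if vmtouch is available, it can load and report on cached pages explicitly:
    # vmtouch -t /path/to/gxdb/all.gx{i,s}    # touch (load) pages into the cache
    # vmtouch -v /path/to/gxdb/all.gx{i,s}    # report how much is currently resident

If a screening run started right after that read is fast, a file-system cache is likely what made your 0.2.2 runs so quick.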

We're revamping the commands a bit more in a new version coming soon, which will hopefully make the logistics of working with the db easier to understand.