ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

Is it possible to keep the FCS-GX database always loaded? #78

Closed eeaunin closed 2 months ago

eeaunin commented 2 months ago

Hello. Is it possible to set up a dedicated compute node on an LSF cluster where the FCS-GX database stays loaded persistently? In every FCS-GX run that I'm currently doing, the FCS-GX database gets copied to /tmp and then deleted from /tmp afterwards. When FCS-GX is run many times a day, every day, the copying and deletion of the database is repeated a lot. Much less /tmp disk activity would be required if it were possible to have an instance of FCS-GX that always stays in memory, has the database already loaded, and can process new assemblies on demand. Is there a way to set it up like that?

etvedte commented 2 months ago

Hello,

I recognize your username from #69. Hopefully your jobs are running consistently now.

For batch processing, there are a few methods to try:

  1. In your bsub command, use -R to request a specific host with adequate memory. If you then run successive jobs on that same host with little downtime between them, they may be able to reuse the database pages that the OS still has cached in RAM. If too much time elapses, the OS will evict those pages.
  2. You could write a bash script that processes a batch of genome files and then run that script with bsub.
  3. You could mount the files using the instructions for "4. Caching the database" under the FCS-GX quickstart. I'm assuming you aren't doing this already because you don't have sudo privileges.
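
As a rough illustration of option 2, here is a minimal sketch of a batch script, not a tested recipe. The function names, the `SCREEN_FN` override, and the tab-separated manifest layout (`fasta<TAB>tax_id`) are all hypothetical, and the `fcs.py` arguments are an assumption based on the FCS-GX quickstart, so check them against your installed version. The idea is only that every genome in the manifest is screened inside one LSF job on one host.

```shell
#!/bin/sh
# Sketch: screen many assemblies inside ONE LSF job, so that all runs happen
# back to back on the same host instead of one job per genome.

# Hypothetical wrapper around the screening step. The fcs.py arguments are an
# assumption taken from the FCS-GX quickstart; adjust them to your install.
run_screen() {
    python3 ./fcs.py screen genome --fasta "$1" --out-dir "$3" \
        --gx-db "${GXDB_LOC:?set GXDB_LOC to the database location}" \
        --tax-id "$2"
}

# Read "fasta<TAB>tax_id" lines from a manifest and screen each one in turn.
# Returns 1 if any genome failed, 0 otherwise.
screen_batch() {
    local manifest="$1" out_root="$2" fasta tax_id name failed=0
    local tab="$(printf '\t')"
    while IFS="$tab" read -r fasta tax_id || [ -n "$fasta" ]; do
        [ -z "$fasta" ] && continue                 # skip blank lines
        name=$(basename "$fasta")
        # SCREEN_FN (hypothetical) lets you swap in another command.
        # stdin is redirected so the command cannot eat the manifest.
        if ! "${SCREEN_FN:-run_screen}" "$fasta" "$tax_id" \
                "$out_root/${name%%.*}" < /dev/null; then
            echo "FAILED: $fasta" >&2
            failed=$((failed + 1))
        fi
    done < "$manifest"
    return "$(( failed > 0 ))"
}

# When called with a manifest and an output directory, process the batch.
if [ "$#" -ge 2 ]; then
    screen_batch "$1" "$2"
fi
```

Saved as, say, `screen_batch.sh`, this could be submitted once per batch with something like `bsub -R "<your resource string>" sh screen_batch.sh manifest.tsv gx_out`, where the resource string is whatever your cluster needs to select a host with enough memory. Failures are reported per genome on stderr and the loop continues, so one bad assembly does not abort the rest of the batch.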

Eric

eeaunin commented 2 months ago

Hello, thanks for the reply! About issue #69: based on the Linux system logs on the compute nodes, FCS-GX seems to use much more memory than the LSF reports indicate. So it looks like the LSF reports are inaccurate for FCS-GX, and requesting 470 GB of memory from the compute nodes for every FCS-GX run will probably prevent crashes. But I haven't tested it enough times yet to be fully sure that this solves the problem.