ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

FCS-GX runtime on SGE cluster #68

Closed YanisChrys closed 2 months ago

YanisChrys commented 4 months ago

Hello,

I am running FCS-GX clean genome on a ~1.8 Gb genome on an SGE cluster with ~45 cores, which, based on our configuration, should give FCS access to around 1 TB of RAM. I read in a previous post (https://github.com/ncbi/fcs/issues/6) that on a genome half this size it should take 15 minutes with enough RAM. For me, however, it still takes ~2-3 days to run. Do you have any insight into why that is? Is this normal behaviour, or is FCS somehow running out of RAM or disk space? The full ("all") database has already been downloaded locally and can be found and accessed by the program.

Note: I am using FCS-GX with singularity

Any help would be greatly appreciated! :)
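For reference, my invocation looks roughly like the one in the quickstart (the paths, image name, and tax-id below are placeholders, not my actual values):

```shell
# Sketch of the FCS-GX screen invocation via Singularity, following the
# ncbi/fcs quickstart; paths, image name, and tax-id are placeholders.
export FCS_DEFAULT_IMAGE=fcs-gx.sif

python3 ./fcs.py screen genome \
    --fasta ./genome.fa \
    --out-dir ./gx_out \
    --gx-db /path/to/gxdb \
    --tax-id 12345
```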

etvedte commented 4 months ago

Hello,

Can you clarify a few things?

  1. I'm assuming you're experiencing performance issues with the screen command (./fcs.py screen genome) and not the clean command (./fcs.py clean genome). Is this correct?
  2. Can you verify the host has enough RAM? Please post the output of the command free -h
  3. Can you run in debug mode, i.e. python3 ./fcs.py screen genome --debug ... Post your entire command in the response and attach a file with the log output (making sure any sensitive information is sanitized).
  4. We do use UGE for running FCS-GX locally, but without the Singularity image and with a dedicated host. You could try requesting exclusive_host=1 in case the host is also running another job that is pushing the GX database out of memory.
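As a concrete sketch, steps 2-4 above might look like the following on an SGE/UGE cluster. Note that the exclusive_host complex is site-specific: your admins may have named it differently or restricted who can request it, so adjust the resource and queue names to your site.

```shell
# Step 2: check host memory before the run.
free -h

# Steps 3-4: submit the debug run, requesting exclusive access to the host.
# "exclusive_host" is a placeholder for whatever exclusive-access complex
# your cluster defines; paths and tax-id are placeholders as well.
qsub -l exclusive_host=1 -cwd -b y \
    python3 ./fcs.py screen genome --fasta ./genome.fa \
        --out-dir ./gx_out --gx-db /path/to/gxdb --tax-id 12345 --debug
```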

Eric

YanisChrys commented 4 months ago

Hello Eric,

Thank you for your suggestions and for your patience. I managed to do the debugging you suggested, so:

1) Yes, that's right. The issue is with the clean genome command.

2) I ran free -h on the cluster node that runs the command, and here is the result:

total        used        free      shared  buff/cache   available
Mem:           3,0T         59G        2,5T         59M        438G        2,9T
Swap:          5,0G         70M        4,9G

3) I am attaching a file with the report from FCS; please let me know if you see anything in there that might be a red flag, or if you can't open it for some reason: fcs.log

4) I am hoping step 3 will tell us what is going on; otherwise, I can try running the command again with the option you suggested.

Note: I do not have access to a dedicated host or a UGE system, and I am using fcs=0.4.0 and singularity=3.10.5.

etvedte commented 4 months ago

Hello,

The output of free -h indicates the server you are running on has enough memory for FCS-GX, but this line in the logfile indicates the process is thrashing (swapping out):

Major (requiring I/O) page faults: 82820955

If there are other high-memory processes running on the host, they could be evicting the FCS-GX database into swap.
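One way to watch for this while FCS-GX is running is to sample the cumulative major-fault counter for the process; a count that keeps climbing means database pages are being re-read from disk. A minimal sketch on Linux, using the current shell's own PID as a stand-in for the gx process:

```shell
# Read the cumulative major page-fault count for a process from
# /proc/<pid>/stat (field 12 = majflt on Linux; field counting assumes
# the process name contains no spaces). $$ is a stand-in here;
# substitute the PID of the gx process.
pid=$$
majflt=$(awk '{print $12}' "/proc/$pid/stat")
echo "major page faults for $pid: $majflt"
```

Sampling this a few times a minute during the run is enough; on a healthy run with the database resident in memory, the counter should stay roughly flat.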

I do not have access to a dedicated host or a UGE system

Can you provide more information here? Do you have a job scheduling system on the host? The resource request exclusive_host=1 tells SGE/UGE to reserve exclusive access to the host before execution, but it sometimes requires elevated permissions.

etvedte commented 2 months ago

Closing. Please reopen if you need to discuss further.