vdblab / vdblab-shotgun

Shotgun metagenomic sequencing processing pipeline
MIT License

bowtie_human jobs killed due to hitting memory limit #63

Open funnell opened 1 year ago

funnell commented 1 year ago

Somehow some experiments are using up the full 24 GB allocated to this rule. Not sure whether it's coming from the bowtie2 step (which claims to be memory efficient!) or the samtools view part. (e.g. analyses: 13736, 13993)

nickp60 commented 1 year ago

hmm, looking into it. Looks like sortmerna is timing out too.

nickp60 commented 1 year ago

Looks like it's due to input file size; I'm seeing input files with 175M read pairs, haha. The simplest option (I think) is to rerun with a higher number of shards. What do you think of having isabl set the number of shards according to file size? I'm thinking one shard per GB of input fastq (measured on one file of the pair), with a minimum of 2. So instead of the default 4, this case would run with 13 shards?
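
A minimal sketch of that heuristic (the helper name and the raw byte check are illustrative, not existing isabl code):

```python
import math
import os


def shards_for(fastq_path, bytes_per_shard=1_000_000_000, min_shards=2):
    """One shard per GB of input fastq (checking one file of the pair),
    with a floor of min_shards. Hypothetical helper, not isabl's API."""
    size_bytes = os.path.getsize(fastq_path)
    return max(min_shards, math.ceil(size_bytes / bytes_per_shard))


# e.g. a ~13 GB R1 file -> 13 shards instead of the default 4
```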

funnell commented 1 year ago

Would we also be able to reduce the requested memory per shard under that plan?

nickp60 commented 1 year ago

We could, but if we are hitting the limit at 24 GB we might want to leave it as-is. I played a bit with setting resources dynamically here https://github.mskcc.org/vdblabinternal/isabl_microbiome_apps/blob/237493f19fe4f463c0a90caf5cfd59d0019e1805/isabl_microbiome_apps/apps/shotgun/biobakery.py#L286, if we want to go that route.
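
For reference, Snakemake lets `resources` be callables, so the memory request can scale with input size and back off upward on retries. A sketch of that pattern (the rule body, index path, multiplier, and floor are illustrative, not the pipeline's actual values):

```python
rule bowtie_human:
    input:
        r1="{sample}_R1.shard{shard}.fastq.gz",
        r2="{sample}_R2.shard{shard}.fastq.gz",
    output:
        bam="{sample}.shard{shard}.host_removed.bam",
    params:
        index="indexes/GRCh38",  # placeholder bowtie2 index prefix
    threads: 8
    resources:
        # scale with total input size and grow on each retry attempt;
        # the 2x multiplier and 4 GB floor are guesses, not tuned values
        mem_mb=lambda wildcards, input, attempt: max(4000, int(2 * input.size_mb)) * attempt,
    shell:
        # simplified: host-read filtering flags omitted for brevity
        "bowtie2 -p {threads} -x {params.index} -1 {input.r1} -2 {input.r2} "
        "| samtools view -b -o {output.bam} -"
```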

funnell commented 1 year ago

Sure. I was thinking it might be hard to schedule a lot of jobs asking for 24 GB. If we constrain a shard to at most 1 GB of input, then we'd have a reasonable sense of the memory requirements.
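
As a back-of-the-envelope check (every constant below is a guess, not a measured value): once shard input is capped, the resident bowtie2 human index should dominate the footprint, so a flat request well under 24 GB ought to hold.

```python
def per_shard_mem_mb(index_mb=4000, shard_input_mb=1000, safety=1.5):
    """Rough per-shard request once shard input is capped at ~1 GB.

    index_mb: approximate resident size of a bowtie2 human index (a guess);
    shard_input_mb: the proposed cap on shard fastq size;
    safety: padding for samtools and pipe buffers (also a guess).
    """
    return int((index_mb + shard_input_mb) * safety)


# per_shard_mem_mb() -> 7500, comfortably below the current 24 GB request
```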