Open funnell opened 1 year ago
hmm, looking into it. Looks like we have sortmerna timing out too
Looks like it's due to input file size: I'm seeing input files with 175M read pairs haha. The simplest option (I think) is to rerun with a higher number of shards. What do you think of having isabl set the number of shards according to file size? I'm thinking one shard per GB of input FASTQ (looking at one file of the pair), minimum of 2. So instead of the default 4 this would run with 13 shards?
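Rough sketch of the shard rule, just to make the proposal concrete. The function names and the `bytes_per_shard` / `min_shards` parameters are made up for illustration and are not existing isabl code. It also assumes we round up and treat 1 GB as 1024**3 bytes, neither of which is settled above.

```python
import math
import os

GIB = 1024 ** 3  # assumption: "1 GB" means 1 GiB


def shards_for_size(size_bytes, bytes_per_shard=GIB, min_shards=2):
    """One shard per GB of input (rounded up), never fewer than min_shards."""
    return max(min_shards, math.ceil(size_bytes / bytes_per_shard))


def shards_for_fastq(path, **kwargs):
    """Size the run from one FASTQ of the pair (hypothetical helper)."""
    return shards_for_size(os.path.getsize(path), **kwargs)
```

With this rule a 12.5 GB FASTQ gives `shards_for_size(int(12.5 * GIB)) == 13`, and anything under 2 GB falls back to the minimum of 2.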
Would we also be able to reduce the requested memory per shard under that plan?
We could, but if we are hitting the limit with 24G we might want to leave it as-is. I played a bit with setting resources dynamically here, if we want to go that route: https://github.mskcc.org/vdblabinternal/isabl_microbiome_apps/blob/237493f19fe4f463c0a90caf5cfd59d0019e1805/isabl_microbiome_apps/apps/shotgun/biobakery.py#L286
sure, I was thinking it might be hard to schedule a lot of jobs asking for 24G. If we are constraining a shard to be at most 1G then we might have a reasonable sense of the memory requirements
Somehow some experiments are using up the full 24G allocated to this rule. Not sure if it's from the bowtie2 step (which claims to be memory efficient!) or the samtools view part. (e.g. analyses: 13736, 13993)