Closed BournSupremacy closed 7 months ago
Hi Jess,
Sorry for the late response as I was on holidays last week. That is not typical behavior, and the resources seem more than sufficient to run the demultiplexing step.
After the initial bcl2fastq step (depending on the size of the sequencing run), it shouldn't take more than 3-4 hours for most datasets to be demuxed, aligned, etc.
Are you sure the barcode file is correct? Rescuing the barcodes takes most of the time, but it still shouldn't be this slow. I'm going to incorporate a few more changes this week and will also update some of the underlying code, so it might be best to check out that new version.
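In the meantime, a quick sanity check on the barcode file can rule out an obviously malformed list. This is just a sketch, assuming a plain-text file with one barcode per line; the actual format the pipeline expects may differ:

```python
# Sanity-check a barcode whitelist (assumed format: one barcode per line).
from collections import Counter

def check_barcodes(path):
    with open(path) as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    lengths = {len(b) for b in barcodes}  # should normally be a single length
    bad_chars = [b for b in barcodes if set(b) - set("ACGT")]
    dupes = [b for b, n in Counter(barcodes).items() if n > 1]
    print(f"{len(barcodes)} barcodes, lengths: {sorted(lengths)}")
    if bad_chars:
        print(f"{len(bad_chars)} barcodes with non-ACGT characters, e.g. {bad_chars[0]}")
    if dupes:
        print(f"{len(dupes)} duplicated barcodes, e.g. {dupes[0]}")
    return not bad_chars and not dupes and len(lengths) == 1
```

Mixed lengths, duplicates, or non-ACGT characters in the whitelist would all make barcode rescue work much harder than it should.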
Hi Jess,
Could you try the new release and see if it works now? It should be a whole lot faster.
Best,
Job
Hi Job,
Really sorry for the lack of response; the holidays got in the way! I'm currently running some resource-intensive jobs, but as soon as I have some capacity I'll try again with the new release and let you know how it goes. Should be within the next few weeks. Thanks again!
Best,
Jess
Hi Job!
I'm currently trying to run a NovaSeq6000 dataset through the pipeline, but I'm getting stuck at the demultiplexing step, where the experiments are demuxed in 12 different parts. Currently, the job resources for each part are set to 2.5 days, 24 GB, and 4 nodes. However, the jobs are running extremely slowly: based on file sizes from the `raw_reads` dir, only 1/10 of the sequences have been demuxed in 20 hours, so I'm sure the jobs will time out at 2.5 days.

Is this expected or abnormal? Can it be fixed by changing the resources? Or is there potentially something wrong with the data? It is also a hashed dataset, so we are also using the hashing part of the pipeline. Could that be a problem too? Any thoughts would be appreciated!
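For reference, the 1/10 figure is a rough on-disk size comparison along these lines. It's only a sketch: the directory names and the `.fastq.gz` extension are placeholders for illustration, not the pipeline's actual layout:

```python
# Rough demux progress estimate by comparing total on-disk file sizes.
# Directory layout and the .fastq.gz pattern are assumptions for illustration.
from pathlib import Path

def dir_size(path, pattern="*.fastq.gz"):
    """Total size in bytes of files matching pattern under path (recursive)."""
    return sum(f.stat().st_size for f in Path(path).rglob(pattern))

def progress(raw_dir, demux_dir):
    """Fraction of raw input size already present as demuxed output."""
    raw = dir_size(raw_dir)
    done = dir_size(demux_dir)
    return done / raw if raw else 0.0
```

Compression ratios differ between raw and demuxed files, so this is only a ballpark, but it's enough to see that the jobs won't finish within the 2.5-day limit.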