Hi @poddarharsh15, it looks like a lot of reads are being analysed: 363 million. Is that expected? You could have a look at the BAM file generated by the fetch stage in the working directory. If you index it with samtools, you will be able to inspect it in IGV or GW. Perhaps there are lots of adapter sequences or soft-clipped reads in there? That could cause problems.
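As a rough complement to looking in IGV or GW, the share of soft-clipped alignments can be counted from SAM text. This is only a minimal sketch, not part of dysgu: the function name `soft_clip_summary` and the example records are made up for illustration, and it assumes tab-separated SAM lines such as those printed by `samtools view`.

```python
import re

# Matches one CIGAR operation, e.g. "30S" or "70M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_summary(sam_lines):
    """Return (aligned_records, soft_clipped_records) for SAM text lines.

    Header lines (starting with "@") and records without a CIGAR ("*")
    are skipped. A record counts as soft-clipped if its CIGAR contains
    at least one S operation.
    """
    aligned = 0
    clipped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete SAM record
        cigar = fields[5]
        if cigar == "*":
            continue  # no alignment information available
        aligned += 1
        if any(op == "S" for _, op in CIGAR_OP.findall(cigar)):
            clipped += 1
    return aligned, clipped


# Hypothetical records, for illustration only.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t100M\t=\t300\t300\tACGT\tFFFF",
    "r2\t147\tchr1\t300\t60\t30S70M\t=\t100\t-300\tACGT\tFFFF",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
aligned, clipped = soft_clip_summary(example)
print(f"{clipped} of {aligned} aligned records are soft-clipped")
```

On a real file the same function could be fed the lines of `samtools view` output instead of the example list. A very high soft-clipped share would be a hint to look more closely at adapters or the mapping step.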
I hope that's not the problem, because the adapters have already been removed, and after that I also removed the duplicates using samtools. I will check it again. Last time, when I ran dysgu on the same samples without removing duplicates, it didn't give me this problem.
Hi @kcleal, I have checked other samples on which I have run dysgu with similar read counts; please see the examples below:
bwamem=502504302
dragen=503773618
minimap=502983985
Please let me know if you can provide some advice to make changes.
Thanks in advance.
Hi @poddarharsh15,
In the examples you provided, the dysgu fetch stage of the pipeline only found 3 or 4 million reads to analyse. However, for the problem sample it identified 363 million reads, so roughly 100x more. That indicates something might be going wrong: either there is a mapping issue or there are lots of soft-clipped reads. I recommend checking in IGV or GW to see if you can spot any issues.
I am investigating these two flowcells again and will get back to you ASAP. Thanks for the help, I appreciate it.
It seems I have found why this was happening: after removal of duplicates (using samtools, so it is a bit strange to see this problem), the percentage of properly mapped reads dropped dramatically :( Please see the images below for reference. Many thanks again.
I'll close this for now.
Hi @kcleal, I am trying to run dysgu on a BAM file, and after running for a couple of minutes it gets KILLED. Could you please suggest some ideas? I don't understand what's happening; I haven't seen dysgu do this before :( Thanks in advance.
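For anyone wanting to compare this before and after duplicate removal without screenshots, here is a minimal sketch of how a properly-paired percentage can be derived from SAM FLAG values. The function name `proper_pair_percent` and the example flags are hypothetical; the bit values come from the SAM specification. It is similar in spirit to the "properly paired" line of `samtools flagstat`, but it is not guaranteed to match that tool's number exactly.

```python
# SAM FLAG bits (from the SAM specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def proper_pair_percent(flags):
    """Percentage of paired, primary records flagged as properly paired.

    Secondary and supplementary records are ignored so that each read
    is counted once. Returns 0.0 if there are no paired primary records.
    """
    paired = 0
    proper = 0
    for flag in flags:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if not flag & PAIRED:
            continue
        paired += 1
        if flag & PROPER_PAIR:
            proper += 1
    return 100.0 * proper / paired if paired else 0.0


# Hypothetical FLAG values, for illustration only:
# 99 and 147 are a properly paired read pair, 65 and 129 are paired
# but not properly paired, 2147 is a supplementary alignment.
example_flags = [99, 147, 65, 129, 2147]
print(proper_pair_percent(example_flags))
```

Running this on the FLAG column of the BAM before and after duplicate removal would show whether the drop comes from that step.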