kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License

Process_KILLED #86

Status: Closed (poddarharsh15 closed this 3 months ago)

poddarharsh15 commented 3 months ago

Hi @kcleal, I am trying to run dysgu on some BAM files, and after running for a couple of minutes the process gets KILLED. Could you please suggest some ideas? I don't understand what's happening, and I haven't seen dysgu do this before :( Thanks in advance.

Screenshot from 2024-03-21 16-45-09

kcleal commented 3 months ago

Hi @poddarharsh15, it looks like a lot of reads are being analysed (363 million). Is that expected? You could look at the BAM file generated by the fetch stage in the working directory. If you index it with samtools, you will be able to inspect it in IGV or GW. Perhaps there are lots of adapter sequences or soft-clipped reads in there? That could cause problems.
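Before opening the fetch-stage BAM in IGV or GW, a quick numeric check for excessive soft-clipping can help. Below is a minimal Python sketch, assuming you stream SAM text records (for example the output of `samtools view` on the indexed fetch BAM). The function names and the `min_clip=20` threshold are hypothetical choices for illustration, not part of dysgu.

```python
import re
from typing import Iterable

# A CIGAR string is a run of <length><operation> pairs, e.g. "30S70M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> int:
    """Return the total number of soft-clipped (S) bases in a CIGAR string."""
    if cigar == "*":  # unmapped records have no CIGAR
        return 0
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op == "S")


def soft_clip_fraction(sam_lines: Iterable[str], min_clip: int = 20) -> float:
    """Fraction of primary mapped records with at least `min_clip` soft-clipped bases.

    `min_clip` is a hypothetical threshold chosen for illustration.
    """
    total = clipped = 0
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]  # SAM columns 2 (FLAG) and 6 (CIGAR)
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        total += 1
        if soft_clipped_bases(cigar) >= min_clip:
            clipped += 1
    return clipped / total if total else 0.0
```

In practice you might pipe `samtools view <fetch BAM>` into a script that passes `sys.stdin` to `soft_clip_fraction`; a fraction far above what other samples show would point towards the soft-clipping problem described above.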

poddarharsh15 commented 3 months ago

I hope that's not the problem, because the adapters had already been removed, and after that I also removed the duplicates using samtools. I will check it again. The last time I ran dysgu on the same samples without removing duplicates, it didn't give me this problem.

poddarharsh15 commented 3 months ago

Hi @kcleal, I have checked other samples on which I have run dysgu with similar read counts; please see the examples below (bwamem = 502504302, dragen = 503773618, minimap = 502983985).

Screenshot from 2024-03-22 11-53-38

Screenshot from 2024-03-22 11-54-09

Please let me know if you can suggest any changes. Thanks in advance.

kcleal commented 3 months ago

Hi @poddarharsh15,

In the examples you provided, the dysgu fetch stage only found 3 or 4 million reads to analyse. However, for the problem sample it identified 363 million reads, roughly 100x more. That indicates something might be going wrong: either there is a mapping issue or there are lots of soft-clipped reads. I recommend checking in IGV or GW to see if you can spot any issues.
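The comparison above amounts to a simple fold-change check against the other samples. Here is a minimal Python sketch, assuming you collect the fetch-stage read count reported for each sample. The function name, the sample names, and the 10x cut-off are hypothetical; the 3 million, 4 million and 363 million figures are the approximate counts quoted in this thread.

```python
from statistics import median


def flag_outlier_samples(read_counts: dict[str, int], max_fold: float = 10.0) -> dict[str, float]:
    """Return {sample: fold} for samples whose fetch-stage read count exceeds
    the cohort median by more than `max_fold` (a hypothetical cut-off)."""
    baseline = median(read_counts.values())
    if baseline == 0:
        raise ValueError("median read count is zero; cannot compute fold change")
    return {
        name: n / baseline
        for name, n in read_counts.items()
        if n / baseline > max_fold
    }


# Approximate fetch-stage counts from the thread (sample names are made up).
counts = {"sample_a": 3_000_000, "sample_b": 4_000_000, "problem": 363_000_000}
outliers = flag_outlier_samples(counts)
```

With these inputs the median is 4 million, so only the problem sample is flagged, at about 91x. Using the median rather than the mean keeps a single runaway sample from inflating its own baseline.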

poddarharsh15 commented 3 months ago

I am investigating these two flowcells again and will get back to you ASAP. Thanks for the help, I appreciate it.

poddarharsh15 commented 3 months ago

It seems I have found why this was happening: after removing duplicates (using samtools, which is a bit strange), the percentage of properly mapped reads dropped drastically :( Please see the images below for reference. Many thanks again.
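To track this percentage before and after duplicate removal, you can compute it directly from the SAM FLAG field. Below is a minimal Python sketch, assuming the "properly mapped" figure corresponds to the "properly paired" line of `samtools flagstat`. The counting here is intended to roughly mirror that line (primary paired records carrying flag 0x2 and not flagged unmapped); treat it as an approximation, not a re-implementation of samtools. The function name is hypothetical.

```python
from typing import Iterable

# SAM FLAG bits (from the SAM format specification).
PAIRED, PROPER_PAIR, UNMAPPED = 0x1, 0x2, 0x4
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def properly_paired_percent(flags: Iterable[int]) -> float:
    """Percentage of primary paired records that are flagged as properly paired."""
    paired = proper = 0
    for flag in flags:
        # Only count primary alignments.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & PAIRED:
            paired += 1
            if (flag & PROPER_PAIR) and not (flag & UNMAPPED):
                proper += 1
    return 100.0 * proper / paired if paired else 0.0


# Example flags: four properly paired mates, one discordant pair,
# and one supplementary record that is ignored.
example = [99, 147, 83, 163, 65, 129, 2147]
pct = properly_paired_percent(example)
```

Running this on the FLAG column of `samtools view` output for the BAM before and after deduplication would show whether the drop comes from the duplicate-removal step itself.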

Screenshot from 2024-03-22 13-25-54

kcleal commented 3 months ago

I'll close this for now.