natir / yacrd

Yet Another Chimeric Read Detector
MIT License

Obtaining the non-chimeric reads #56

Closed desmodus1984 closed 2 months ago

desmodus1984 commented 2 months ago

Hi,

Sorry for the silly question. I tried sequencing a sample, but since I didn't have enough DNA, I used whole-genome amplification (WGA) to get more. I didn't know at the time, and regret now, that WGA produces chimeric reads. Another group sequenced the same species but didn't publish the genome, which I only found out after trying to assemble my own sample.

Thus, I am interested in using that genome assembly, which is very fragmented, to find chimeric reads and then perhaps improve my assembly. I read through the website and couldn't find the command to get the non-chimeric reads or, if possible, both the non-chimeric reads and the split chimeric reads.

Thank you very much;

natir commented 2 months ago

Yacrd is a tool designed to detect chimeric reads by self-mapping (reads against reads), not by mapping against a reference genome.

But for the publication I wrote a script that detects chimeric reads by mapping them against a reference genome; you can find this script here.

To use it:

minimap2 {add option match your sequencing technology} {input.ref} {input.reads} > {output.paf}
./found_chimera.py {output.paf}

The found_chimera.py script just returns the number of chimeric reads found relative to the reference genome, but you can change it easily.
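As an illustration of the kind of change meant here, the following is a minimal sketch that lists read names instead of counting them. It is not the found_chimera.py script: the function name `candidate_chimeras`, the thresholds `min_mapq` and `min_aln_len`, and the heuristic itself (a read whose retained alignments hit more than one target or strand) are all assumptions made for this example. It relies only on the standard 12 mandatory PAF columns. Note that with a very fragmented reference, a genuine read spanning two contigs would also be flagged, so the output should be treated as candidates only.

```python
from collections import defaultdict


def candidate_chimeras(paf_lines, min_mapq=20, min_aln_len=500):
    """Return sorted read names whose kept alignments hit >1 (target, strand).

    Hypothetical heuristic for illustration; thresholds are arbitrary.
    """
    hits = defaultdict(set)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        # PAF columns: 1 query name, 5 strand, 6 target name,
        # 11 alignment block length, 12 mapping quality
        qname, strand, tname = fields[0], fields[4], fields[5]
        aln_len, mapq = int(fields[10]), int(fields[11])
        if mapq >= min_mapq and aln_len >= min_aln_len:
            hits[qname].add((tname, strand))
    return sorted(name for name, places in hits.items() if len(places) > 1)


if __name__ == "__main__":
    import sys

    # Usage (assumed): python list_candidates.py mapping.paf
    with open(sys.argv[1]) as paf:
        for name in candidate_chimeras(paf):
            print(name)
```

Reads not printed by this sketch are the ones it considers non-chimeric under the stated assumptions; the name list could then be used to filter the original read file with any sequence toolkit.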

I want to be clear that this is not yacrd's purpose: I only wrote and used the found_chimera script to validate yacrd's results by comparing read sequences with a high-quality genome.

desmodus1984 commented 2 months ago

Hi

Thanks for the info; I will do the self-mapping instead. That will actually be more meaningful and useful than the reference-based approach. Could you please tell me how to retrieve the non-chimeric reads and the split chimeric reads?

Thank you very much;

natir commented 2 months ago

https://github.com/natir/yacrd?tab=readme-ov-file#find-chimera
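For readers who want to see what "splitting" a chimeric read means conceptually, here is a minimal sketch. It is not yacrd's implementation, and yacrd's own behaviour may differ. It assumes the detection step gives, for each chimeric read, a list of poorly covered intervals as half-open `(begin, end)` positions. The function name `split_read` and the `min_len` parameter are hypothetical.

```python
def split_read(sequence, bad_intervals, min_len=1):
    """Remove each half-open (begin, end) interval and return what remains.

    Illustrative only: interval convention and min_len are assumptions.
    """
    pieces = []
    cursor = 0  # start of the next segment to keep
    for begin, end in sorted(bad_intervals):
        if begin > cursor:
            pieces.append(sequence[cursor:begin])
        cursor = max(cursor, end)  # tolerate overlapping intervals
    if cursor < len(sequence):
        pieces.append(sequence[cursor:])
    return [piece for piece in pieces if len(piece) >= min_len]


# A read with one flagged region in the middle yields two sub-reads.
sub_reads = split_read("AAAACCCCGGGG", [(4, 8)])
```

A read with no flagged intervals comes back whole, so the same function can pass non-chimeric reads through unchanged.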