tjiangHIT / cuteSV

Long read based human genomic structural variation detection with cuteSV
MIT License
245 stars 36 forks

Q: Detecting SVs by aligning to a diploid (not haplotype-resolved) genome #142

Open Alteroldis opened 6 months ago

Alteroldis commented 6 months ago

Hi Dr Jiang.

I work with a species whose genome has a high number of rearrangements. Because of that, I can only assemble a diploid version of the genome with Flye. I think I can resolve this by breaking my reads at the structural variation breakpoints and assembling them again. Will your approach work if I align the reads to a diploid genome rather than a haploid one? Might this also resolve the contigs of the genome into different alleles (haplotypes)? And could I retrieve the SV breakpoints for my reads from the output of your tool?

tjiangHIT commented 6 months ago

Hello @Alteroldis,

This is a very interesting question. cuteSV can identify breakends that involve two different chromosomes or different haplotypes of homologous chromosomes. cuteSV can also report the IDs of the reads that support each breakend event. So I guess cuteSV can help with your goal in this circumstance. You can use minimap2 to align the long reads to the diploid genome, and then run cuteSV.

Best, Tao
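For readers who want to pull the supporting read IDs out of the resulting VCF, here is a minimal sketch in Python. It assumes cuteSV was run with its `--report_readid` option and that the read names then appear as a comma-separated `RNAMES` entry in the INFO column, with breakends tagged `SVTYPE=BND`. The function names and the sample record are hypothetical and only illustrate the parsing, so please check them against your own VCF header.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def reads_supporting_breakends(vcf_lines, svtypes=("BND",)):
    """Map each read ID to the (chrom, pos, alt) breakends it supports.

    Assumption: read names are stored in INFO as RNAMES=id1,id2,...
    """
    support = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        chrom, pos, alt = cols[0], int(cols[1]), cols[4]
        info = parse_info(cols[7])
        if info.get("SVTYPE") not in svtypes:
            continue
        rnames = info.get("RNAMES")
        # Defensive: treat missing or placeholder values as "no read IDs".
        if not isinstance(rnames, str) or rnames in ("", ".", "NULL"):
            continue
        for read_id in rnames.split(","):
            support.setdefault(read_id, []).append((chrom, pos, alt))
    return support


# Hypothetical record, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contigA\t1200\tbnd.0\tN\tN[contigB:500[\t.\tPASS\t"
    "PRECISE;SVTYPE=BND;RE=2;RNAMES=read1,read2",
]
breakend_reads = reads_supporting_breakends(example_vcf)
```

In a real run you would pass an open file handle instead of `example_vcf`, for example `reads_supporting_breakends(open("out.vcf"))`, where `out.vcf` is a placeholder path.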

Alteroldis commented 6 months ago

Dear Dr Jiang, thank you for the quick answer. I think it makes sense to remove exactly those reads that support translocation events and to use the remaining ones for assembly. But a problem arises, since I have both haplotypes in the assembly and it is unknown which two contigs belong to homologous chromosomes. Let's say reads 1-10 support a translocation between contigs A and B. Then there will be another translocation event between contigs B and A, supported by reads 11-21. Is that correct? By deleting reads 1-21, I will lose part of the genome. It also seemed strange to me that the logs reported a huge number of translocation events, but only about 2000 remained in the VCF. Perhaps it's worth tweaking some settings?
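One way to check the A/B versus B/A scenario described above is to group breakend records by the unordered pair of contigs they connect, and then look at the union of supporting reads per pair before deleting anything. The sketch below is only an illustration: it assumes the ALT column uses the standard VCF breakend notation (for example `N[contigB:500[`), and the helper names and sample events are hypothetical. Whether cuteSV emits the reciprocal event as a separate record is not confirmed here.

```python
import re
from collections import defaultdict

# Standard VCF breakend ALT looks like N[chr:pos[ or ]chr:pos]N.
MATE_PATTERN = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def mate_contig(alt):
    """Return the mate contig named in a breakend ALT string, or None."""
    match = MATE_PATTERN.search(alt)
    return match.group(1) if match else None


def group_by_contig_pair(events):
    """Group (chrom, alt, read_ids) events by unordered contig pair.

    Returns {(contig1, contig2): {"records": int, "reads": set}},
    with the pair sorted so that A->B and B->A share one key.
    """
    groups = defaultdict(lambda: {"records": 0, "reads": set()})
    for chrom, alt, read_ids in events:
        mate = mate_contig(alt)
        if mate is None:
            continue  # not a breakend-style ALT
        key = tuple(sorted((chrom, mate)))
        groups[key]["records"] += 1
        groups[key]["reads"].update(read_ids)
    return dict(groups)


# Hypothetical events mirroring the A/B example in the comment above.
events = [
    ("contigA", "N[contigB:500[", ["read1", "read2"]),
    ("contigB", "]contigA:1200]N", ["read11", "read12"]),
]
pairs = group_by_contig_pair(events)
```

With these sample events, both records fall under the single key `("contigA", "contigB")`, so the reads that would be discarded for that contig pair can be counted and inspected together.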