vtsyvina / CliqueSNV


Haplotype from assembled reads #5

Closed amirshams84 closed 3 years ago

amirshams84 commented 3 years ago

Hi, is there any way to generate haplotypes from assembled reads using CliqueSNV?

Thanks

vtsyvina commented 3 years ago

What do you mean? Just find a consensus? There is an option for this, although samtools can do it too. What kind of reads do you have, in what format, and what haplotypes do you expect to get? Some more info would help here.

amirshams84 commented 3 years ago

I have FASTQ reads from an HIV study. My main goal is to generate haplotypes and quasispecies. I am following these routes:

A) Standard: 1) filter reads 2) map to reference 3) CliqueSNV (it works here :) )

B) Assembly-based: 1) filter reads 2) remove the host 3) assemble reads using SPAdes 4) align to ref 5) detect haplotypes or build a consensus sequence (does CliqueSNV work here??)

C) Ref-free: 1) filter reads 2) remove the host 3) detect haplotypes (does CliqueSNV work here??)

vtsyvina commented 3 years ago

CliqueSNV works only with SAM and BAM, so you have to align the reads one way or another. So for B): I'm not familiar with SPAdes, but if you align to a reference then it should work. For C): no, we don't work reference-free; reads have to be aligned.

amirshams84 commented 3 years ago

Thanks for the quick reply, I do appreciate it. The issue is that the assembled reads are few in number, so the read count in the BAM is very low (3 reads in total). When I run CliqueSNV on this BAM, it gives me this error:

`CliqueSNV version: 1.5.3.3 Settings: {-m=snv-illumina, -in=try.bam} Reads number 3 SNV got 0 haplotypes [] CliqueSNV didn't find any haplotypes (too low coverage) time,ms 383`
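A quick way to see how few alignments such a file holds before running a haplotype caller is to count the mapped records. The sketch below is only an illustration: it parses SAM text directly (a BAM would first have to be converted to SAM), the function name `count_mapped_reads` and the example records are made up for the demonstration, and it counts every record without the "unmapped" flag bit (0x4), including secondary or supplementary alignments.

```python
def count_mapped_reads(sam_text: str) -> int:
    """Count SAM alignment records whose 'unmapped' flag bit (0x4) is not set."""
    mapped = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])  # second SAM column is the bitwise FLAG
        if not flag & 0x4:
            mapped += 1
    return mapped


# Hypothetical SAM content: two mapped contigs and one unmapped contig.
example_sam = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:ref\tLN:100",
    "contig1\t0\tref\t1\t60\t8M\t*\t0\t0\tACGTACGT\t*",
    "contig2\t16\tref\t5\t60\t8M\t*\t0\t0\tACGTACGT\t*",
    "contig3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\t*",
])

print(count_mapped_reads(example_sam))
```

With only a handful of mapped records, as in the 3-read BAM above, there is not enough coverage for haplotype reconstruction.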

vtsyvina commented 3 years ago

Ah, I see now.

The only option then is to find a consensus (this is described in the README under the "-m" parameter). CliqueSNV won't be able to do anything else with such input.
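To make "find a consensus" concrete: conceptually it is a per-position majority vote over the aligned sequences. The following is a minimal sketch of that idea only, not how CliqueSNV or samtools implement it. It assumes the reads are already aligned and padded with `-` to the same length, and the name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the most common non-gap base in each column of equal-length reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must be padded to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # Ignore gap characters when voting; keep a gap only if nothing else is there.
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


# Hypothetical aligned reads; the third read differs at position 3.
reads = ["ACGT-A", "ACGTTA", "ACCTTA"]
print(majority_consensus(reads))
```

With only three reads, every column is decided by two or three votes, which is why a consensus is about the most that can be extracted from such input.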

amirshams84 commented 3 years ago

Generally, do you recommend removing duplicate reads from the BAM using Picard MarkDuplicates or samtools rmdup?

vtsyvina commented 3 years ago

I'm not sure about the nature of duplicate reads (I haven't worked closely with processing raw data), but I assume they may skew the haplotype frequencies if their distribution is not uniform. So that would be a reason to remove them.
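The skew can be illustrated with a toy calculation. The numbers below are invented for the example, and the sketch assumes each read can be assigned to exactly one haplotype, which is a simplification of what a haplotype caller does.

```python
from collections import Counter


def haplotype_frequencies(read_haplotypes):
    """Return the fraction of reads assigned to each haplotype label."""
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Hypothetical sample: 6 distinct fragments from H1 and 4 from H2.
original = ["H1"] * 6 + ["H2"] * 4
# Non-uniform duplication: 6 extra duplicate reads, all copies of H2 fragments.
with_duplicates = original + ["H2"] * 6

print(haplotype_frequencies(original))
print(haplotype_frequencies(with_duplicates))
```

In this made-up case H2 goes from the minority haplotype (4 of 10 reads) to the apparent majority (10 of 16 reads) purely because of duplication.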

amirshams84 commented 3 years ago

Thank you so much for the quick reply. Another question: what is the best way to annotate the generated haplotypes? My idea is to degap the generated haplotypes, append the reference, then run a multiple alignment and build a dendrogram. Do you have a better idea?
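The first two steps of that idea (degap, then append the reference) could be sketched as below. This is an illustration only: the function names, sequence names, and sequences are hypothetical, and the multiple alignment and dendrogram would be done afterwards by external tools that are not shown here.

```python
def degap(sequence: str) -> str:
    """Remove alignment gap characters from a sequence."""
    return sequence.replace("-", "")


def build_fasta(reference, haplotypes):
    """Return FASTA text with the reference first, followed by degapped haplotypes.

    `reference` is a (name, sequence) pair; `haplotypes` is a list of such pairs.
    """
    records = [reference] + [(name, degap(seq)) for name, seq in haplotypes]
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Hypothetical reference and haplotype sequences.
reference = ("ref", "ACGTACGT")
haplotypes = [("hap1", "ACG-ACGT"), ("hap2", "AC--ACGA")]

fasta_text = build_fasta(reference, haplotypes)
print(fasta_text)
```

The resulting FASTA text can then be passed to a multiple sequence aligner of choice.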

vtsyvina commented 3 years ago

This question is outside my area of expertise.