vtsyvina / CliqueSNV


-t and -tf parameters #21

Open AndreaAguadoM opened 2 years ago

AndreaAguadoM commented 2 years ago

Hello, my name is Andrea and I am a bioinformatician currently working with SARS-CoV-2 sequences to identify quasispecies within given samples. I have been trying to use CliqueSNV for this, but I have run into some issues. I do not completely understand how the -tf parameter relates to the -t parameter. As I understand it, -tf is related to coverage, while -t is related to frequency. I have been running the tool with -t set to 0.01 and -tf set to different values (from 20 to 150), but the results do not improve much when I change -tf. The sequences obtained do not seem to differ much between experiments, which has become a real problem for my study.

Could someone help me?

Thanks in advance!!

jpflorido commented 2 years ago

Same here: I do not understand the purpose of the -tf argument when used together with -t. Thanks! Javier

Sergey-Knyazev commented 1 year ago

Dear Andrea and Javier,

These two parameters work similarly; however, -t is crucial for genome fragments with low coverage and is required for reliable detection of SNV pairs. The -tf parameter was introduced for some of our users who asked to restrict the output of rare variants.

Thank you! Sergey
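
To illustrate how two such thresholds can interact, here is a minimal Python sketch. It rests on an assumption that is not confirmed in this thread: that -t acts as a minimum number of reads supporting an SNV pair and -tf as a minimum fraction of the coverage at that position. The function names, the default values, and the coverage numbers are hypothetical, and this is not CliqueSNV's actual code; please check the README for the exact definitions.

```python
import math


def passes_thresholds(pair_reads, coverage, t=10, tf=0.05):
    """Return True if an SNV pair clears both (assumed) thresholds.

    pair_reads: reads carrying both minor alleles of the pair
    coverage:   reads covering both positions
    t:          assumed absolute minimum read count (hypothetical default)
    tf:         assumed minimum fraction of coverage (hypothetical default)
    """
    if coverage <= 0:
        return False
    return pair_reads >= t and pair_reads / coverage >= tf


def min_supporting_reads(coverage, t=10, tf=0.05):
    """Smallest read count that satisfies both thresholds at a given coverage."""
    return max(t, math.ceil(tf * coverage))


if __name__ == "__main__":
    # At low coverage the absolute floor dominates;
    # at high coverage the relative cutoff dominates.
    for coverage in (100, 1000, 10000):
        print(coverage, min_supporting_reads(coverage))
```

Under these assumptions, the count-based floor decides the outcome in low-coverage regions, while the fraction-based cutoff only changes the output once coverage is high enough for it to exceed that floor. That would be one way for different values of one parameter to produce nearly identical results.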

AndreaAguadoM commented 1 year ago

Hi Sergey, thank you for your answer!

I would like to take this opportunity to ask about another issue we had while using CliqueSNV. Recently, I talked to Viachaslau Tsyvina, and he told me about the problem the tool has with SARS-CoV-2 data obtained with Illumina technology (since mutations tend to be distributed fairly uniformly across the genome). Specifically, we are trying to identify quasispecies in SARS-CoV-2 for our study, since we have some really interesting samples from infected people with unusual symptoms. We do obtain different viral quasispecies, and they match our prior results. The problem is that the generated quasispecies sequences do not seem to contain all the mutations we identified in a VCF we generated previously, and we think the main reason is that we used Illumina technology for sequencing. Have you tried any other tool that works with SARS-CoV-2 Illumina data?

Thanks in advance!


Sergey-Knyazev commented 1 year ago

Hi Andrea,

To get the mutations, you could use the --snv-illumina-vc option. But keep in mind that it calls mutations that appear together in the same reads, in pairs (see the article, Picture 1, Step 1). The method calls such mutations with high reliability even if their frequencies are below the error rate (see Methods). For other mutations, you could use samtools, but if you would like to call them reliably, you may want to use a high threshold value. You should probably also pay attention to the quality of the alignment, because we have seen that alignment errors lead to false positive mutations.

Thank you! Sergey
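
The "high threshold" advice for mutations called outside CliqueSNV could be sketched in Python roughly as follows. The record layout (`pos`, `alt_reads`, `depth`), the function name, and the 0.05 cutoff are hypothetical choices for illustration; in practice the counts would come from whatever variant caller produced the VCF, and the cutoff should sit comfortably above the expected sequencing error rate.

```python
def filter_by_frequency(variants, min_freq=0.05):
    """Keep only variants whose alternate-allele frequency reaches min_freq.

    variants: list of dicts with hypothetical keys
              'pos' (position), 'alt_reads' (reads supporting the variant)
              and 'depth' (total reads covering the position).
    """
    kept = []
    for variant in variants:
        depth = variant["depth"]
        if depth == 0:
            # No coverage, so no frequency can be computed.
            continue
        freq = variant["alt_reads"] / depth
        if freq >= min_freq:
            kept.append({**variant, "freq": freq})
    return kept


if __name__ == "__main__":
    # Made-up example records, not real data.
    calls = [
        {"pos": 241, "alt_reads": 900, "depth": 1000},
        {"pos": 3037, "alt_reads": 8, "depth": 1000},
        {"pos": 14408, "alt_reads": 0, "depth": 0},
    ]
    for call in filter_by_frequency(calls):
        print(call["pos"], call["freq"])
```

A stricter cutoff discards more true low-frequency variants, so the value is a trade-off between sensitivity and the false positives that alignment errors can introduce.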