ucsd-ccbb / C-VIEW

This software implements a high-throughput data processing pipeline to identify and characterize SARS-CoV-2 variant sequences in specimens from COVID-19 positive hosts or environments.
MIT License

For each tree-building run, make multiple phylogenetic trees (all seqs, >70% coverage seqs, >95% coverage seqs) #68

Closed AmandaBirmingham closed 3 years ago

AmandaBirmingham commented 3 years ago

From: Rob Knight
Date: Thursday, March 18, 2021 at 4:36 PM
Subject: Re: Indel flag and Pangolin calls

[snip]

I think we are keeping the 95% threshold for making the phylogenetic trees that will be considered as containing high-quality sequences, and also making phylogenetic trees that contain all sequences (for technical analysis, not for biological interpretation). Do we also need to make phylogenetic trees with all sequences >70% (anticipating that those below 70% will screw up the alignments so badly that the trees may not even be useful for tracking down issues with problem sequences) or is that not useful? [italics added]

Thanks, Rob

AmandaBirmingham commented 3 years ago

Stakeholder: Am I correct that both should be done for every tree-building run?
A: Yes.

Stakeholder: Did we ever get an answer to the italicized question above?
A: If easy, advocate doing 3 trees: all, >70%, >95%.
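
For illustration only, the three-tree proposal (all, >70%, >95%) could be sketched as a simple grouping step that runs before tree building. This is not the pipeline's actual code: the function name `group_for_trees`, the `TREE_THRESHOLDS` table, and the representation of coverage as a fraction between 0 and 1 are all assumptions made for this sketch, and it filters on coverage alone.

```python
from typing import Dict, List, Optional

# Hypothetical thresholds from the discussion: "all" has no cutoff,
# the other two require coverage strictly greater than the cutoff.
TREE_THRESHOLDS: Dict[str, Optional[float]] = {
    "all": None,
    "gt70": 0.70,
    "gt95": 0.95,
}


def group_for_trees(coverage_by_seq: Dict[str, float]) -> Dict[str, List[str]]:
    """Return, for each tree, the sorted names of the sequences that qualify.

    coverage_by_seq maps a sequence name to its coverage as a fraction (0-1).
    """
    groups: Dict[str, List[str]] = {}
    for tree_name, min_cov in TREE_THRESHOLDS.items():
        groups[tree_name] = sorted(
            name
            for name, cov in coverage_by_seq.items()
            if min_cov is None or cov > min_cov
        )
    return groups


# Made-up sample names and coverage values, purely for demonstration.
example = {"sampleA": 0.99, "sampleB": 0.80, "sampleC": 0.40}
tree_inputs = group_for_trees(example)
```

Each list in `tree_inputs` would then be handed to one tree-building run, so every run yields three trees from the same input set.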

AmandaBirmingham commented 3 years ago

@rob-knight Is this requirement based SOLELY on the coverage, or is it based on the 3 categories from issue #58 (which also include a check based on the pangolin qc)?

rob-knight commented 3 years ago

It would make sense to keep it consistent with issue #58.


rob-knight commented 3 years ago

…however, if it is very hard to keep it consistent with #58, doing it purely based on coverage is an acceptable solution.
