tgen / phoenix

Jetstream compatible workflow template supporting comprehensive analysis of human sequencing data against GRCh38
MIT License

Update Octopus and Enable --annotations #62

Closed PedalheadPHX closed 5 years ago

PedalheadPHX commented 5 years ago

https://github.com/tgen/phoenix/blob/657be197a850300980b8b5e4e5cd9586981284d9/modules/somatic/octopus.jst#L14

It looks like we are still using 0.6.0, although the AD and AF updates were in 0.6.1 and it's now on 0.6.3. Christophe, have we tested these newer versions?

Also, it doesn't look like we are using: --annotations AD ADP AF

ryanrichholt commented 5 years ago

That's an easy change. After I add it, should I retry including this vcf in vcfmerger?

ChristopheLegendre commented 5 years ago

What is the point of updating Octopus to the latest version, and spending time on that, when we decided to exclude this tool from our list of somatic variant callers?

Exclusion at least for now. Why:

Argument #1: AD and AF from --annotations

I have not seen any modification to the AD, ADP and AF flags in the release notes. So having version 0.6.1 or 0.6.3 will not change these facts:

1) AD is not to VCF specs [only one integer, where the VCF specs expect two integer values representing the ref and alt allele depths used for variant calling].
2) The most IMPORTANT point: NO genotype information is present in the Normal sample when the options --annotations and --filter-somatic are used, which makes it impossible to calculate AD and AF for the Normal sample.
3) We could go back to the BAM file and extract some information to get AD and AF, but the values will most probably NOT match the statistics already calculated by Octopus, because those statistics may be based on different ALT alleles after local realignment.
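To make point 1 concrete, here is a minimal sketch of checking whether a sample's AD value carries both a ref and an alt depth, and of deriving AF from it when it does. The function names and the sample strings are illustrative assumptions, not Octopus output and not code from the pipeline.

```python
from typing import Optional


def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys with one sample's values, e.g. 'GT:AD' + '0/1:30,10'."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def allele_fraction(ad_value: str) -> Optional[float]:
    """Return alt / (ref + alt) from a 'ref,alt' AD string.

    Returns None when AD holds fewer than two integers (the single-value
    case described above) or when the total depth is zero.
    """
    parts = [p for p in ad_value.split(",") if p not in ("", ".")]
    if len(parts) < 2:
        return None  # ref and alt depth cannot be separated
    ref, alt = int(parts[0]), int(parts[1])
    total = ref + alt
    return alt / total if total else None


# Hypothetical values, for illustration only.
two_values = parse_sample("GT:AD", "0/1:30,10")
one_value = parse_sample("GT:AD", "0/1:10")
print(allele_fraction(two_values["AD"]))  # 0.25
print(allele_fraction(one_value["AD"]))   # None
```

With a single integer in AD there is no way to tell how many reads supported the reference allele, so AF cannot be derived from that field alone.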

Argument #2: Using Octopus' Random Forest filtering to get AD and AF is too stringent

1) After testing the random forest filtering for Somatic calls only twice, using a MMRF sample and the Ashion COLO829 sample, we saw that the filtering was really stringent and most of the expected calls got filtered out. But this requires further analysis using the TRUTH set of COLO829 or any other reference.
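The further analysis against a truth set could be sketched roughly as follows: compare the call set, with and without the PASS restriction, to the truth variants and report recall and precision. This is only an assumption about how such a comparison might be done. The records are toy data, "RF" is a placeholder filter label, and exact matching on (chrom, pos, ref, alt) ignores variant normalization.

```python
def variant_keys(vcf_lines, pass_only=False):
    """Collect (chrom, pos, ref, alt) keys from VCF record lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headers
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        if pass_only and flt != "PASS":
            continue
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def recall_precision(calls, truth):
    """Return (recall, precision) of a call set against a truth set."""
    true_pos = len(calls & truth)
    recall = true_pos / len(truth) if truth else 0.0
    precision = true_pos / len(calls) if calls else 0.0
    return recall, precision


def record(chrom, pos, ref, alt, flt):
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", flt, "."])


# Toy data: four truth variants; three are called, only one passes the filter.
truth = variant_keys([
    record("chr1", 100, "A", "T", "PASS"),
    record("chr1", 200, "G", "C", "PASS"),
    record("chr2", 300, "T", "G", "PASS"),
    record("chr2", 400, "C", "A", "PASS"),
])
calls = [
    record("chr1", 100, "A", "T", "PASS"),
    record("chr1", 200, "G", "C", "RF"),
    record("chr2", 300, "T", "G", "RF"),
    record("chr3", 500, "G", "A", "RF"),
]
print(recall_precision(variant_keys(calls), truth))                  # (0.75, 0.75)
print(recall_precision(variant_keys(calls, pass_only=True), truth))  # (0.25, 1.0)
```

A large drop in recall between the two lines, as in this toy case, is the pattern that "too stringent" would correspond to.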

IMPORTANT NOTE: Octopus offers a way to output an "UNFILTERED vcf". In that case, Octopus outputs AD, ADP and AF and gives values for both Tumor and Normal samples, but annotates ALL the variants as PASS. So in this case, we cannot filter the vcf on "PASS" before preparing the vcf for vcfMerger2.

Conclusion

As of now, we should exclude Octopus from our list of somatic variant callers and replace it with VarDictJava.
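One quick way to see whether a given VCF is in this situation is to tally the FILTER column: if every record is PASS, filtering on PASS removes nothing. The sketch below uses hypothetical helper names and toy records, and is not part of the pipeline.

```python
from collections import Counter


def filter_counts(vcf_lines):
    """Tally the FILTER column (7th field) over all VCF record lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headers
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6:
            counts[fields[6]] += 1
    return counts


def pass_is_informative(counts):
    """True only if at least one record exists and not every record is PASS."""
    total = sum(counts.values())
    return total > 0 and counts.get("PASS", 0) < total


# Toy records; "RF" is a placeholder for any non-PASS filter label.
unfiltered = [
    "##fileformat=VCFv4.3",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tPASS\t.",
]
filtered = [
    "##fileformat=VCFv4.3",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tRF\t.",
]
print(pass_is_informative(filter_counts(unfiltered)))  # False
print(pass_is_informative(filter_counts(filtered)))    # True
```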

What to do if we really want to keep it:

1) Update to the latest version, even though this won't change anything from the current version, as nothing new has been updated in Octopus regarding somatic calls.
2) We must choose the most "beneficial" condition among no-filtering, --annotations, --random-forest-somatic.

PedalheadPHX commented 5 years ago

Not suggesting we implement it BUT

1) I have not seen a summary to CLEARLY understand what exactly is going on.
2) Ryan was testing with 0.6.0, but it sounds like you tested a newer version, so why are we using different versions?

On Mon, Apr 22, 2019 at 1:53 PM Chris notifications@github.com wrote:

What is the point of updating Octopus to the latest version, and spending time on that, when we decided to exclude this tool from our list of somatic variant callers?


-- Jonathan Keats, Ph.D. Assistant Professor & Director of Bioinformatics Translational Genomics Research Institute (TGen) jkeats@tgen.org | www.keatslab.org (W) 602-343-8690 | (M) 480-543-0634


ChristopheLegendre commented 5 years ago

@PedalheadPHX
1) I tried to summarize the possibilities above.
2) I have not tested version 0.6.3-beta. I tried 0.6.2-beta because the author mentioned the following: "Corrects measures AD and AF calculations", which I thought would solve our issue, but that is not the case.

ryanrichholt commented 5 years ago

Closing this issue, as I believe the latest decision is to remove Octopus from the vcfMerger sources. I've left the code in place so that it can still be used to call variants, but it is not included in the vcfMerger set.