umccr / cwl-ica

A collection of cwl-ica workflows along with a user guide for the commands to use and contributions guide
MIT License

SpliceAI Nirvana #577

Open minw2828 opened 2 weeks ago

minw2828 commented 2 weeks ago

To show SpliceAI results in DRAGEN output, Nirvana is required. ref

I don't see Nirvana being mentioned in this repo.

I heard the current annotation is done outside DRAGEN? We'll need to find out how people want to go about this.

alexiswl commented 2 weeks ago

Looks like it can be done with dragen alongside the variant calling https://help.dragen.illumina.com/product-guides/dragen-v4.3/nirvana#annotate-files-via-dragen-command-line

We just need to provide the Nirvana reference as an input. Ideally we'd tarball it and then extract it inside the CommandLineTool, which I can help with.
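A rough sketch of the extraction step, in Python for illustration only. The function name `extract_reference` and the directory layout are hypothetical; inside the actual CommandLineTool this would more likely be a `tar` call in the tool's shell script.

```python
import tarfile
from pathlib import Path


def extract_reference(tarball: Path, dest: Path) -> Path:
    """Extract a reference tarball into dest and return dest.

    Hypothetical helper: the tarball is assumed to hold the Nirvana
    reference data directory; nothing here is DRAGEN-specific.
    """
    dest.mkdir(parents=True, exist_ok=True)
    dest_resolved = dest.resolve()
    # "r:*" lets tarfile detect the compression (gz, bz2, xz or none).
    with tarfile.open(tarball, "r:*") as tar:
        # Refuse members whose path would land outside dest.
        for member in tar.getmembers():
            target = (dest_resolved / member.name).resolve()
            if not target.is_relative_to(dest_resolved):
                raise ValueError(f"unsafe path in tarball: {member.name}")
        tar.extractall(dest)
    return dest
```

The extracted directory would then be passed to DRAGEN as the annotation data location described in the linked guide.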

minw2828 commented 2 weeks ago

Nirvana provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs (including CNVs)).

The inputs to Illumina Connected Annotations are VCFs and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF).

ref

Their annotation step is also downstream of variant calling.

Concern 1: The output is JSON. I assume it can't integrate directly with the existing downstream tool that produces the HTML report?

Concern 2: The annotation resources listed here may be older versions than currently used, although we still need to check what the real annotations are in /opt/dragen/<VERSION>/share/nirvana.

Concern 3: The annotation resources listed here look rare-disease focused. For example, COSMIC is not there, although here it says COSMIC is included too. Need to find out what cancer-related annotations still need to be included.
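On Concern 1, one possible bridge is to flatten the JSON into rows that a downstream tool could read. The sketch below is a guess at how that might look, not a tested integration: the key names (`positions`, `variants`, `spliceAI`, `vid`) and the gzip-compressed output are assumptions about the JSON layout that need checking against real DRAGEN output, and both function names are hypothetical.

```python
import gzip
import json
from typing import Any, Iterator


def load_annotations(path: str) -> dict[str, Any]:
    """Load an annotation JSON file (assumed to be gzip-compressed)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def iter_spliceai_rows(annotations: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat row per SpliceAI entry found on any variant.

    Assumed layout: annotations["positions"] is a list of positions,
    each with a "variants" list, each variant optionally carrying a
    "spliceAI" list of per-gene score dictionaries.
    """
    for position in annotations.get("positions", []):
        for variant in position.get("variants", []):
            for hit in variant.get("spliceAI", []):
                yield {
                    "chromosome": position.get("chromosome"),
                    "position": position.get("position"),
                    "vid": variant.get("vid"),
                    **hit,  # keep whatever score fields are present
                }
```

Rows like these could be written out as TSV, or merged back into the VCF `INFO` field if the report tool only accepts VCF.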

This is not a priority before the end of this year. Thank you for being proactive. 🙏

minw2828 commented 2 weeks ago

Just continuing to put down some notes here.

For a recent release, UK Biobank successfully annotated their entire dataset of 500,000 whole-genome multi-sample variant call files in approximately 90 minutes.

If applicable, it also adds associated Cancer Hotspots annotation.

The data sources supported are listed in Table 1 and divided into two tiers: Premium and Basic.

There is no charge for the use of Premium-tier data sources within Illumina Connected Annotations for DRAGEN users.

ref