xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0
26 stars 17 forks source link

STAARpipeline_Gene_Centric_Noncoding #54

Closed samreenzafer closed 8 months ago

samreenzafer commented 8 months ago

Hi Xihao,

1) I see a lot of these warnings in my analysis and my cohort is very small, ~480 samples.
Would it be safe to use single.strand.genes.only=FALSE to increase the # variants in a gene ?

Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub, rare_maf_cutoff = rare_maf_cutoff,  : 
  Number of rare variant in the set is less than 2!
  228 genes were dropped because they have exons located on both strands
  of the same reference sequence or on more than one reference sequence,
  so cannot be represented by a single genomic range.
  Use 'single.strand.genes.only=FALSE' to get all the genes in a
  GRangesList object, or use suppressMessages() to suppress this message.

2) Also following up to this issue https://github.com/xihaoli/STAARpipeline/issues/12 for STAARpipeline_Gene_Centric_Coding

(2) Perform a "conventional" gene-based analysis where all variants, regardless of their functional annotations, will be included in the set. In such a case, you only have one variant set per gene. When defining a variant set, you can perform a union of plof, missense, etc to form one larger set.

Can you give example code on how to make our own variant set which is a Union of two or more annotations? We are specifically interested in rare variant analysis using a variants list which is a union of splice variants + Loss of function, Clinvar pathogenic or likely pathogenic + predicted Damaging missense variants.

Thank You.

xihaoli commented 8 months ago

Hi @samreenzafer,

Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub, rare_maf_cutoff = rare_maf_cutoff,  : 
  Number of rare variant in the set is less than 2!

As noted in #14 and #51, these messages are not errors from your scripts, but they serve as indications of certain functional categories of genes that do not have at least 2 rare variants to form a variant set.

228 genes were dropped because they have exons located on both strands
  of the same reference sequence or on more than one reference sequence,
  so cannot be represented by a single genomic range.
  Use 'single.strand.genes.only=FALSE' to get all the genes in a
  GRangesList object, or use suppressMessages() to suppress this message.

This message is also innocuous, and you don't need to change it.

Can you give example code on how to make our own variant set which is a Union of two or more annotations?
We are specifically interested in rare variant analysis using a variants list which is a union of splice variants + Loss of function, Clinvar pathogenic or likely pathogenic + predicted Damaging missense variants.

Are all of the annotations to define your own variant set already in the aGDS file? If so, the most straightforward way is to (1) fork the STAARpipeline and STAARpipelineSummary packages; (2) make your own function (e.g. using pLoF.R as a template and update line 58 - 66. If not, you may expand the aGDS file by building new annotations into it. In this case, please make sure to align each variant with your new annotations correctly.

Hope this helps.

Best, Xihao

samreenzafer commented 8 months ago

Thank you so much for replying to both questions. I will try out the guidelines for creating my own function.