kevinrue / GOexpress-original

Original repository for Bioconductor package. Now at:
https://github.com/kevinrue/GOexpress

GOexpress

Visualise microarray and RNAseq data with gene ontology annotations.

OVERVIEW

This package was designed for the analysis of bioinformatics data based on gene expression measurements. It requires two input values:

  1. an ExpressionSet containing assayData and phenoData. The assayData slot should be a gene-by-sample matrix providing the expression level of genes (rows) in each sample (columns). Row names are expected to be either Ensembl gene identifiers or probeset identifiers from a microarray platform available in the Ensembl BioMart dataset queried. The phenoData slot should be an AnnotatedDataFrame from the Biobase package providing phenotypic information about the samples. Its row names are the sample names, and at least one of its columns must be a grouping factor with two or more levels (a factor in the R-language sense of the word).
  2. the name of the grouping factor to investigate, which must be a valid column name in the phenoData.

The analysis scores all Gene Ontology (GO) terms represented in the gene annotations, which are either provided by the user or retrieved semi-automatically from the current Ensembl annotation release using the biomaRt package. In the default approach, a random forest is used to evaluate the ability of each gene feature in the ExpressionSet to cluster groups of samples according to a known experimental factor. Notably, genes that are associated with a GO term in the annotations but absent from the dataset are assigned a score of 0 and a rank equal to the number of gene features in the ExpressionSet plus one. GO terms are then scored and ranked by the average rank (alternatively, the average score) of all their associated genes, including those absent from the ExpressionSet.
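
The averaging step described above can be illustrated with a small, language-neutral sketch. The Python below is not the package's implementation (GOexpress is an R package): the function names, gene names, GO identifiers, and per-gene scores are all hypothetical placeholders, and the scores stand in for whatever the random forest would produce. It only shows how a gene absent from the dataset receives a score of 0 and a rank of "number of gene features plus one" before the per-term averages are taken.

```python
from statistics import mean


def rank_genes(gene_scores):
    """Rank genes from highest to lowest score (rank 1 = highest score).

    Assumption: ties are broken by input order; the package may differ.
    """
    ordered = sorted(gene_scores, key=lambda gene: gene_scores[gene], reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def score_go_terms(gene_scores, go_annotations):
    """Average the rank and score of every gene annotated to each GO term.

    Genes annotated to a term but missing from `gene_scores` get
    score 0 and rank = (number of scored genes) + 1.
    """
    ranks = rank_genes(gene_scores)
    absent_rank = len(gene_scores) + 1
    results = {}
    for term, genes in go_annotations.items():
        results[term] = {
            "ave_rank": mean(ranks.get(gene, absent_rank) for gene in genes),
            "ave_score": mean(gene_scores.get(gene, 0.0) for gene in genes),
        }
    return results


# Hypothetical per-gene scores (placeholders for random forest importances).
gene_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.1}

# Hypothetical annotations; "geneX" is annotated but absent from the dataset.
go_annotations = {
    "GO:term1": ["geneA", "geneB"],
    "GO:term2": ["geneC", "geneX"],
}

result = score_go_terms(gene_scores, go_annotations)

# GO terms ordered from best (lowest) to worst average rank.
ranked_terms = sorted(result, key=lambda term: result[term]["ave_rank"])
```

In this toy input, the second term is penalised twice: its only measured gene has the lowest score, and its absent gene contributes the worst possible rank. This is why terms with many unmeasured genes tend to sink in the ranking.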

Functions are provided to investigate and visualise the results of the above analysis. The score table can be filtered for GO terms passing given thresholds. The distribution of scores can be visualised. The quantiles of scores can be obtained. The genes associated with a given GO term can be listed, with or without descriptive information. Hierarchical clustering of the samples can be performed based on the expression levels of genes associated with a given GO term. Heatmaps accompanied by hierarchical clustering of samples and genes can be drawn. The expression profile of genes can be plotted against any factor while grouping samples on another factor. The univariate effect of all factors can be visualised on the expression level of genes associated with a GO term. The counts of overlapping genes between multiple GO terms can be visualised in a Venn diagram. The result variable of the analysis can be re-ordered according to gene rank or score.

FEATURES