kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annoatate and Visialize Isoform Switches with Functional Consequences (from RNA-seq data)
101 stars 18 forks source link

importRdata only outputs a single comparison of given conditions. #21

Closed iamnicogomez closed 5 years ago

iamnicogomez commented 5 years ago

I suspect that this is actually user error, not an issue, but as I am still new to programming, I don't know the preferred place to post this.

I am starting with a Salmon quantified RNA-seq dataset with 3 conditions (3 biological replicates each)

My designMatrix is defined as

designMatrix <- data.frame(
  sampleID = colnames(salmonQuant$abundance)[-1],
  condition = as.character(AnnoTable[match(gsub('_.*', '', colnames(salmonQuant$abundance)[-1]), AnnoTable[,'run']),'Genotype']),
  replicate = AnnoTable[match(gsub('_.*', '', colnames(salmonQuant$abundance)[-1]), AnnoTable[,'run']),'replicate'],
  stringsAsFactors = F
)

which produces this:

     sampleID condition replicate
1 85871_quant       GFP         1
2 85872_quant        WT         1
3 85873_quant        FL         1
4 85876_quant       GFP         2
5 85877_quant        WT         2
6 85878_quant        FL         2
7 85881_quant       GFP         3
8 85882_quant        WT         3
9 85883_quant        FL         3

I also define the comparisons list with the following:

comparisonsToMake = as.data.frame(cbind(condition_1 = c("GFP", "WT",  "GFP"),
                                        condition_2 = c("WT", "FL",  "FL")),
                                  stringsAsFactors =F)

Then I use importRdata:

SwitchList <- importRdata(
  isoformCountMatrix   = salmonQuant$counts,
  isoformRepExpression = salmonQuant$abundance,
  designMatrix = designMatrix,
  isoformExonAnnoation = '/Users/nicog/salmon_BrittanyHEK/cDNA+ncRNA/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf.gz',
  showProgress = TRUE, comparisonsToMake = comparisonsToMake
)

... which gives me the following:

Step 1 of 6: Checking data...
Step 2 of 6: Obtaining annotation...
    importing GTF (this may take a while)
converting annotated CDSs
Step 3 of 6: Calculating gene expression and isoform fraction...
     67167 ( 29.67%) isoforms were removed since they were not expressed in any samples.
Step 4 of 6: Merging gene and isoform expression...
  |=====================================================================================================================| 100%
Step 5 of 6: Making comparisons...
  |=====================================================================================================================| 100%
Step 6 of 6: Making switchAnalyzeRlist object...
Done
Warning message:
In importRdata(isoformCountMatrix = salmonQuant$counts, isoformRepExpression = salmonQuant$abundance,  :

The annotation and quantification (count/abundance matrix and isoform annotation) contain differences in which isoforms are analyzed. 
Specifically:
 226365 isoforms were quantified.
 227492 isoforms are annotated.
 The annotation provided contain: 1127 more isoforms than the count matrix.

Please make sure this is on purpouse since differences will cause inaccurate quantification and thereby skew all analysis.

!NB! All differences were removed from the final switchAnalyzeRlist!

Importantly:

> SwitchList
This switchAnalyzeRlist list contains:
 159198 isoforms from 41593 genes
 3 comparison from 3 conditions

Feature analyzed:
[1] "ORFs"

Then I run isoformSwitchAnalysisPart1:

SwitchList <- isoformSwitchAnalysisPart1(
  switchAnalyzeRlist = SwitchList,
  genomeObject = Hsapiens, # the reference to the human BS genome
  dIFcutoff = 0.3,         # Cutoff for finding switches - set high for short runtime in example data
  outputSequences = TRUE,
  pathToOutput ='/Users/nicog/salmon_BrittanyHEK/cDNA+ncRNA/IsoformSwitchAnalyzeR', overwriteORF = T
)

Which gives me:

Step 1 of 3 : Detecting isoform switches...
Step 2 of 3 : Predicting open reading frames
Step 3 of 3 : Extracting (and outputting) sequences

The number of isoform swithces found were:
  Comparison nrIsoforms nrGenes
1  GFP vs FL         72      61
The nucleotide and amino acid sequences of these isoforms have been outputted to the supplied directory. 
These sequences enabling external analysis of protein domians (Pfam), coding potential (CPAT) or signal peptides (SignalIP). 
See ?analyzeCPAT, ?analyzePFAM or ?analyzeSignalIP (under details) for suggested ways of running these three tools.
Warning messages:
1: In extractSequence(switchAnalyzeRlist = tmpSwitchAnalyzeRlist, genomeObject = genomeObject,  :
  106 transcripts was removed due to them being annotated to chromosomes not found in the refrence genome object
2: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr4, chr22, chrY, chrM, chr1_GL383518v1_alt, chr1_GL383519v1_alt, chr1_GL383520v2_alt, chr1_KI270759v1_alt, chr1_KI270760v1_alt, chr1_KI270761v1_alt, chr1_KI270762v1_alt, chr1_KI270763v1_alt, chr1_KI270764v1_alt, chr1_KI270765v1_alt, chr1_KI270766v1_alt, chr1_KI270892v1_alt, chr2_GL383521v1_alt, chr2_GL383522v1_alt, chr2_GL582966v2_alt, chr2_KI270767v1_alt, chr2_KI270768v1_alt, chr2_KI270769v1_alt, chr2_KI270770v1_alt, chr2_KI270771v1_alt, chr2_KI270772v1_alt, chr2_KI270773v1_alt, chr2_KI270774v1_alt, chr2_KI270775v1_alt, chr2_KI270776v1_alt, chr2_KI270893v1_alt, chr2_KI270894v1_alt, chr3_GL383526v1_alt, chr3_JH636055v2_alt, chr3_KI270777v1_alt, chr3_KI270778v1_alt, chr3_KI270779v1_alt, chr3_KI270780v1_alt, chr3_KI270781v1_alt, chr3_KI270782v1_alt, chr3_KI270783v1_alt, chr3_KI270784v1_alt, chr3_KI270895v1_alt, chr3_KI270924v1_alt, chr3_KI270934v1_alt, chr3_KI270935v1_alt, chr3_KI270936v1_alt, chr3_KI27093 [... truncated]
3: In extractSequence(switchAnalyzeRlist = switchAnalyzeRlist, genomeObject = genomeObject,  :
  106 transcripts was removed due to them being annotated to chromosomes not found in the refrence genome object
4: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr4, chr22, chrY, chrM, chr1_GL383518v1_alt, chr1_GL383519v1_alt, chr1_GL383520v2_alt, chr1_KI270759v1_alt, chr1_KI270760v1_alt, chr1_KI270761v1_alt, chr1_KI270762v1_alt, chr1_KI270763v1_alt, chr1_KI270764v1_alt, chr1_KI270765v1_alt, chr1_KI270766v1_alt, chr1_KI270892v1_alt, chr2_GL383521v1_alt, chr2_GL383522v1_alt, chr2_GL582966v2_alt, chr2_KI270767v1_alt, chr2_KI270768v1_alt, chr2_KI270769v1_alt, chr2_KI270770v1_alt, chr2_KI270771v1_alt, chr2_KI270772v1_alt, chr2_KI270773v1_alt, chr2_KI270774v1_alt, chr2_KI270775v1_alt, chr2_KI270776v1_alt, chr2_KI270893v1_alt, chr2_KI270894v1_alt, chr3_GL383526v1_alt, chr3_JH636055v2_alt, chr3_KI270777v1_alt, chr3_KI270778v1_alt, chr3_KI270779v1_alt, chr3_KI270780v1_alt, chr3_KI270781v1_alt, chr3_KI270782v1_alt, chr3_KI270783v1_alt, chr3_KI270784v1_alt, chr3_KI270895v1_alt, chr3_KI270924v1_alt, chr3_KI270934v1_alt, chr3_KI270935v1_alt, chr3_KI270936v1_alt, chr3_KI27093 [... truncated]
> SwitchList
This switchAnalyzeRlist list contains:
 341 isoforms from 61 genes
 1 comparison from 2 conditions

Switching features:
  Comparison switchingIsoforms switchingGenes
1  GFP vs FL                79             61

Feature analyzed:
[1] "Isoform Swich Identification, ORFs, ntSequence, aaSequence"

Where are the other comparisons I listed in comparisonsToMake? Thank you, Nico

kvittingseerup commented 5 years ago

Hi Nico

Thanks for repporting this problem. Unfortunately there was a but which resulted in this problem. It was fixed in v1.5.1. You can download the latest version by copying the following code into your R session:

if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
devtools::install_github("kvittingseerup/IsoformSwitchAnalyzeR", build_vignettes = TRUE)

Remember to restart your R session after the installation.