kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annotate and Visualize Isoform Switches with Functional Consequences (from RNA-seq data)

Error in importRdata #221

Closed urwahnawaz closed 2 months ago

urwahnawaz commented 6 months ago

Hi,

I'm trying to import my Salmon counts to run the IsoformSwitchAnalyzeR analysis, but I keep getting the following error:

Step 1 of 10: Checking data...
Step 2 of 10: Obtaining annotation...
    importing GTF (this may take a while)...
Error in importRdata(isoformCountMatrix = salmonQuant$counts, isoformRepExpression = salmonQuant$abundance,  : 
  The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925). 
Either isforoms found in the annotation are not quantifed or vise versa. 
Specifically:
 116223 isoforms were quantified.
 116757 isoforms are annotated.
 Only 115852 overlap.
 371 isoforms quantifed had no corresponding annoation

This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.

If there is no overlap (as in zero or close) there are two options:
 1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
 2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
     Examples from expression matrix are : ENSMUST00000222795, ENSMUST00000192531, ENSMUST00000126579 
     Examples of annoation are : ENSMUST00000216559, ENSMUST00000138443, ENSMUST00000132460 
     Examples of isoforms which were only found im the quantification are  : ENSMUST00000249256, ENSMUST00000249252, ENSMUST00000179487 

If there is a large overlap but still far from complete there are 3 possibilites:
 1) The files do not fit together (e.g different databases versions etc.) (no fix except using propperly paired files).
 2) If you are using Ensembl data you have supplied the GTF without phaplotyps. You need to supply the <Ensembl_version>.chr_patch_hapl_scaff.gtf file - NOT the <Ensembl_version>.chr.gtf
 3) One file could contain non-chanonical chromosomes while the other do not (might be solved using the 'removeNonConvensionalChr' argument.)
 4) It is somthing to do with how a subset of the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.

Fo

I generated my count files with Salmon using the decoy genome approach as mentioned here, i.e. I concatenated the GENCODE transcript annotations with a genome assembly. I'm using the same GENCODE GTF file. I tried several different genome assemblies to see whether the error goes away, and I also tried the gentrome.fa file I used for Salmon. Nothing has worked. I'm working with the most recent version of IsoformSwitchAnalyzeR from Bioconductor.

aSwitchList <- importRdata(
    isoformCountMatrix            = salmonQuant$counts,
    isoformRepExpression          = salmonQuant$abundance,
    designMatrix                  = design,
    addAnnotatedORFs              = TRUE,
    comparisonsToMake             = myComparison,
    isoformExonAnnoation          = "/path/to/file/gencode.vM29.annotation.gtf",
    isoformNtFasta                = "/path/to/file/gentrome.fa.gz",
    fixStringTieAnnotationProblem = TRUE,
    showProgress                  = FALSE,
    ignoreAfterPeriod             = TRUE
)

I have also read similar issues on GitHub and Bioconductor and tried all the troubleshooting advice from there, but none of it has worked for me. I also tried importing the counts via tximeta, but that didn't work either.

Any help would be greatly appreciated!

chunxubioinfor commented 5 months ago

Hi Urwah, the error indicates that the annotation file is incompatible with the quantification file. Since the overlap is large, any of the possibilities listed in the error message could apply, so I can't tell what the exact problem is without looking at the files. Would you mind sharing the files you used so that I can reproduce the error? 😀
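
In the meantime, the mismatch can be narrowed down locally with base R. The sketch below is only an illustration, not part of IsoformSwitchAnalyzeR: `compare_ids()` is a hypothetical helper, the toy ID vectors reuse identifiers quoted in the error message, and the `.1` version suffixes are invented for the example. It first reworks the counts printed in the error, then shows how to list the IDs that are present on only one side.

```r
# Counts copied verbatim from the error message in this issue.
n_quantified <- 116223
n_annotated  <- 116757
n_overlap    <- 115852

only_quantified <- n_quantified - n_overlap   # quantified but not annotated
only_annotated  <- n_annotated  - n_overlap   # annotated but not quantified

# Plain Jaccard index: |intersection| / |union|
jaccard_reported <- n_overlap / (n_quantified + n_annotated - n_overlap)

# Hypothetical helper (NOT an IsoformSwitchAnalyzeR function):
# compares two character vectors of transcript IDs.
compare_ids <- function(quant_ids, anno_ids, strip_version = FALSE) {
  if (strip_version) {
    # Drop a trailing ".<digits>" version suffix, e.g. "ENSMUST00000222795.1"
    quant_ids <- sub("\\.[0-9]+$", "", quant_ids)
    anno_ids  <- sub("\\.[0-9]+$", "", anno_ids)
  }
  quant_ids <- unique(quant_ids)
  anno_ids  <- unique(anno_ids)
  shared    <- intersect(quant_ids, anno_ids)
  list(
    jaccard    = length(shared) / length(union(quant_ids, anno_ids)),
    only_quant = setdiff(quant_ids, anno_ids),
    only_anno  = setdiff(anno_ids, quant_ids)
  )
}

# Toy input: IDs taken from the error message, ".1" suffixes are made up.
quant_ids <- c("ENSMUST00000222795.1", "ENSMUST00000192531.1",
               "ENSMUST00000249256.1")
anno_ids  <- c("ENSMUST00000222795", "ENSMUST00000192531",
               "ENSMUST00000216559")

res <- compare_ids(quant_ids, anno_ids, strip_version = TRUE)
print(res$only_quant)   # IDs quantified without a matching annotation entry
print(res$only_anno)    # IDs annotated but never quantified
```

Taking the reported counts at face value, `only_quantified` is 371 and the plain Jaccard index is roughly 0.989, so the 371 quantified-only isoforms are probably the most informative thing to inspect. With real data, `quant_ids` would be `rownames(salmonQuant$counts)` and `anno_ids` the `transcript_id` values of the GTF (readable, for example, with `rtracklayer::import()`). Whether those 371 IDs come from a different GENCODE release than the GTF is something only the actual files can show.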

chunxubioinfor commented 2 months ago

I'm going to close this issue now, but you're welcome to open a new one at any time.😊