kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annotate and Visualize Isoform Switches with Functional Consequences (from RNA-seq data)

Issue using importRdata #200

Open lauralh5 opened 1 year ago

lauralh5 commented 1 year ago

Hello,

I'm using IsoformSwitchAnalyzeR version 2.1.2 and I'm having trouble importing data with importRdata. My data comes from IsoQuant, and I'm supplying both the count and TPM tables.

> transcript.SwitchList <- importRdata(
+   isoformCountMatrix   = transcripts$transcript_counts,
+   isoformRepExpression = transcripts$transcript_tpm,
+   designMatrix = design.table,
+   isoformExonAnnoation = "data/9-22/genome_annotation/gencode.v43.annotation.gtf",
+   isoformNtFasta = "data/9-22/genome_annotation/GRCh38.primary_assembly.genome.fa",
+   showProgress = FALSE,
+   removeNonConvensionalChr = TRUE
+ )
Step 1 of 10: Checking data...
Step 2 of 10: Obtaining annotation...
    importing GTF (this may take a while)...
Error in importRdata(isoformCountMatrix = transcripts$transcript_counts,  : 
  The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925). 
Either isforoms found in the annotation are not quantifed or vise versa. 
Specifically:
 79812 isoforms were quantified.
 159957 isoforms are annotated.
 Only 79812 overlap.
 0 isoforms quantifed had no corresponding annoation

This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.

If there is no overlap (as in zero or close) there are two options:
 1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
 2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
     Examples from expression matrix are : ENST00000581865.1, ENST00000399966.9_PAR_Y, ENST00000407642.6 
     Examples of annoation are : ENST00000565743.1, ENST00000498223.3, ENST00000441048.1 
     Examples of isoforms which were only found im the quantification are  :  

If there is a large overlap but still far from complete there are 3 possibilites:
 1) The files do not fit together (e.g different databases versions etc.) (no fix except using propperly paired files).
 2) If you are using Ensembl data you have supplied the GTF without phaplotyps. You need to supply the <Ensembl_version>.chr_patch_hapl_scaff.gtf file - NOT the <Ensembl_version>.chr.gtf
 3) One file could contain non-chanonical chromosomes while the other do not (might be solved using the 'removeNonConvensionalChr' argument.)
 4) It is somthing to do with how a subset of the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.

For more info see the FAQ in the vignette.

At first I thought the problem was the gap between the number of quantified and annotated transcripts. But after loosening the IsoQuant analysis I still get the same error, even though the numbers of quantified and annotated isoforms are now pretty similar.


> transcript.SwitchList <- importRdata(
+   isoformCountMatrix   = transcripts$transcript_counts,
+   isoformRepExpression = transcripts$transcript_tpm,
+   designMatrix = design.table,
+   isoformExonAnnoation = "data/9-22/genome_annotation/gencode.v43.annotation.gtf",
+   isoformNtFasta = "data/9-22/genome_annotation/GRCh38.primary_assembly.genome.fa",
+   showProgress = FALSE,
+   removeNonConvensionalChr = TRUE
+ )
Step 1 of 10: Checking data...
Step 2 of 10: Obtaining annotation...
    importing GTF (this may take a while)...
Error in importRdata(isoformCountMatrix = transcripts$transcript_counts,  : 
  The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925). 
Either isforoms found in the annotation are not quantifed or vise versa. 
Specifically:
 119533 isoforms were quantified.
 167047 isoforms are annotated.
 Only 119533 overlap.
 0 isoforms quantifed had no corresponding annoation

This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.

If there is no overlap (as in zero or close) there are two options:
 1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
 2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
     Examples from expression matrix are : ENST00000559647.2, ENST00000442065.5, ENST00000665054.1 
     Examples of annoation are : ENST00000678412.1, ENST00000526637.1, ENST00000414428.2 
     Examples of isoforms which were only found im the quantification are  :  

If there is a large overlap but still far from complete there are 3 possibilites:
 1) The files do not fit together (e.g different databases versions etc.) (no fix except using propperly paired files).
 2) If you are using Ensembl data you have supplied the GTF without phaplotyps. You need to supply the <Ensembl_version>.chr_patch_hapl_scaff.gtf file - NOT the <Ensembl_version>.chr.gtf
 3) One file could contain non-chanonical chromosomes while the other do not (might be solved using the 'removeNonConvensionalChr' argument.)
 4) It is somthing to do with how a subset of the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.

For more info see the FAQ in the vignette.
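
For reference, the Jaccard similarity implied by the counts in the two error messages can be worked out by hand. The sketch below assumes the standard definition (size of the overlap divided by size of the union); `jaccard_from_counts` is a hypothetical helper written for illustration and is not part of IsoformSwitchAnalyzeR.

```r
# Hypothetical helper (not an IsoformSwitchAnalyzeR function):
# Jaccard similarity = |intersection| / |union|, computed from set sizes.
jaccard_from_counts <- function(n_quantified, n_annotated, n_overlap) {
  n_union <- n_quantified + n_annotated - n_overlap
  n_overlap / n_union
}

# Counts copied from the two error messages above
run1 <- jaccard_from_counts(79812, 159957, 79812)    # first attempt
run2 <- jaccard_from_counts(119533, 167047, 119533)  # after loosening IsoQuant

# Compare against the 0.925 threshold quoted in the error
round(c(run1 = run1, run2 = run2), 3)
c(run1 = run1, run2 = run2) >= 0.925
```

Under that assumption the two runs come out at roughly 0.50 and 0.72, both below the 0.925 cutoff. Since every quantified isoform has a matching annotation in both runs, the shortfall comes entirely from annotated isoforms that have no quantification.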

Any ideas will be much appreciated. Thanks a lot in advance for the help!

skudashev commented 1 year ago

Hello, I have the same issue. The same isoformExonAnnoation and isoformNtFasta files worked fine with an older version of IsoformSwitchAnalyzeR (1.17).

Lu-Wang-05 commented 1 year ago

Hello, I have the same issue:

Step 1 of 7: Checking data...
Step 2 of 7: Obtaining annotation...
    importing GTF (this may take a while)...
Error in createSwitchAnalyzeRlist(isoformFeatures = myIsoAnot, exons = myExons, : 
  The isoform_id in isoformFeatures and exons does not match
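
When two ID sets are reported as not matching, one way to narrow it down is to list the IDs that appear on only one side. The following is a minimal base-R sketch. `diff_ids` is a hypothetical helper, and the two toy vectors are built from IDs quoted in the first error message in this thread rather than from real input files.

```r
# Hypothetical helper (not part of IsoformSwitchAnalyzeR): list the IDs
# that appear in only one of two ID vectors.
diff_ids <- function(ids_a, ids_b) {
  ids_a <- unique(ids_a)
  ids_b <- unique(ids_b)
  list(
    only_in_a = setdiff(ids_a, ids_b),
    only_in_b = setdiff(ids_b, ids_a)
  )
}

# Toy data for illustration only
quantified_ids <- c("ENST00000581865.1", "ENST00000407642.6")
annotated_ids  <- c("ENST00000581865.1", "ENST00000407642.6",
                    "ENST00000565743.1", "ENST00000498223.3")

res <- diff_ids(quantified_ids, annotated_ids)
res$only_in_a  # IDs quantified but not annotated
res$only_in_b  # IDs annotated but not quantified
```

With real data, the two vectors would be the isoform IDs from the expression matrix and the transcript IDs from the annotation. How those are extracted depends on the input files and is left out here.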