kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annotate and Visualize Isoform Switches with Functional Consequences (from RNA-seq data)

ImportRData Ensembl GTF annotation vs Matrix 0 Overlap #228

Closed Blondeau-Bidet closed 7 months ago

Blondeau-Bidet commented 7 months ago

Hello, I am encountering an error when importing data into R. I used STAR + RSEM to quantify my data. A genome and an annotation file are available on Ensembl. I don't understand the error, because I can find the same genes and isoforms in my count matrix and in my annotation file. I have checked several times that the FASTA and GTF files used for mapping and quantification are the same as those given in the R command. Attached is a file with a preview of the count matrix for one sample, followed by my annotation file. Overview_counts_annotation.txt

Thank you for your help,

Eva

Here is my command line:

    aSwitchList_2 <- importRdata(
        isoformCountMatrix = RSEM_Quant$counts,
        isoformRepExpression = RSEM_Quant$abundance,
        designMatrix = myDesign,
        isoformExonAnnoation = "Dicentrarchus_labrax.dlabrax2021.111.gtf",
        isoformNtFasta = "Dicentrarchus_labrax.dlabrax2021.dna.toplevel.fa",
        fixStringTieAnnotationProblem = FALSE,
        removeNonConvensionalChr = TRUE,
        ignoreAfterBar = TRUE,
        ignoreAfterSpace = TRUE,
        ignoreAfterPeriod = TRUE,
        showProgress = FALSE
    )

And here is the error message:

Error in importRdata(isoformCountMatrix = RSEM_Quant$counts, isoformRepExpression = RSEM_Quant$abundance, :
The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
Either isforoms found in the annotation are not quantifed or vise versa.
Specifically:
0 isoforms were quantified.
69565 isoforms are annotated.
Only 0 overlap.
0 isoforms quantifed had no corresponding annoation

This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.

If there is no overlap (as in zero or close) there are two options:
1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
Examples from expression matrix are :
Examples of annoation are : ENSDLAT00005029896, ENSDLAT00005056072, ENSDLAT00005077077
Examples of isoforms which were only found im the quantification are :
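
The overlap that the error message reports can also be checked by hand before calling importRdata. The following is a minimal base R sketch, not the package's own implementation: the `jaccard` and `strip_version` helpers are hypothetical, the `.1` version suffix and the ID `ENSDLAT00005000001` are invented for illustration, and in practice the two vectors would come from `rownames(RSEM_Quant$counts)` and the `transcript_id` values in the GTF.

```r
# IDs standing in for the quantification matrix row names (the ".1" suffix
# is an invented example of a version tag).
quant_ids <- c("ENSDLAT00005029896", "ENSDLAT00005056072.1", "ENSDLAT00005077077")

# IDs standing in for the GTF transcript_id values (the last one is invented).
anno_ids <- c("ENSDLAT00005029896", "ENSDLAT00005056072",
              "ENSDLAT00005077077", "ENSDLAT00005000001")

# Hypothetical helper: drop everything after the first period, similar in
# spirit to what ignoreAfterPeriod = TRUE is described as doing.
strip_version <- function(ids) sub("\\..*$", "", ids)

# Jaccard similarity: size of the intersection divided by size of the union.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

raw_similarity      <- jaccard(quant_ids, anno_ids)
stripped_similarity <- jaccard(strip_version(quant_ids), anno_ids)

print(c(raw = raw_similarity, stripped = stripped_similarity))
```

If the similarity computed this way is high while importRdata still reports "0 isoforms were quantified", the IDs themselves are probably not the problem and one of the filtering arguments is a more likely cause.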

Blondeau-Bidet commented 7 months ago

Update: This time I re-ran the analysis with RefSeq data (FASTA and GTF) downloaded from NCBI and used Kallisto for quantification against the transcript FASTA file.

    aSwitchList_2 <- importRdata(
        isoformCountMatrix = Kallisto_Quant$counts,
        isoformRepExpression = Kallisto_Quant$abundance,
        designMatrix = myDesign,
        isoformExonAnnoation = "D:/Salsa_Intolerants/IsoformSwitchAnalyseR/Genome_2021_ncbi/Dl_2021_ncbi_annotation.gtf",
        isoformNtFasta = "D:/Salsa_Intolerants/IsoformSwitchAnalyseR/Genome_2021_ncbi/Dl_2021_rna.fa",
        fixStringTieAnnotationProblem = TRUE,
        removeNonConvensionalChr = TRUE,
        ignoreAfterBar = TRUE,
        ignoreAfterSpace = TRUE,
        ignoreAfterPeriod = TRUE,
        showProgress = FALSE
    )

And I get the same error message. I don't understand the source of the problem.

Error in importRdata(isoformCountMatrix = Kallisto_Quant$counts, isoformRepExpression = Kallisto_Quant$abundance, :
The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
Either isforoms found in the annotation are not quantifed or vise versa.
Specifically:
0 isoforms were quantified.
64646 isoforms are annotated.
Only 0 overlap.
0 isoforms quantifed had no corresponding annoation

This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.

If there is no overlap (as in zero or close) there are two options:
1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
Examples from expression matrix are :
Examples of annoation are : XM_051383982, XM_051399489, XM_051407153
Examples of isoforms which were only found im the quantification are :

chunxubioinfor commented 7 months ago

Hi Eva, is the GTF file the same one you used for the RSEM quantification? If so, could you save the RSEM_Quant object and send it to me? I suspect something is wrong with this matrix.

Blondeau-Bidet commented 7 months ago

Hello,

Yes, I checked several times that the GTF file was the same. I have 10 RSEM files corresponding to my 10 samples, of the isoforms.results type. Is a single file attachment enough? I saved it in txt format instead of the "RESULT" format to send it, so I hope that doesn't cause any problems. FW_i1_copie.txt

Thank you!!

chunxubioinfor commented 7 months ago

Hi Eva, sorry, maybe I wasn't clear. Could you save the RSEM_Quant object via: save(RSEM_Quant, myDesign, file = 'quanti.Rdata')?

Blondeau-Bidet commented 7 months ago

Hello, the mistake is mine, I misunderstood. Here is my RSEM_Quant file.

Thanks a lot for your help!


Blondeau-Bidet commented 7 months ago

I'm sorry, I reduced the file to only 4 samples but had not updated the design to match. Here's the corrected version.


chunxubioinfor commented 7 months ago

Hi Eva, I cannot see your files here. Could you send them to my email instead: s220311@dtu.dk

chunxubioinfor commented 7 months ago

Hi Eva, I've just reproduced your error and realized that you are studying a fish rather than human or mouse. That means you cannot use removeNonConvensionalChr = TRUE, which treats chromosomes other than the standard human or mouse ones as non-conventional and removes them. I set removeNonConvensionalChr = FALSE and the import ran without problems. I hope this helps. 😊
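
To see in advance whether this filter would discard an annotation, the sequence names can be inspected directly. The sketch below is base R only and rests on an assumption: that "non-conventional" means a chromosome name containing an underscore or a period, which is how I recall the importRdata help page describing it; please confirm with `?importRdata`. The sequence names are hypothetical examples, not taken from the sea bass files.

```r
# Hypothetical sequence names for illustration; in practice, take the unique
# values of the first column of your own GTF file.
seq_names <- c("chr1", "chrX", "chr1_gl000191_random",
               "NC_000001.11", "LG12", "scaffold_42")

# Assumed rule (not verified against the package source): a name containing
# "_" or "." counts as non-conventional.
is_nonconventional <- grepl("[_.]", seq_names)

# Side-by-side view of each name and whether the assumed rule would drop it.
print(data.frame(seqname = seq_names, would_be_removed = is_nonconventional))

# Fraction of sequence names that would be discarded under the assumed rule.
fraction_removed <- mean(is_nonconventional)
print(fraction_removed)
```

If most or all names in a non-model organism's annotation match the pattern, for example accession-style names, then removeNonConvensionalChr = TRUE would leave little or nothing to overlap with the quantification, which is consistent with the fix found in this thread.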

Blondeau-Bidet commented 7 months ago

Hi, It works, I just had to modify the removeNonConvensionalChr argument.

Thank you so much !!

Eva
