Closed Blondeau-Bidet closed 7 months ago
Update: I have now re-run the analysis with RefSeq data (FASTA and GTF) downloaded from NCBI, and used Kallisto for quantification against the transcript FASTA file.
aSwitchList_2 <- importRdata(
    isoformCountMatrix            = Kallisto_Quant$counts,
    isoformRepExpression          = Kallisto_Quant$abundance,
    designMatrix                  = myDesign,
    isoformExonAnnoation          = "D:/Salsa_Intolerants/IsoformSwitchAnalyseR/Genome_2021_ncbi/Dl_2021_ncbi_annotation.gtf",
    isoformNtFasta                = "D:/Salsa_Intolerants/IsoformSwitchAnalyseR/Genome_2021_ncbi/Dl_2021_rna.fa",
    fixStringTieAnnotationProblem = TRUE,
    removeNonConvensionalChr      = TRUE,
    ignoreAfterBar                = TRUE,
    ignoreAfterSpace              = TRUE,
    ignoreAfterPeriod             = TRUE,
    showProgress                  = FALSE
)
And I get the same error message. I don't understand the source of the problem.
Error in importRdata(isoformCountMatrix = Kallisto_Quant$counts, isoformRepExpression = Kallisto_Quant$abundance, :
The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
Either isforoms found in the annotation are not quantifed or vise versa.
Specifically:
0 isoforms were quantified.
64646 isoforms are annotated.
Only 0 overlap.
0 isoforms quantifed had no corresponding annoation
This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.
If there is no overlap (as in zero or close) there are two options:
1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
Examples from expression matrix are :
Examples of annoation are : XM_051383982, XM_051399489, XM_051407153
Examples of isoforms which were only found im the quantification are :
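For readers who hit the same message: the three 'ignoreAfter...' arguments it mentions can be pictured as simple string trimming on the isoform IDs. The sketch below uses base R only and is an illustration of that idea, not the package's internal code. The helper name `trim_isoform_id` and the suffixes attached to the IDs are hypothetical.

```r
# Hypothetical helper: mimic trimming everything after '|', ' ' or '.' in an ID.
trim_isoform_id <- function(ids, after_bar = TRUE, after_space = TRUE,
                            after_period = TRUE) {
  if (after_bar)    ids <- sub("\\|.*$", "", ids)  # drop text after the first '|'
  if (after_space)  ids <- sub(" .*$", "", ids)    # drop text after the first space
  if (after_period) ids <- sub("\\..*$", "", ids)  # drop text after the first '.'
  ids
}

# Made-up IDs in the style a quantification tool might write them
quant_ids <- c("XM_051383982.1",
               "XM_051399489.1 extra text",
               "XM_051407153.1|gene1")

trimmed <- trim_isoform_id(quant_ids)
print(trimmed)
```

If the trimmed IDs match the IDs listed under "Examples of annoation", the ID format is probably not the cause and the problem lies elsewhere.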
Hi Eva, is the GTF file the same as the file you used in the RSEM quantification? If so, could you save the RSEM_Quant object and send it to me? I guess there is something wrong with this matrix.
Hello,
Yes, I checked several times that the GTF file was the same. I have 10 RSEM files corresponding to my 10 samples, of the isoforms.results type. Is a single file attachment enough? I saved it in txt format instead of the "RESULT" format to send it, so I hope that doesn't cause any problems. FW_i1_copie.txt
Thank you!!
Hi Eva, sorry, maybe I didn't make it clear. Could you save the RSEM_Quant object via: save(RSEM_Quant, myDesign, file = 'quanti.Rdata')?
Hello, The mistake is mine, I didn't understand correctly. Here is my RSEM_Quant file.
Thanks a lot for your help!
I'm sorry, I reduced the file to only 4 samples but had not updated the design to match. Here's the corrected version.
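A mismatch like this between the design and the samples in the matrix can be caught before calling importRdata. The following is a minimal base R sketch under stated assumptions: the sample names, conditions and counts are made up, and the design is assumed to use the `sampleID` and `condition` columns that importRdata expects.

```r
# Made-up count matrix: 2 isoforms (rows) x 4 samples (columns)
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("iso1", "iso2"),
                                 c("S1", "S2", "S3", "S4")))

# Made-up design that still lists 6 samples
design <- data.frame(
  sampleID  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  condition = c("A", "A", "A", "B", "B", "B")
)

# Samples present on one side only
missing_in_counts <- setdiff(design$sampleID, colnames(counts))
missing_in_design <- setdiff(colnames(counts), design$sampleID)
print(missing_in_counts)

# Keep only the design rows that have a matching column in the matrix
design_fixed <- design[design$sampleID %in% colnames(counts), ]
print(design_fixed)
```

Note that subsetting can leave a condition with too few replicates, so it is worth checking `table(design_fixed$condition)` afterwards.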
Hi Eva, I cannot see your files here. Maybe you could just send them to my email: s220311@dtu.dk
Hi Eva, I've just reproduced your error and realized that you are studying a fish, not human or mouse. That means you cannot use removeNonConvensionalChr = TRUE, which treats chromosomes that are not named like the standard human or mouse ones as non-conventional and removes them. I set removeNonConvensionalChr = FALSE and the import ran without errors. Hope this helps. 😊
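To make the effect of that argument concrete, here is a small base R illustration. It rests on an assumption taken from my reading of the importRdata documentation, which should be checked with `?importRdata`: that "non-conventional" means a chromosome name containing an underscore or a period. The sequence names below are hypothetical examples, not taken from the files in this thread.

```r
# Hypothetical sequence names: two UCSC-style names and two accession-style names
seqnames <- c("chr1", "chr1_gl000191_random", "NC_000001.1", "NW_000002.1")

# Assumed rule: a name containing '_' or '.' counts as non-conventional
is_non_conventional <- grepl("_|\\.", seqnames)

# What would survive the filter under that assumption
kept <- seqnames[!is_non_conventional]
print(kept)
```

Under this assumption, an annotation whose sequences are all named with accessions containing '_' or '.' would lose every isoform to the filter, which might explain the zero overlap reported above.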
Hi, It works, I just had to modify the removeNonConvensionalChr argument.
Thank you so much !!
Eva
Hello, I am encountering an error when importing data into R. I used STAR + RSEM to quantify my data. A genome and an annotation file are available on Ensembl. I don't understand the error, because I can find the same genes and isoforms in my count matrix and in my annotation file. I've checked several times, and the FASTA and GTF files used for mapping and quantification are indeed the same as those given in the R command. Attached is a file with a preview of the count matrix for 1 sample, followed by my annotation file. Overview_counts_annotation.txt
Thank you for your help,
Eva
Here is my command line :
aSwitchList_2 <- importRdata(
    isoformCountMatrix            = RSEM_Quant$counts,
    isoformRepExpression          = RSEM_Quant$abundance,
    designMatrix                  = myDesign,
    isoformExonAnnoation          = "Dicentrarchus_labrax.dlabrax2021.111.gtf",
    isoformNtFasta                = "Dicentrarchus_labrax.dlabrax2021.dna.toplevel.fa",
    fixStringTieAnnotationProblem = FALSE,
    removeNonConvensionalChr      = TRUE,
    ignoreAfterBar                = TRUE,
    ignoreAfterSpace              = TRUE,
    ignoreAfterPeriod             = TRUE,
    showProgress                  = FALSE
)
And here is the error message:
Error in importRdata(isoformCountMatrix = RSEM_Quant$counts, isoformRepExpression = RSEM_Quant$abundance, :
The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
Either isforoms found in the annotation are not quantifed or vise versa.
Specifically:
0 isoforms were quantified.
69565 isoforms are annotated.
Only 0 overlap.
0 isoforms quantifed had no corresponding annoation
This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.
If there is no overlap (as in zero or close) there are two options:
1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
Examples from expression matrix are :
Examples of annoation are : ENSDLAT00005029896, ENSDLAT00005056072, ENSDLAT00005077077
Examples of isoforms which were only found im the quantification are :
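The Jaccard similarity named in this message is the size of the intersection of the two ID sets divided by the size of their union. A rough base R sketch of that check follows. The function name `jaccard` is hypothetical, the annotation IDs are the three examples from the message, the quantification IDs are made up, and this is not the package's own implementation.

```r
# Hypothetical helper: Jaccard similarity between two sets of isoform IDs
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_union <- length(union(a, b))
  if (n_union == 0) return(NA_real_)  # undefined when both sets are empty
  length(intersect(a, b)) / n_union
}

# Annotation IDs quoted in the error message
annot_ids <- c("ENSDLAT00005029896", "ENSDLAT00005056072", "ENSDLAT00005077077")

# Made-up quantification IDs: two match, one carries a version suffix
quant_ids <- c("ENSDLAT00005029896", "ENSDLAT00005056072", "ENSDLAT00005077077.1")

sim <- jaccard(annot_ids, quant_ids)
print(sim)

# An empty quantification set, as in "0 isoforms were quantified", gives 0
sim_empty <- jaccard(annot_ids, character(0))
print(sim_empty)
```

A value of 0 combined with "0 isoforms were quantified" suggests that the quantified isoforms were dropped before the comparison rather than that the IDs were formatted differently.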