karlaarz closed this issue 2 years ago.
Can you post the first few lines of a quant.sf file and also a few lines from the Ensembl GTF? Sometimes Ensembl uses slightly different naming schemes in the GTF and the FASTA.
Also, in the meantime, you can use skipMeta=TRUE if you don't need the genomic ranges right now.
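If it helps, here is a minimal base-R sketch for printing those first few lines side by side. The function name peek_inputs and the file paths you pass to it are hypothetical; it assumes quant.sf is tab-separated with a header row and that the GTF may or may not be gzipped.

```r
# Hypothetical helper: show the first n rows of a Salmon quant.sf and the
# first n non-header lines of a GTF, so transcript IDs can be compared by eye.
peek_inputs <- function(quant_file, gtf_file, n = 6) {
  # quant.sf is tab-separated with a header (Name, Length, EffectiveLength, ...)
  quant <- utils::read.delim(quant_file, nrows = n, stringsAsFactors = FALSE)
  # gzfile() reads gzipped files and can also open uncompressed text files
  con <- gzfile(gtf_file, open = "rt")
  on.exit(close(con))
  gtf <- readLines(con, n = 100)
  # drop the "#!" genome-build header lines that Ensembl GTFs start with
  gtf <- grep("^#!", gtf, value = TRUE, invert = TRUE)
  list(quant = quant, gtf = utils::head(gtf, n))
}
```

Usage would be something like peek_inputs("sample1/quant.sf", "Danio_rerio.GRCz11.gtf.gz"), with your own paths.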
Hi Mike,
Sure:
head(quant)
Name Length EffectiveLength TPM NumReads
1 ENSDART00000189431.1 11 1 0.000000 0
2 ENSDART00000189226.1 10 1 0.000000 0
3 ENSDART00000172037.2 344 94 0.000000 0
4 ENSDART00000165410.2 350 100 0.462361 1
5 ENSDART00000163675.2 339 89 0.000000 0
6 ENSDART00000172374.2 355 105 0.000000 0
head(gtf)
#!genome-build GRCz11
#!genome-version GRCz11
#!genome-date 2017-05
#!genome-build-accession GCA_000002035.4
#!genebuild-last-updated 2018-04
4 havana gene 30402837 30403763 . + . gene_id "ENSDARG00000103202"; gene_version "2"; gene_name "CR383668.1"; gene_source "havana"; gene_biotype "lincRNA";
4 havana transcript 30402837 30403763 . + . gene_id "ENSDARG00000103202"; gene_version "2"; transcript_id "ENSDART00000159919"; transcript_version "2"; gene_name "CR383668.1"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "CR383668.1-201"; transcript_source "havana"; transcript_biotype "lincRNA";
4 havana exon 30402837 30402893 . + . gene_id "ENSDARG00000103202"; gene_version "2"; transcript_id "ENSDART00000159919"; transcript_version "2"; exon_number "1"; gene_name "CR383668.1"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "CR383668.1-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSDARE00001204173"; exon_version "1";
4 havana exon 30403203 30403350 . + . gene_id "ENSDARG00000103202"; gene_version "2"; transcript_id "ENSDART00000159919"; transcript_version "2"; exon_number "2"; gene_name "CR383668.1"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "CR383668.1-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSDARE00001194706"; exon_version "1";
4 havana exon 30403546 30403763 . + . gene_id "ENSDARG00000103202"; gene_version "2"; transcript_id "ENSDART00000159919"; transcript_version "2"; exon_number "3"; gene_name "CR383668.1"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "CR383668.1-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSDARE00001199782"; exon_version "1"
head(fasta)
>ENSDART00000189431.1 cdna chromosome:GRCz11:2:36087769:36087779:1 gene:ENSDARG00000116509.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:BX681417.25
GATTGGGGTAC
>ENSDART00000189226.1 cdna chromosome:GRCz11:2:36088047:36088056:1 gene:ENSDARG00000116470.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:BX681417.24
TCTGGACTAC
>ENSDART00000172037.2 cdna chromosome:GRCz11:2:31866722:31867190:-1 gene:ENSDARG00000101672.2 gene_biotype:TR_V_gene transcript_biotype:TR_V_gene gene_symbol:trgv7 description:T cell receptor gamma variable 7 [Source:ZFIN;Acc:ZDB-GENE-051115-9]
ATGAGCCTTCAAATGATCTTGTTTTTCTTTCTTTTATATAGAGTTGATGGACAAGCGATG
CTGCGACAGAAAATATCCTCAACCAAATCTCAGGACAAGACTGTTGTCATAGACTGTGAT
TACCCTTCAGACTGTTATAGGTACATCCACTGGTACCAACTAAAAGGACAAACCTTAAAG
AGAATATTATATGCACAAATTTCAGGAGGAGAACCAGCCAGAGATGCTGGTTTTGAATTG
TTTAAAATAGACCGTAAACAGTCAAATATTGCTCTGAAAATACCTGAACTGAAAACAGAG
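I see that the transcript names differ: the GTF's transcript_id values (e.g. "ENSDART00000159919") lack the version suffix that the FASTA and Salmon's output include (e.g. "ENSDART00000189431.1"). The GTF stores the version separately, in the transcript_version attribute.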
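To make the mismatch concrete, this small sketch uses IDs from the outputs above. It shows two ways to make the names comparable: strip the ".<version>" suffix from the Salmon/FASTA names, or rebuild versioned IDs from the GTF's transcript_id and transcript_version. The helper names are hypothetical, and this only illustrates the idea; it is not necessarily how tximeta/tximport implement ignoreTxVersion internally.

```r
# Hypothetical helpers illustrating the version-suffix mismatch.
# Remove a trailing ".<digits>" version from transcript IDs.
strip_tx_version <- function(ids) sub("\\.[0-9]+$", "", ids)
# Rebuild a versioned ID from GTF transcript_id + transcript_version.
add_tx_version <- function(ids, versions) paste(ids, versions, sep = ".")

salmon_ids <- c("ENSDART00000189431.1", "ENSDART00000172037.2")
strip_tx_version(salmon_ids)
# "ENSDART00000189431" "ENSDART00000172037"
add_tx_version("ENSDART00000159919", "2")
# "ENSDART00000159919.2"
```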
If I add skipMeta=TRUE, it works.
I think if you specify ignoreTxVersion = TRUE, it may also be able to connect the FASTA to the GTF.
tximeta/tximport don't do any guessing of the matches (there are so many different sources, and we don't want to make a mistake by assuming any part of an identifier is insignificant). But we do have some options to help deal with inconsistencies in the source files.
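A minimal sketch of passing these options, assuming (as this thread indicates) that tximeta forwards ignoreTxVersion through its ... argument to tximport. The directory layout and sample names are hypothetical; the call only runs if tximeta is installed and the files exist.

```r
# Hypothetical sample sheet: tximeta expects at least "files" and "names" columns.
coldata <- data.frame(
  files = file.path("salmon_quant", c("sample1", "sample2"), "quant.sf"),
  names = c("sample1", "sample2"),
  stringsAsFactors = FALSE
)

if (requireNamespace("tximeta", quietly = TRUE) && all(file.exists(coldata$files))) {
  # Let versioned Salmon names (e.g. "ENSDART00000189431.1") match
  # unversioned GTF transcript IDs.
  se <- tximeta::tximeta(coldata, ignoreTxVersion = TRUE)
  # Fallback if genomic ranges are not needed right now:
  # se <- tximeta::tximeta(coldata, skipMeta = TRUE)
}
```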
Hi Mike, yes, after adding the ignoreTxVersion = TRUE and skipMeta=TRUE options, it now runs smoothly. Thanks for the help!
Hello,
I am trying to import some Salmon data using tximeta, but I get the following error:
My script is as follows:
I am using tximeta v1.14.0 and R version 4.2.0 (2022-04-22).
Any help would be appreciated.
Thanks