Open mikelove opened 4 years ago
Hi Mike
I updated both tximeta and Bioconductor, but I still get this message:
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8
couldn't find matching transcriptome, returning non-ranged SummarizedExperiment
I am using gencode.v35 and ran salmon in alignment-based mode.
I think my issue is that I am using the alignment-based mode (which I require) in salmon, and in this mode the cmd_info.json file written alongside the counts does not have the "index":"/path/to/genecode.v35_salmon_1.3.0" entry, as this mode does not require the index. I guess that this is where tximeta gets the information from.
For the alignment-based method, salmon instead produces a "target": "path/to/ReferenceGenome/gencode.v35.transcripts.fa" entry.
Maybe this could be used as a fix in a later update? I'm not entirely sure.
Meanwhile, in addition to the above, I created a salmon index for gencode.v35, took the hash strings from its info.json, and added them to the meta_info.json of each of the counts, and tximeta worked.
Maybe not the best way to go about it, but it worked.
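The workaround described above (copying the hash strings from the index's info.json into each sample's meta_info.json) could be scripted roughly as follows. This is only a sketch under several assumptions: that the jsonlite package is available, that the index's info.json stores the hashes under SeqHash and NameHash, and that tximeta looks for index_seq_hash in aux_info/meta_info.json. The function name add_index_hash and the paths in the usage line are hypothetical, so check the key names in your own files first.

```r
library(jsonlite)

# Copy the transcriptome hashes from a salmon index into the metadata of one
# quantification directory. Key names on both sides are assumptions.
add_index_hash <- function(quant_dir, index_dir) {
  index_info <- fromJSON(file.path(index_dir, "info.json"))
  meta_file <- file.path(quant_dir, "aux_info", "meta_info.json")
  meta <- fromJSON(meta_file)
  stopifnot(!is.null(index_info$SeqHash))
  # keep a backup of the original metadata before editing it
  file.copy(meta_file, paste0(meta_file, ".bak"), overwrite = FALSE)
  meta$index_seq_hash <- index_info$SeqHash
  meta$index_name_hash <- index_info$NameHash
  # note: length-one arrays are written back as scalars by auto_unbox
  write_json(meta, meta_file, auto_unbox = TRUE, pretty = TRUE, digits = NA)
  invisible(meta)
}

# Example with hypothetical paths:
# add_index_hash("quants/sample1", "gencode.v35_salmon_index")
```

Looping this over the quantification directories listed in coldata would cover all samples before calling tximeta.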
This is a perfectly valid solution.
I’m not sure if the target file would have the same hash as the transcripts from the source. Can you check for your example? For now our hash is sensitive to sequence order, for example.
How would you go about getting the checksum of the source file? The file I passed to -t in salmon quant was downloaded from the GENCODE FTP; however, this does not write the hashes to meta_info.json. Does salmon need an index to retrieve the hash, or can it do it from the target file?
Kind Regards
You could run salmon index on the file and then look into the directory that is created to find the hash.
The lightweight version is to run compute_fasta_digest, which can be installed with pip. This is what I use to compute reference transcriptome hashes.
Apologies for the delay in replying. Yes, running salmon index on the same reference does indeed give the same hash, and so does the FASTA digest. I am just wondering how this could be automated in salmon's alignment-based mode.
Mustafa
Let me ask @rob-p, is it possible to have an option to index the target file as part of quant? Indexing is fairly fast, and then reads quantified with the alignment mode would also benefit from tximeta magic.
Note to users:
tximeta is updated with the Bioconductor release cycle (every 6 months). Older, non-release versions of tximeta cannot be modified (they are frozen once the release/devel cycle moves on). So if on your machine you are using a version of tximeta that is not the release version (or the devel version, hosted here), it will not be able to recognize the latest releases from various sources (GENCODE, Ensembl, etc.). You should update Bioconductor and tximeta to the latest version in order to match with the latest releases in this table:
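Just a quick note: it looks like the direct link to the Pre-computed checksums table isn't working because the P in pre-computed isn't capitalized. I think the corrected link should be:
https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Pre-computed_checksums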
Feel free to delete this comment after editing the original post and thanks for all the work on tximeta.
Thanks @mbergins
Hello,
I am running into a similar issue with tximeta. I think my issue is that I used v38 of GENCODE for the transcriptome, and it's not part of the pre-computed checksums mentioned in the table. I am wondering if there is a workaround for this until tximeta is updated?
Thank you, Ermela
What is your packageVersion("tximeta")?
Your first place to check is the NEWS file.
If you have a version less than when it was added, then your local version of the package won't autodetect:
https://github.com/mikelove/tximeta/blob/master/NEWS#L19-L21
My bad! I have an older version
package.version("tximeta")
[1] "1.8.5"
Thank you so much for your help! I will make sure to check the news next time.
Hi Mike,
After generating quantification data and attempting to run tximeta, I keep encountering this error.
did not find matching TxDb via 'AnnotationHub'
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz'
Content type 'unknown' length 46556621 bytes (44.4 MB)
Error in download.file(resource(con), destfile) :
  cannot open URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz'
In addition: Warning messages:
1: In download.file(resource(con), destfile) :
  downloaded length 11213312 != reported length 46556621
2: In download.file(resource(con), destfile) :
  URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz': Timeout of 60 seconds was reached
I'm wondering why there is no matching TxDb found, or why the download will not proceed after multiple attempts.
Thanks.
URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz': Timeout of 60 seconds was reached
This means that your connection was too slow to download this resource from EBI. Is it possible for you to try again on a faster connection?
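If a faster connection is not available, one thing that may help is raising R's download timeout before re-running the import, since the "Timeout of 60 seconds" in the warning corresponds to the default value of R's timeout option, which download.file honors. A minimal sketch (the 600-second value is an arbitrary choice, and coldata stands for the sample table from your own session):

```r
# Raise the download timeout (in seconds) for this R session;
# the default of 60 is what produced the warning above.
options(timeout = max(600, getOption("timeout")))

# Then retry the import, e.g.:
# se <- tximeta(coldata)
```

The setting only lasts for the current session, so it has to be set again after restarting R.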
I'm getting the same error: "couldn't find matching transcriptome, returning non-ranged SummarizedExperiment"
Output of packageVersion("tximeta") = '1.10.0'
I'm using source="Ensembl", organism="Drosophila melanogaster", release="104", genome="BDGP6.32"
I tried the makeLinkedTxome approach discussed in the tximeta vignette, with no luck. I am quantifying in alignment-based mode (used minimap2) and see that MustafaElshani figured out a workaround, but I don't understand what he did and can't implement it. I did create a salmon index and tried to supply it to makeLinkedTxome, but I am still getting the same error.
Hi @pkerrwall
If you see my comment from 9/4/2020:
https://github.com/mikelove/tximeta/issues/38#issuecomment-687100040
...I don't think you have a seqhash in the quantification metadata files in alignment mode (there is no transcriptome to index, right?). Hence there is nothing to match on. I had a proposal, we can bring this up with @rob-p and see his thoughts. If one wanted to have the transcriptome hash be included in quant with alignment mode, one option would be to point salmon quant to a Salmon indexed txome, not used for quant but only for the metadata. Curious everyone's thoughts.
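To see whether a quantification directory carries a transcriptome hash at all, the metadata file can be inspected directly. The sketch below assumes that jsonlite is installed and that the hash is stored under index_seq_hash in aux_info/meta_info.json; get_seq_hash and the example path are hypothetical names used only for illustration.

```r
library(jsonlite)

# Return the transcriptome hash recorded in a salmon quant directory,
# or NA if the field is missing or empty (assumed key: index_seq_hash).
get_seq_hash <- function(quant_dir) {
  meta <- fromJSON(file.path(quant_dir, "aux_info", "meta_info.json"))
  hash <- meta$index_seq_hash
  if (is.null(hash) || !nzchar(hash)) NA_character_ else hash
}

# Example with a hypothetical path:
# get_seq_hash("quants/sample1")
```

An NA here would be consistent with the explanation above: with no hash in the metadata, there is nothing for tximeta to match on.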
Thanks Mike for the quick reply. I'm not sure I understand your reply about the seqhash in the quantification metadata files? I guess we will wait for @rob-p thoughts on this error
To give a little background - I'm running into this error trying to run swish from the fishpond package (just following the swish vignette). I ran into an issue trying to run drimseq with the flybase gtf and ended up just providing my own gene to transcript mapping file and got it to work. Is there a way to manually provide the gene to transcript mapping file for swish? Is this the only metadata that swish needs?
I also tried the following (from the tximeta vignette):
# to load from local source
indexDir <- file.path('/home/shared/db/Dmel/ensembl/Drosophila_melanogaster.BDGP6.32.cdna.all.fa_salmon_index') # still generated a salmon index even though I'm not using this because I use alignment-based mode
fastaPath <- file.path('/home/shared/db/Dmel/ensembl/Drosophila_melanogaster.BDGP6.32.cdna.all.fa')
gtfPath <- file.path('/home/shared/db/Dmel/ensembl/Drosophila_melanogaster.BDGP6.32.104.gtf')
suppressPackageStartupMessages(library(tximeta))
makeLinkedTxome(indexDir=indexDir, source="Ensembl", organism="Drosophila melanogaster", release="104", genome="BDGP6.32", fasta=fastaPath, gtf=gtfPath, write=FALSE)

# Read in quants with tximeta
library(tximeta)
# both the following lines generate error "couldn't find matching transcriptome, returning non-ranged SummarizedExperiment"
#se <- tximeta(coldata) # not working
se <- tximeta(coldata, dropInfReps=TRUE, useHub=FALSE) # not working
Oh, if you just want to combine transcripts to gene, I believe you can do:
se <- tximeta(coldata, skipMeta=TRUE, txOut=FALSE, tx2gene=tx2gene)
You need the inferential replicates to run swish, so don't use dropInfReps.
Let me know how this goes. If it works (and I think it should) I should add this to the tximeta/swish vignettes.
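For reference, tx2gene here is a two-column table with transcript IDs in the first column and gene IDs in the second (the same convention tximport uses). A minimal sketch with made-up identifiers; in practice you would read in your own mapping file instead:

```r
# Made-up IDs for illustration: column 1 = transcript, column 2 = gene
tx2gene <- data.frame(
  tx   = c("transcriptA1", "transcriptA2", "transcriptB1"),
  gene = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# With tximeta loaded and coldata defined, summarize to gene level without
# looking up transcriptome metadata; dropInfReps is left at its default so
# the inferential replicates are kept for swish.
# se <- tximeta(coldata, skipMeta = TRUE, txOut = FALSE, tx2gene = tx2gene)
```

The transcript IDs in the first column need to match the names used in the quantification files.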
That worked!
I'm now getting an error at
y <- scaleInfReps(y)
Error in infRepError(infRepIdx) : there are no inferential replicates in the assays of 'y'
This is the same error as https://github.com/mikelove/tximeta/issues/35#issuecomment-640700979. Should I create an issue at https://github.com/mikelove/fishpond/issues for this? Before I do, I will read the vignette a little more closely regarding the inferential replicates, as you directed the other person in that thread.
This means that you need to have run Salmon with Gibbs samples or bootstraps (a requirement for Swish).
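A quick way to check whether an imported object actually carries inferential replicates is to look at its assay names, since tximeta stores them as assays named infRep1, infRep2, and so on. A small sketch (has_inf_reps is a hypothetical helper, not part of fishpond or tximeta):

```r
# TRUE if any assay name looks like an inferential replicate (infRep1, infRep2, ...)
has_inf_reps <- function(assay_names) {
  any(grepl("^infRep[0-9]+$", assay_names))
}

# With SummarizedExperiment loaded and `y` being the imported object:
# has_inf_reps(assayNames(y))
```

If this returns FALSE, either Salmon was run without Gibbs samples or bootstraps, or the replicates were dropped at import (for example with dropInfReps=TRUE).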
For future questions feel free to post to Bioc support site and tag eg tximeta or fishpond (whichever is relevant or both).
I’ll add these details to the vignettes.
Thanks for your help with this. Good to know about the Gibbs sampling and bootstrap options for salmon and that they are a requirement for swish. You might want to add some simple salmon examples at the beginning of the swish vignette that show how to do this. Thanks for all your hard work in this area!
We do have at the beginning, “Importantly, --numGibbsSamples 20 was used to generate 20 inferential replicates with Salmon’s Gibbs sampling procedure. Inferential replicates, either from Gibbs sampling or bootstrapping of reads, are required for the swish method shown below.”
but maybe this needs to be in Quick Start
Thanks for pointing that out - I guess I never made it that far down the page :) Yeah - having that message at the beginning of the quick start would be helpful for idiots like me :)
Thanks for the feedback @pkerrwall I've updated both vignettes to provide more information as discussed above.
Note to users:
tximeta is updated with the Bioconductor release cycle (every 6 months). Older, non-release versions of tximeta cannot be modified (they are frozen once the release/devel cycle moves on). So if on your machine you are using a version of tximeta that is not the release version (or the devel version, hosted here), it will not be able to recognize the latest releases from various sources (GENCODE, Ensembl, etc.).
First, check your version:
packageVersion("tximeta")
Then, you can examine in what version specific txomes are added here:
https://github.com/mikelove/tximeta/blob/master/NEWS
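If you have a discrepancy, you should update Bioconductor and tximeta to the latest version in order to match with the latest releases in this table:
https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Pre-computed_checksums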
Checking the NEWS (https://github.com/mikelove/tximeta/blob/master/NEWS) was just "the key" for me. I realized I had the Bioconductor release version 1.14 or so, and I needed the newest, 1.15, which has the mouse M30. Ahhh-mazing. Thank you Dr. Love!
EDIT1: just tried with devel, and it didn't work, but I realized that I may have downloaded the gtf and fa from different sources (ENSEMBL and GENCODE)... So I went through the tximeta code and found these lines:
hashfile <- file.path(system.file("extdata",package="tximeta"),"hashtable.csv")
hashtable <- read.csv(hashfile,stringsAsFactors=FALSE)
I plugged them into RStudio and looked up M30 to see what the hashes were, and the ones in the hashtable above did not match mine. It was sweet that the gtf and fa links from the ebi are included. Going to re-run the pipeline with those two files from ebi!
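The comparison described above can be sketched as follows: load the checksum table shipped with the installed tximeta and look for the hash recorded in a sample's metadata. The column name sha256 is an assumption about hashtable.csv, and find_txome is a hypothetical helper, so inspect names(hashtable) before relying on it.

```r
# Rows of a tximeta-style checksum table whose hash equals `seq_hash`
# (assumes the table has a column named sha256).
find_txome <- function(hashtable, seq_hash) {
  hashtable[hashtable$sha256 %in% seq_hash, , drop = FALSE]
}

# With tximeta installed:
# hashfile <- file.path(system.file("extdata", package="tximeta"), "hashtable.csv")
# hashtable <- read.csv(hashfile, stringsAsFactors=FALSE)
# find_txome(hashtable, "<hash from meta_info.json>")
```

Zero rows returned would suggest that the installed tximeta version does not know that transcriptome, or that the FASTA used for quantification differs from the one the table was built from.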
New to fishpond, getting the same message for a slightly different reason and looking for help here! I feel the issue arises from file transfer. The .json files no longer have accurate paths to everything.
R version: 4.2.0
package.version("tximeta") [1] "1.14.1"
package.version("fishpond") [1] "2.2.0"
What I did:
Question: can I somehow let tximeta know the index folder that was used for salmon quantification? Or is there more to change for tximeta to work? (I've tried to update the "index":___ inside cmd_info.json, but that does not seem to help.)
Lots lots of thanks.
--------------
In case needed, here are the files and commands I used to generate the Salmon index:
From FlyBase, genome file and transcriptome file from release 6.23: http://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.23_FB2018_04/fasta/
grep "^>" <(gunzip -c dmel-all-chromosome-r6.23.fasta.gz) | cut -d " " -f 1 > Dmel_decoys.txt
sed -i.bak -e 's/>//g' Dmel_decoys.txt
cat dmel-all-transcript-r6.23.fasta.gz dmel-all-chromosome-r6.23.fasta.gz > Dmel_gentrome.fa.gz
salmon-1.9.0/bin/salmon index -t
Tximeta will "work" for your case in that it will generate an un-ranged SummarizedExperiment. As you are not using GENCODE, Ensembl or RefSeq, it won't automatically download the matching transcriptome metadata. This is all you need to continue analysis with fishpond, etc.
However, if you want to have tximeta populate genomic ranges on the SummarizedExperiment, we have developed tools for this. You obviously need to provide the ranges of the transcripts / genes, which would involve having a custom GTF file.
If you have this already, you can follow the linkedTxome instructions in the vignette to have tximeta populate the ranges, but again it's not necessary for using fishpond to have ranges metadata on the SummarizedExperiment.
Perhaps if you have follow-up questions you could post here:
Note to users:
tximeta is updated with the Bioconductor release cycle (every 6 months). Older, non-release versions of tximeta cannot be modified (they are frozen once the release/devel cycle moves on). So if on your machine you are using a version of tximeta that is not the release version (or the devel version, hosted here), it will not be able to recognize the latest releases from various sources (GENCODE, Ensembl, etc.).
First, check your version:
packageVersion("tximeta")
Then, you can examine in what version specific txomes are added here:
https://github.com/mikelove/tximeta/blob/master/NEWS
If you have a discrepancy, you should update Bioconductor and tximeta to the latest version in order to match with the latest releases in this table:
https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Pre-computed_checksums