Closed JudithR closed 3 years ago
hi Judith,
Thanks for the report.
What happens after importing the GTF is that it will try to cache this database in your specified BiocFileCache location.
It may be worth testing if you can do these steps manually.
bfc <- BiocFileCache::BiocFileCache()
txdb <- GenomicFeatures::makeTxDbFromGFF(...)
loc <- BiocFileCache::bfcnew(bfc, rname="testing", ext=".sqlite")
saveDb(txdb, file=loc)
Thx for pointing me in the right direction. At least now I know which part fails. (Assuming you meant GenomicFeatures.) I managed to create the database, but it won't write it to the file. However, changing the cache location to /tmp works:
txdb <- makeTxDbFromGFF(file=gtfPath, dataSource="EnsemblDbv97", organism="Parus major", chrominfo=chromInd)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
> saveDb(txdb, file=loc)
Error: Failed to copy all data:
not an error
In addition: Warning message:
Couldn't set synchronous mode: database is locked
Use `synchronous` = NULL to turn off this warning.
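One possible workaround for this kind of locking error (a sketch only, not something from this thread) is to write the SQLite file to local /tmp first and then copy it into the cache location with base R. The snippet below demonstrates the copy pattern with a plain stand-in file; for the real case, the `writeLines()` call would be replaced by `AnnotationDbi::saveDb(txdb, file = tmp)` and `loc` would come from `BiocFileCache::bfcnew()`.

```r
## Sketch of a write-then-copy workaround, using base R only.
## `writeLines()` stands in for AnnotationDbi::saveDb(txdb, file = tmp).
tmp <- tempfile(fileext = ".sqlite")
writeLines("stand-in for the TxDb SQLite file", tmp)

## `loc` stands in for the path returned by BiocFileCache::bfcnew().
loc <- file.path(tempdir(), "cache_location.sqlite")
stopifnot(file.copy(tmp, loc, overwrite = TRUE))
```

This sidesteps SQLite opening a connection directly on the network file system, since only a plain file copy touches the cache location.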
With tmp dir
bfc <- BiocFileCache::BiocFileCache("/tmp/judithr/pmaj_index_cache")
using temporary cache /tmp/Rtmp2O0DyA/BiocFileCache
> loc <- BiocFileCache::bfcnew(bfc, rname="testing", ext=".sqlite")
> saveDb(txdb, file=loc)
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: EnsemblDbv97
# Organism: Parus major
# Taxonomy ID: 9157
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 33036
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-04-12 15:14:08 +0200 (Mon, 12 Apr 2021)
# GenomicFeatures version at creation time: 1.42.3
# RSQLite version at creation time: 2.2.6
# DBSCHEMAVERSION: 1.2
Have you encountered something like this before? I think the new file system is to blame, and not so much Bioconductor and tximeta.
This is useful. One possibility is that R doesn't have permission to write to the new BiocFileCache location, or that something else in that step is problematic.
Maybe then try out a simple test:
bfc <- BiocFileCache::BiocFileCache()
loc <- BiocFileCache::bfcnew(bfc, rname="testing2")
x <- 1:10
save(x, file=loc)
And then you can figure out why the directory permissions are an issue.
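The round-trip idea behind that test can be checked anywhere with base R alone (no BiocFileCache needed); here a temp file stands in for the `bfcnew()` location:

```r
## Minimal base-R round trip: save an object, clear it, load it back,
## and confirm the write actually succeeded.
loc <- tempfile(fileext = ".rda")  # stand-in for the bfcnew() path
x <- 1:10
save(x, file = loc)
rm(x)
load(loc)                          # restores `x` from the file
stopifnot(identical(x, 1:10))
```

If this works in the cache directory but `saveDb()` still fails there, the problem is specific to how SQLite writes, not to file permissions.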
That works fine. So does writing the first sqlite entry, just not the second. After I checked out the devel branch, tximeta also returns to the cursor with the same error message as the separate steps. These are the contents of the .cache folder:
drwxr-xr-x 3 me domain users 3 Apr 12 15:49 ..
-rw-r--r-- 1 me domain users 347 Apr 12 15:49 9ea1c4955a5_9ea1c4955a5.rds
-rw-r--r-- 1 me domain users 20K Apr 12 15:50 BiocFileCache.sqlite
-rw-r--r-- 1 me domain users 0 Apr 12 15:50 9ea1c6ea8b017_9ea1c6ea8b017.sqlite
drwxr-xr-x 2 me domain users 5 Apr 12 15:50 .
So it's not that it can't write at all; it just can't write this one to any location on the file system. Is it possible that the database handle is not released after touching the file? But why it would then work on /tmp is beyond me.
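The handle-release question can be probed directly with a small sketch, assuming the DBI and RSQLite packages are installed (they are dependencies of the packages above). On some network file systems, a connection left open by one handle can block writes from another:

```r
## Probe whether an explicitly released SQLite handle reopens cleanly.
library(DBI)
db <- tempfile(fileext = ".sqlite")       # stand-in for the cache file
con <- dbConnect(RSQLite::SQLite(), db)
dbWriteTable(con, "probe", data.frame(x = 1))
dbDisconnect(con)                         # release the handle explicitly

con2 <- dbConnect(RSQLite::SQLite(), db)  # reopens cleanly once released
stopifnot(dbListTables(con2) == "probe")
dbDisconnect(con2)
```

If the same pattern hangs or errors when `db` points into the cache directory, that would implicate SQLite locking on that file system rather than anything in BiocFileCache.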
I think now that you've narrowed it down to a simple test example, I'd recommend posting to support.bioconductor.org with the BiocFileCache tag, to see if we can get advice from the developers.
So just the case where you try to saveDb to the BFC location and get the "Failed to copy all data" error.
Thx for all the help. I posted it, referring to this thread. When copying stuff across I did see one warning when using tximeta with /tmp that might be relevant (the db disconnect), but I'm not sure where it is generated.
> se_s1 <- tximeta(samples_s1, useHub=FALSE)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
found matching linked transcriptome:
[ Ensembl - Parus major - release 97 ]
building EnsDb with 'ensembldb' package
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
Attribute availability:
o gene_id ... OK
o gene_name ... OK
o entrezid ... Nope
o gene_biotype ... OK
OK
Processing transcripts ...
Attribute availability:
o transcript_id ... OK
o gene_id ... OK
o transcript_biotype ... OK
OK
Processing exons ... OK
Processing chromosomes ... Fetch seqlengths from ensembl ... OK
Generating index ... OK
-------------
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
generating transcript ranges
Warning messages:
1: closing unused connection 3 (ftp://ftp.ensembl.org/pub/release-97/mysql/)
2: In checkAssays2Txps(assays, txps) :
Warning: the annotation is missing some transcripts that were quantified.
1133 out of 27493 txps were missing from GTF/GFF but were in the indexed FASTA.
(This occurs sometimes with Ensembl txps on haplotype chromosomes.)
In order to build a ranged SummarizedExperiment, these txps were removed.
To keep these txps, and to skip adding ranges, use skipMeta=TRUE
Example missing txps: [ENSPMJT00000000007, ENSPMJT00000000011, ENSPMJT00000000017, ...]
3: call dbDisconnect() when finished working with a connection
The warning about annotation missing some txps, and the warning about dbDisconnect are both ok (I get those on my end as well).
Hi Mike,
Thx for all the help. The IT department figured it out. The firewall of the new storage server caused issues with writing larger blocks to disk.
Thanks for the reports!
Hi, I am not sure whether tximeta is to blame, but I have an issue I can't resolve. I hope you might have an idea and can point me in the right direction.
I'm running R 4.0.1 with tximeta 1.8.4 (for the rest, see the attached sessionInfo). I'm using the Parus major transcriptome from Ensembl 97. I had this as a cached version for quite some time and it worked without issue. Then our server got updated and I had to rebuild everything as file paths changed. Now I can't seem to build the transcriptome; it just hangs. I use the makeLinkedTxome option as the Parus major genome is not hashed and is not a member of Ensembl anymore.
I checked the files and locations and they exist, and the files are readable. I rebuilt the salmon index (salmon 1.1.0) to be sure it had not been corrupted.
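For context, a makeLinkedTxome call like the one described would look roughly like this; every path below is a placeholder, not a value from this report:

```r
## Hypothetical sketch of the makeLinkedTxome() setup described above.
## All file paths are placeholders.
library(tximeta)
makeLinkedTxome(
  indexDir = "/path/to/pmaj_salmon_index",    # placeholder
  source   = "Ensembl",
  organism = "Parus major",
  release  = "97",
  genome   = "none",                          # genome not hashed/available
  fasta    = "/path/to/Parus_major.cdna.fa",  # placeholder
  gtf      = "/path/to/Parus_major.97.gtf",   # placeholder
  write    = TRUE,
  jsonFile = "pmaj_index.json"
)
```

The resulting JSON file lets tximeta match the salmon index checksum to the GTF even though the transcriptome is not in the Ensembl hash registry.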
It makes the bfc:
I know the old one (that stopped working) had 3 entries, not one.
The pmaj_index.json looks good:
The rds file in \~/.cache/BiocFileCache/ shows the gtf file.
and when I try to run tximeta it starts fine. It recognizes the transcriptome and then tries to build the complete cache?
Without hub:
And there it stops, forever and unrecoverably (I have to terminate R using kill).
I used strace to trace the process and I can see it reads the counts files and the gtf file, but nothing else. This is the last thing it did:
Am I missing something? I removed the pmaj_index.json, the .cache/tximeta/bfcloc.json and everything in .cache/BiocFileCache prior to the rerun. I also followed the instructions in the vignette about removing a linked Txome.
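The linked-txome cleanup from the vignette can be sketched as follows: tximeta stores the linkedTxome records in its own BiocFileCache, so querying and removing the matching record forces a clean rebuild. This demo uses a throwaway cache so it runs anywhere BiocFileCache is installed; for the real cache, one would use `BiocFileCache::BiocFileCache(tximeta::getTximetaBFC())` instead, and the rname `"linkedTxomeTbl"` is an assumption based on the vignette.

```r
## Hedged sketch: remove a (possibly stale) linkedTxome record from a
## BiocFileCache. Demonstrated on a fresh throwaway cache.
library(BiocFileCache)
bfc <- BiocFileCache(tempfile(), ask = FALSE)
loc <- bfcnew(bfc, rname = "linkedTxomeTbl", ext = ".rds")
saveRDS(data.frame(), loc)                 # stand-in for the real record

q <- bfcquery(bfc, "linkedTxomeTbl")       # find the record by rname
bfcremove(bfc, q$rid)                      # drop it so it gets rebuilt
stopifnot(nrow(bfcquery(bfc, "linkedTxomeTbl")) == 0)
```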
session_info.txt