Open 0xaf1f opened 7 years ago
I've just referenced this issue from https://hpc.nih.gov/apps/agfusion.html#notes
Hey @0xaf1f -- PyEnsembl used to download GTFs and FASTA files, create some intermediate CSV files and then write out "indexed" forms of the genomic metadata (as a .db file) and sequences (as a .pickle file). I've gotten rid of the intermediate step (CSV files) but the indexed databases still get created -- it wouldn't be possible to do efficient lookups without them. Is there an alternative that you would prefer?
It's been a while, so I don't remember all the issues here. I'm not opposed to the existence of a cache. I was just asking for a separation of persistent and transient files. I'd have to go back and refresh my memory to see what was going on here.
I'm trying to set up pyensembl on a shared system, so I set PYENSEMBL_CACHE_DIR to a central location before running pyensembl install for various datasets. The problem is that when I run the program with lower privileges, I see it's also trying to write some (temporary?) files there. While expecting it to be writable makes sense for caching, I think the immutable data should be treated differently and allowed to be placed in a read-only location.

Thanks for your consideration.
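
The separation being asked for could look roughly like the sketch below. It is only an illustration: `resolve_data_file`, `shared_dir`, and `user_cache_dir` are hypothetical names, not part of PyEnsembl, and the sketch assumes the installed data can be read in place from the shared location without any further writes there.

```python
from pathlib import Path


def resolve_data_file(name, shared_dir, user_cache_dir):
    """Return the path to `name`, preferring a read-only shared copy.

    Hypothetical helper for illustration; not a PyEnsembl function.
    """
    shared_path = Path(shared_dir) / name
    if shared_path.is_file():
        # Immutable data installed by an administrator: use it in place,
        # so the shared directory never needs to be writable.
        return shared_path
    # Anything missing from the shared location goes under the per-user
    # cache, the only directory this sketch ever creates or writes to.
    user_cache = Path(user_cache_dir)
    user_cache.mkdir(parents=True, exist_ok=True)
    return user_cache / name
```

With a layout like this, an administrator would populate the shared directory once, and unprivileged users would only need write access to their own cache directory.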