openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0
365 stars, 66 forks

Database path should not default to gtf path #219

Open: edurand opened this issue 5 years ago

edurand commented 5 years ago

Hello,

Thanks for the great work on pyensembl; it is very useful. My situation may not be the typical use case, but I'd like to hear your thoughts on this.

I am creating a Genome instance from local genome files (gtf, fasta, ...). Importantly, I do not have write access to where the genome files are stored.

Then, when I try to index the new genome object, pyensembl attempts to create a database in the same directory as the gtf file, which causes a permission error.
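To confirm that an unwritable GTF directory is the cause before calling `index()`, you can check the directory's permissions up front. The following is a minimal stdlib-only sketch. `gtf_directory_is_writable` is a hypothetical helper, not part of pyensembl. `os.access` checks the real (not effective) user's permissions, so treat the result as a hint rather than a guarantee.

```python
import os
import tempfile


def gtf_directory_is_writable(gtf_path: str) -> bool:
    """Return True if the current user may create files next to gtf_path.

    Creating a new file in a directory needs both write and search
    (execute) permission on that directory.
    """
    directory = os.path.dirname(os.path.abspath(gtf_path))
    return os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A directory we just created ourselves should be writable
        print(gtf_directory_is_writable(os.path.join(tmp, "annotation.gtf")))
    # Hypothetical shared, read-only location; the result depends on the system
    print(gtf_directory_is_writable("/shared/genomes/annotation.gtf"))
```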

My current workaround:

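from pyensembl import Genome

# build, name, version, gtf and fasta are defined earlier in my script
data = Genome(reference_name=build,
              annotation_name=name,
              annotation_version=version,
              gtf_path_or_url=gtf,
              transcript_fasta_paths_or_urls=fasta)
# Without this line, index() tries to write the database next to the
# read-only GTF file and fails with a permission error
data._db.cache_directory_path = data.download_cache.cache_directory_path
data.index()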

I know that I can pass cache_directory_path when constructing the Genome() object, but then I would have to reimplement the directory-structure logic that DownloadCache already provides, which I'd like to avoid. Wouldn't it be preferable for data._db.cache_directory_path to default to data.download_cache.cache_directory_path?
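A minimal sketch of the fallback order being proposed, compared with the behaviour described above. Both function names are hypothetical and are not pyensembl API. In pyensembl itself this decision is made inside Genome/Database and may be structured differently. An explicitly passed directory always wins; the only change is what happens when none is given.

```python
import os
from typing import Optional


def current_db_directory(explicit_dir: Optional[str], gtf_path: str) -> str:
    """Behaviour reported in this issue: fall back to the GTF's own directory."""
    if explicit_dir is not None:
        return explicit_dir
    return os.path.dirname(os.path.abspath(gtf_path))


def proposed_db_directory(explicit_dir: Optional[str], download_cache_dir: str) -> str:
    """Proposed behaviour: fall back to the DownloadCache directory instead."""
    if explicit_dir is not None:
        return explicit_dir
    return download_cache_dir


if __name__ == "__main__":
    # Hypothetical paths, for illustration only
    gtf = "/shared/genomes/annotation.gtf"
    cache = "/home/me/pyensembl_cache/my_build/my_annotation"
    print(current_db_directory(None, gtf))      # read-only GTF directory
    print(proposed_db_directory(None, cache))   # user-writable cache directory
```

With this ordering, users who already pass cache_directory_path see no change, while users with read-only genome files get a writable default without reimplementing DownloadCache's layout.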

Thanks!