ropensci / taxizedb

Tools for Working with Taxonomic SQL Databases

Allow the use of custom data files in db_load_* functions #72

Open arendsee opened 7 months ago

arendsee commented 7 months ago

Description

This commit adds a "path" keyword argument to the db_download_* functions that lets the user bypass the taxonomy database download step and use a custom local file.

Related Issue

Fixes #71, or at least provides a workaround.

Example

# This command will
#  1. download the NCBI taxonomy dump as "taxdmp.zip" using curl and a hard-coded URL
#  2. unzip the file and build the sqlite database
db_download_ncbi()

# This command bypasses the first download step. This allows
# the user to retrieve the data from a different source, modify it,
# or use a different tool to retrieve it (e.g., wget, rsync, or whatever).
db_download_ncbi(path="taxdmp.zip")
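
As a rough illustration of the second workflow, the sketch below fetches the dump once with base R and reuses the local copy on later runs. `fetch_taxdump` is a hypothetical helper, not part of taxizedb, and the URL is the one quoted later in this thread; treat it as an assumption that may change.

```r
# Hypothetical helper (not part of taxizedb): download the archive only if
# no local copy exists yet, then return the path to the local file.
fetch_taxdump <- function(url, dest = "taxdmp.zip") {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the zip archive intact on Windows
    utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  }
  dest
}

# Usage (needs network access and the "path" argument added in this PR):
# zip_path <- fetch_taxdump("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip")
# taxizedb::db_download_ncbi(path = zip_path)
```

Any other tool that leaves a valid taxdmp.zip on disk should work the same way, since only the resulting file path is passed on.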

arendsee commented 7 months ago

The issue reported in #71 about curl timing out is happening in the tests here. It appears to be unrelated to my new code.

On my personal Linux machine this works:

> db_url <- 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip'
> db_path_file <- "z.zip"
> curl::curl_download(db_url, db_path_file, quiet = TRUE)

This also works:

> taxizedb::db_download_ncbi()

This was with libcurl 8.5.0, OpenSSL 3.2.0, R 4.3.2, and the curl R package 5.2.0.

On my server at work, I hit the same timeout as the check here and as @bergalu reported in #71.
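
One way to probe the timeout on an affected machine is to repeat the download with explicit, longer limits and print the error instead of aborting. This is only a sketch: `try_download` is a hypothetical helper, the timeout values are arbitrary, and nothing here is confirmed to fix the problem in #71.

```r
# Hypothetical diagnostic helper (not part of taxizedb or curl).
# Returns TRUE if the download succeeds, FALSE otherwise.
try_download <- function(url, dest, connect_sec = 60, total_sec = 600) {
  # connecttimeout and timeout are libcurl options, both in seconds
  h <- curl::new_handle(connecttimeout = connect_sec, timeout = total_sec)
  tryCatch(
    {
      curl::curl_download(url, dest, quiet = TRUE, handle = h)
      TRUE
    },
    error = function(e) {
      # Report the libcurl error text rather than stopping the session
      message("Download failed: ", conditionMessage(e))
      FALSE
    }
  )
}

# Usage (needs network access):
# try_download("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip", "z.zip")
```

If this still fails, the printed message at least shows whether the connection or the transfer itself is timing out, and the file can be fetched by other means and passed in through the "path" argument.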