ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

Working with downloaded tables offline #250

Closed juanmayorgahenao closed 1 year ago

juanmayorgahenao commented 2 years ago

Hi - I have a brief clarifying question: after using fb_import, are downloaded tables available for use in fresh R sessions without an internet connection?

Thank you

cboettig commented 2 years ago

yes

juanmayorgahenao commented 2 years ago

Thanks @cboettig. I'm struggling to make downloaded tables available offline.

For example, after running `rfishbase::fb_import(tables = "estimate")`, I get this message: `<duckdb_connection 15820 driver=<duckdb_driver c9030 dbdir=':memory:' read_only=FALSE>>`. The `dbdir=':memory:'` part makes me suspect the data is only stored in memory. I could be wrong, of course.

If I then disconnect from the internet and run `rfishbase::estimate("Sphyrna mokarran")`, I get the error message:

```
Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match
In addition: Warning messages:
1: Error in curl::curl_fetch_memory(file, handle): Could not resolve host: hash-archive.org
2: Error in curl::curl_fetch_memory(file, handle): Could not resolve host: hash-archive.carlboettiger.info
3: Error in curl::curl_fetch_memory(file, handle): Could not resolve host: hash-archive.org
4: Error in curl::curl_fetch_memory(file, handle): Could not resolve host: hash-archive.carlboettiger.info
5: In curl::curl_fetch_memory(url, handle = handle) : Could not resolve host: archive.softwareheritage.org
6: In curl::curl_fetch_memory(url, handle = handle) : Could not resolve host: cn.dataone.org
```

I also inspected the output of `rfishbase::db_dir()` and it seems to be in the right place: `Library/Application Support/org.R-project.R/R/rfishbase`.

Any thoughts on where the issue might be?

Thank you

cboettig commented 2 years ago

Thanks for the report.

The first part,

`<duckdb_connection 15820 driver=<duckdb_driver c9030 dbdir=':memory:' read_only=FALSE>>`

is expected. rfishbase uses duckdb only to read the static parquet files that fb_import has downloaded locally to your computer. duckdb should have no need for additional on-disk database storage, because it can use the parquet files themselves as the on-disk database.
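To make that concrete, here is a minimal sketch (not rfishbase's internal code) of how an in-memory duckdb connection can query a parquet file sitting on disk; the file path below is a hypothetical placeholder:

```r
library(DBI)

# An in-memory duckdb connection: no database file is created,
# because the parquet file itself serves as the on-disk storage.
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Hypothetical path to a locally downloaded parquet snapshot
parquet_file <- path.expand("~/Library/Application Support/org.R-project.R/R/rfishbase/estimate.parquet")

# duckdb reads the parquet file directly via its read_parquet() SQL function
DBI::dbGetQuery(con, sprintf("SELECT * FROM read_parquet('%s') LIMIT 5", parquet_file))

DBI::dbDisconnect(con, shutdown = TRUE)
```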

However, after you disconnect from the internet, it seems that for some reason it fails to find the local copy and so goes looking for the copy on the internet. We'll have to debug why that is failing for you. Can you first make sure you have the most recent versions of contentid and rfishbase?

fb_import uses the provenance log to determine the identifier for a particular table; e.g., here is the entry for the most recent version of the estimate table: https://github.com/ropensci/rfishbase/blob/6ebd80e92a93366ce6b159eadd94c5e47f06d31e/inst/prov/fb.prov#L653-L660

See if you can resolve that id directly offline as well as online:

```r
path <- contentid::resolve("hash://sha256/7f258428dadc8031f5e8111ab088d4f3b00130b1985b318153a98d2f7cdf2b66",
                           store = TRUE, dir = rfishbase::db_dir())
path
```

If this succeeds, you should get back a path pointing into `rfishbase::db_dir()`, with the file named by its SHA hash (and no file extension). If it fails offline, then `path` will be `NA`. Can you try that and let me know?
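A quick way to sanity-check the local copy once resolve() returns a path (a sketch, assuming contentid's content_id() helper, which re-hashes a file) is to compare the file's hash against the identifier:

```r
id <- "hash://sha256/7f258428dadc8031f5e8111ab088d4f3b00130b1985b318153a98d2f7cdf2b66"
path <- contentid::resolve(id, store = TRUE, dir = rfishbase::db_dir())

# Re-hash the resolved file; this should reproduce the identifier exactly
# if the local copy is intact.
identical(contentid::content_id(path), id)
```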

Thanks for the help and sorry for the trouble.

juanmayorgahenao commented 2 years ago

Thanks @cboettig. This succeeds online with output:

"/Users/marinedatascience/Library/Application Support/org.R-project.R/R/rfishbase/sha256/7f/25/7f258428dadc8031f5e8111ab088d4f3b00130b1985b318153a98d2f7cdf2b66"

but fails offline with the error message:

```
Warning: Error in curl::curl_fetch_memory(file, handle): LibreSSL SSL_read: error:02FFF03C:system library:func(4095):Operation timed out, errno 60
Warning: Error in curl::curl_fetch_memory(file, handle): LibreSSL SSL_read: error:02FFF03C:system library:func(4095):Operation timed out, errno 60
Warning: Error in curl::curl_fetch_memory(file, handle): Could not resolve host: hash-archive.org
Warning: Error in curl::curl_fetch_memory(file, handle): Could not resolve host: hash-archive.carlboettiger.info
Warning in curl::curl_fetch_memory(url, handle = handle) : Could not resolve host: archive.softwareheritage.org
Warning in curl::curl_fetch_memory(url, handle = handle) : Could not resolve host: cn.dataone.org
Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
```

Thank you for your help!

juanmayorgahenao commented 2 years ago

I'm working with a fresh GitHub install of both packages.

cboettig commented 2 years ago

Thanks. Weird, I still can't reproduce this, but at least we have it down to a more minimal example. Can you give me the full output of `sessionInfo()` after producing this error?

(The warnings are somewhat expected, in that a curl timeout for the remote sources is normal, but contentid should not even try the remote sources, since it should find a local copy first. This suggests it is not finding the local copy, even though the file is showing up in your local directory.)
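One offline check that avoids contentid entirely (a sketch using only base R, with the store layout taken from the path shown above) is to confirm the blob is physically present in the local store:

```r
# Everything in rfishbase's local content store; the estimate table should
# appear as a file named by its sha256 hash (no extension).
list.files(rfishbase::db_dir(), recursive = TRUE)

# Or test for the specific blob directly
file.exists(file.path(
  rfishbase::db_dir(), "sha256", "7f", "25",
  "7f258428dadc8031f5e8111ab088d4f3b00130b1985b318153a98d2f7cdf2b66"
))
```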

cboettig commented 1 year ago

This should be a bit more stable with the latest release, which simplifies some of the logic. See the updated notes in the README. Apologies for the trouble earlier!
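For anyone landing here later, the intended offline workflow (a sketch based on this thread; see the current README for the authoritative version) looks roughly like this:

```r
library(rfishbase)

# While online: download and cache the tables you will need
fb_import(tables = c("estimate", "species"))

# Later, in a fresh R session with no internet connection, the cached
# copies under rfishbase::db_dir() should be used automatically:
estimate("Sphyrna mokarran")
```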