In brief, the problem is this: when we refer to an external storage, how do we interpret non-absolute paths?
As I wrote in #3008, "I am slowly coming around to the idea that loading things relative to the manifest path is correct."
So I think the Right Fix would be to rejigger the path to the zip file to be interpreted relative to the RocksDB location.
However, there is another fun component to this, which is that I'm not sure it's documented anywhere that RocksDB indices created by this plugin store the sketches externally, which is needed for gather (but not for manysearch)... Per @luizirber on slack,
me:
do RocksDB disk rev indexes, once created, require access to the original signatures to function?
@luizirber:
Yes for gather, no for search
You can also build the index with internal sigs, but I don’t think it’s the default
In https://github.com/sourmash-bio/sourmash/pull/3250, @luizirber implemented internal storage for RocksDB, and over in #390 that is now being set as the default for new RocksDB indexes; documentation is being updated as well, in a new PR.
However, the original issue of relative path interpretation still remains, so this issue is here to remind us of all that went before.
In https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/381, we found a problem when using RocksDB inverted indexes with zip files.
Starting from https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/390#issuecomment-2225776967,
I found the problem in the
storage::InnerStorage
struct, which was opening the zip file using a relative path:Conveniently 😠this is not a new bug or problem, this is the same problem that I discussed ad nauseum over in https://github.com/sourmash-bio/sourmash/issues/3008, where we found it to be a problem for manifests - triggered initially by https://github.com/sourmash-bio/sourmash/issues/3053.
In brief, the problem is this: when we refer to an external storage, how do we interpret non-absolute paths?
As I wrote in #3008, "I am slowly coming around to the idea that loading things relative to the manifest path is correct."
So I think the Right Fix would be to rejigger the path to the zip file to be interpreted relative to the RocksDB location.
However, there is another fun component to this, which is that I'm not sure it's documented anywhere that RocksDB indices created by this plugin store the sketches externally, which is needed for gather (but not for manysearch)... Per @luizirber on slack,
me:
@luizirber:
In https://github.com/sourmash-bio/sourmash/pull/3250, @luizirber implemented internal storage for RocksDB, and over in #390 that is now being set as the default for new RocksDB indexes; documentation is being updated as well, in a new PR.
However, the original issue of relative path interpretation still remains, so this issue is here to remind us of all that went before.