Open jennybc opened 7 years ago
TBH I am not sure if this fits into storr, because we are essentially using it as a key value DB, and don't do queries on the data.
So the RDS backend is great (and storr
is great in general!), but since we add the storage to the git repo, it would be nice to see which unhashed keys the individual storage files belong to.
I guess this would be another storr
backend, no?
Or maybe just a prefix slug, as @jennybc suggested?
I'm afraid I don't understand the issue well, or what the snail :snail: slug bits actually mean. But from looking quickly at the httrmock code, it's possible that the namespace functionality of storr could do this?
The idea is that one can do this:
st <- storr::storr_environment()
st$set("foo", "value1")
st$get("foo") # value1
but if you have many foo
keys and want to keep them separated then you can use namespaces:
st$set("foo", "value2", namespace = "other")
st$get("foo") # value1
st$get("foo", "other") # value2
The default namespace is called objects
, though this can be tweaked on storr creation with the default_namespace
option. Practically, for the RDS storr, this keeps key/value lookups in different directories, but for other drivers it uses a prefix which is why I am thinking this might be appropriate here.
Let me know if I'm off the mark
Sorry for not being clear! I'll make another attempt.
Basically, my issue is that with the RDS store it is hard to follow which objects are changing in git, because the file names are just hashes.
In gh
, which uses httrmock
we store HTTP requests and responses, so that we can replay them at test time. We store them in an RDS storr
, The key is usually just list(url = url, method = method)
, sometimes more. We put the RDS files in git, and also in the R package, so that the user is able to run the (mock) test cases. They look like this in gh
:
tests/testthat/httrmock/data/0d2b78a225570cb97dda60ccf22d68f1.rds
tests/testthat/httrmock/data/27b277b1524f9894ba81145304f0f866.rds
tests/testthat/httrmock/data/b1d243dd06539ab9e51cafd39249c41e.rds
tests/testthat/httrmock/data/ba9158074556da2ddc5ccf596e31aa3b.rds
...
tests/testthat/httrmock/keys/objects/0ef0ee2f98a95501032c8dfbd9dc1461
tests/testthat/httrmock/keys/objects/14673dc1fde22d49bc845244b8b4bbfd
tests/testthat/httrmock/keys/objects/39ae4decc7dfce0e35feef7c547904d5
...
This is all nice, but when we develop the package and change/update the test cases, it is really hard to say which stored HTTP requests changed, because the filename is just a hash.
So, yes, namespaces are great! We can set the namespace of a request to be its URL, and that will make a nice grouping of the hundreds of gh
tests that we'll have (if we eventually write them). It will also make it much clearer which objects change in git.
Is this all what we needed, @jennybc ?
Hmm. Part of my personal use-case that inspired storr was to get de-duplicated storage and avoid hitting an external source by separating the key <-> value mapping into a key <-> hash <-> value mapping. I can see that's probably causing some pain here.
With the current driver design I can (probably) implement a direct key -> value mapping, which would probably make your use case simpler; let me know if that would be useful and I can try and get it done before the next CRAN release.
Another function you probably want to be aware of is the gc
method. If you call that as your tests finish up then it will look for elements in data
with no key pointing at them and delete them. It will invisibly return a vector of the deleted keys. That will be a good idea to run periodically or the database will grow without bound (this is a side effect of another design decision which was to make it relatively safe for multiple processes to read and write from the db at once)
@richfitz Thanks! Let us try the namespacing and we'll see if it indeed solves our problem.
I am in a suboptimal time zone for this! But, yes, if @gaborcsardi thinks using the existing namespace functionality will work, that will be great. Since I assume httrmock will keep using RDS, then this will mean lots of directories? But that seems fine.
If I follow, we could end up using a new httrmock context and a new storr namespace for each test. In order to get the necessary control over request matching and to get human-readable names, respectively.
@gaborcsardi When I zoom out, it feels like there are two independent key-value systems being set up for the same set of objects, doesn't it? Maybe this is unavoidable, full stop. Or unavoidable if you want to take advantage of all the work @richfitz has already done in storr and not reinvent that particular wheel. Do I understand this right? The only way to simplify is to somehow make the object storage system build its keys based on the HTTP request.
Since I assume httrmock will keep using RDS, then this will mean lots of directories?
Yes.
If I follow, we could end up using a new httrmock context and a new storr namespace for each test. In order to get the necessary control over request matching and to get human-readable names, respectively.
I guess httrmock
can just use the storr
namespaces automatically, no human interaction is needed there. (I mean, I'll have to implement this.)
And we only need httrmock
namespaces if HTTP method + URL do not determine the request+response completely. Also, I think I'll add the POST data and the query arguments to the key, and then explicit httrmock
contexts will be rarely needed.
@gaborcsardi When I zoom out, it feels like there are two independent key-value systems being set up for the same set of objects, doesn't it? Maybe this is unavoidable, full stop. Or unavoidable if you want to take advantage of all the work @richfitz has already done in storr and not reinvent that particular wheel. Do I understand this right? The only way to simplify is to somehow make the object storage system build its keys based on the HTTP request.
Good points. Yes, I guess we could just give the (sanitized) request to storr
and it would take care of everything, right?
Sorry, this is all a bit experimental...OTH I am excited to create sg that we use to test HTTP APIs properly.
Sorry, this is all a bit experimental...OTH I am excited to create sg that we use to test HTTP APIs properly.
No need to apologize! It is pleasing to work through this. I think gh can be a pilot study for how this might work for other APIs.
Just to make this discussion more concrete, here's my current git status in gh:
I have no clue which endpoints or tests or whatever all of this corresponds to. And it's not particularly easy to find out and decide to delete or stage and commit. This is the problem we're trying to solve.
Well, to be fair there is httrmock::list_recordings()
..... but yes, we'll use namespaces....
httrmock::list_recordings()
is my dear friend, yes 😀.
It might be easiest to use one opaque object, perhaps? Something like the DBI/SQLite storage that the development version of storr has (check out the develop
branch, use storr::storr_dbi()
with a RSQLIte
connection object). Then from the point of view of the end user there is only a single object, though it will change pretty much every run there is a write.
it will change pretty much every run there is a write.
That's the thing, this is not ideal, you don't have a clue what is changing.
Namespaces are an improvement, but still not the real thing: https://github.com/gaborcsardi/httrmock/issues/4#issuecomment-266875686
Thinking about this some more; if the major issue is getting things stored nicely in git, then it would be possible to use git itself as the object store (see #1) but this is not implemented because I thought it was a bit insane
I thought about a git backend as well, but a generic git backend for storr would look probably quite different than what we need here. A generic backend would probably use a separate git repo, and do commit
automatically, but we don't want these.
I was thinking of exploiting the git object store directly. But I suspect that what you need is simpler than this.
If it's useful I can get the direct key -> value mapping bits working; it should not take too long. That would basically do:
db$set(key, value, namespace) => saveRDS(value, file.path(root, namespace, key))
db$get(key, namespace) => readRDS(file.path(root, namespace, key))
db$list(namespace) => dir(file.path(root, namespace))
Or if you can implement your own in a very few lines of code
@gaborcsardi; can you check out the direct branch of storr, and see if that helps you here. To use, run
storr::storr_rds(path, hash_algorithm = "NOHASH")
It will appear to work the same way as the storr object you have been using but drops the key -> hash -> value separation (just doing key -> value). Namespaces are still supported, as are all the methods you have been using.
This is experimental (as in, I wrote it on the train on the way in and have not fully tested it) but if it looks useful I can tidy it up.
@richfitz Thanks!
This is a big step towards our use case, but it is still not there unfortunately, because we often a different key and filename. This is simply because the key might contain a bunch of HTTP headers, or even the (hash of) the body of an HTTP POST request, but we obviously don't want these to show up in the filename. Sorry.
I still not sure I follow all the constraints of your problem, but if you think that it'd be a generally useful thing I'd welcome a PR (or more details). It sounds like you need a level of indirection there, but I'm not sure I can guess how best to balance the set of competing requirements 😄
OK, sorry, I don't expect you to squeeze our use case in storr. :) I might create a PR, if it seems fitting, though.
Nothing to apologise for - I just can't see my way through to the problems that you can see. There are heaps of trade-offs in designing these things, and at present storr models my needs closely. To be generally useful I'm interested in the problems that I can't guess at
Related to a conversation in httrmock, which uses storrr to store recorded HTTP requests (https://github.com/gaborcsardi/httrmock/issues/3).
Have you ever contemplated a naming scheme that would allow the incorporation of an optional slug? Perhaps as a prefix.
cc @gaborcsardi