mbr / simplekv

A simple key-value store for binary data.
http://simplekv.readthedocs.io
MIT License

Add iter_keys_upto_delimiter #93

Closed crepererum closed 5 years ago

crepererum commented 5 years ago

PR is split in the following way:

  1. fix Travis again
  2. implement the generic function
  3. implement fast path for FS and Azure

Closes #83

coveralls commented 5 years ago

Coverage decreased (-7.7%) to 81.708% when pulling 4b5a3d047fe56045f2fb235a292a31281a4d2645 on crepererum:issue_83 into 00500e2d26528b5d4cfdfc9611651d0cab011164 on mbr:master.

fmarczin commented 5 years ago

@mbr : This seems like a useful addition to me and complements the prefix support in list_keys() and other places quite nicely. I'm asking for your approval, since this PR extends the core API.

mbr commented 5 years ago

I'm not entirely sure this needs to be in the core API (I like to keep it small, as you know ;)). The "primitive" implementation is just iterating over all keys and collecting the prefix. While this is useful, is there a real need to add this to the core API?

There is one condition under which I would add it: if it is common for backend implementations to offer a more efficient version of this (I think S3 might do this, but they might limit it to a fixed "/" delimiter). Otherwise, it would not be possible to take advantage of these.
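For reference, the "primitive" version described above could look roughly like the sketch below. It is only an illustration, not the code from this PR: the standalone function, its signature, and the choice to include the delimiter in each returned entry are assumptions.

```python
def iter_keys_upto_delimiter(keys, delimiter, prefix=""):
    """Yield each distinct key, cut off after the first delimiter past the prefix.

    Hypothetical standalone helper: `keys` is any iterable of key strings,
    standing in for a full scan over the store.
    """
    seen = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        # Look for the delimiter only after the prefix.
        cut = key.find(delimiter, len(prefix))
        if cut >= 0:
            # Assumption: the delimiter is kept so "directories" are recognizable.
            key = key[:cut + len(delimiter)]
        if key not in seen:
            seen.add(key)
            yield key


keys = ["a/1", "a/2", "b/x/y", "c"]
top_level = sorted(iter_keys_upto_delimiter(keys, "/"))
below_b = sorted(iter_keys_upto_delimiter(keys, "/", prefix="b/"))
```

Every key is still visited once, which is why a backend that can list "directories" natively has room to do much better.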

crepererum commented 5 years ago

@mbr I've added fast-paths for Azure (all possible delimiters) and FS (only works if the delimiter is the OS path separator).
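To illustrate why the FS fast path depends on the delimiter being the OS path separator, here is a rough sketch (not the code in this PR; the function name and the `root` parameter are made up for the example). When keys map to file paths, a single directory listing answers the query without walking the whole tree.

```python
import os


def fs_iter_upto_sep(root, prefix=""):
    """List one directory level under `root` instead of scanning every key.

    Hypothetical helper. It assumes keys are stored as relative file paths and
    that the delimiter is os.sep. The prefix is not validated, and an empty
    directory is reported even though no key lives below it.
    """
    # Split the prefix into the directory to list and a partial entry name.
    dir_part, name_part = os.path.split(prefix)
    base = os.path.join(root, dir_part)
    if not os.path.isdir(base):
        return
    with os.scandir(base) as entries:
        for entry in entries:
            if not entry.name.startswith(name_part):
                continue
            key = os.path.join(dir_part, entry.name)
            if entry.is_dir():
                # Directories play the role of "key up to the delimiter".
                key += os.sep
            yield key
```

For any other delimiter the directory structure no longer lines up with the cut-off points, so a full scan over all keys is needed instead.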

For S3, this also works for any delimiter (see boto3 docs and AWS API docs), but I did not implement this since I think we should tackle #84 first and I'm not an S3 user.

So yes, it is rather common for upstream services to support this "directory-like" listing. I did some internal benchmarks for massive stores and you get huge advantages from the fast-path:

mbr commented 5 years ago

Nice. In that case you have my blessings, I shall bikeshed no more =). Probably could not hurt to mention these things somewhere in the docs, i.e. that some backends might have better performance characteristics than others for this.

crepererum commented 5 years ago

done