vedang / clj_fdb

A thin Clojure wrapper for the Java API for FoundationDB.
https://vedang.github.io/clj_fdb/
Eclipse Public License 1.0

Using `get-range` with raw prefix keys #29

Open FiV0 opened 1 year ago

FiV0 commented 1 year ago

Hey, thanks for the library.

I am trying to use it with raw byte buffers (well, for reasons...). The following piece of code illustrates the issue:

(ns test 
  (:require [me.vedang.clj-fdb.FDB :as cfdb]
            [me.vedang.clj-fdb.core :as fc]
            [me.vedang.clj-fdb.impl :as fimpl]
            [me.vedang.clj-fdb.subspace.subspace :as fsub]
            [me.vedang.clj-fdb.range :as frange]
            [taoensso.nippy :as nippy]))

(def fdb (cfdb/select-api-version cfdb/clj-fdb-api-version))
(def db (cfdb/open fdb))

(defn ->buffer [v] (nippy/freeze v))
(defn ->value [b] (nippy/thaw b))

(def subspace (fsub/create ["the-store"]))

;; works fine
(fc/set db subspace (->buffer "foo") nil)
(fc/get db subspace (->buffer "foo"))
;; => []
(fc/get db subspace (->buffer "not present"))
;; => nil

;; put some more data in
;; run! forces the side effects; a bare lazy `map` only runs when the REPL prints it
(->> (for [i (range 10)]
       (str "foo" i))
     (map ->buffer)
     (run! #(fc/set db subspace % nil)))

;; with only subspace works fine
(-> (fc/get-range db subspace) 
    (update-keys (fn [v] (map ->value v))))
;; => {("foo8") [],
;;      ...
;;      ...
;;     ("foo4") []}

(def ^:private empty-byte-array (byte-array 0))
(fc/get-range db subspace empty-byte-array)
;; expects Tuple
(fc/get-range db subspace (->buffer "foo"))
;; expects Tuple

;; this gives me only the ["the-store" "foo"] prefix  
(def prefix (->buffer "foo"))
(-> (fc/get-range db (frange/starts-with (fimpl/encode subspace prefix)))
    (update-keys (fn [v] (update v 1 ->value))))
;; => {["the-store" "foo"] []}

;; but I would like something of the sort
(fc/get-range db subspace (->buffer "foo"))
;; => {["foo"] [] , ["foo0"] [], ["foo1"] [] ....}

So essentially I am struggling with how to query for all keys that match a certain raw prefix in a subspace. Is this possible with your wrapper? I tried all kinds of variations of get-range, but none of them returned anything useful:

(defn- ->byte-array [ba1 ba2]
  (byte-array (mapcat seq [ba1 ba2])))

(fc/get-range db (fsub/create (->byte-array (fsub/pack subspace) prefix)))
(fc/get-range db (frange/starts-with (->byte-array (fsub/pack subspace) prefix)))
(fc/get-range db (frange/starts-with (fimpl/encode subspace prefix)))
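
One likely reason the `starts-with` query on an encoded tuple matches only the exact key: the FoundationDB tuple layer encodes a byte-string element as a `0x01` type code, the bytes themselves (with each `0x00` escaped as `0x00 0xFF`), and a closing `0x00`. Because of that terminator, the packed form of one element is not a byte prefix of the packed form of a longer element. The sketch below models this in plain Clojure with no FDB connection. `encode-bytes-element` and `prefix-of?` are hypothetical helpers written for illustration, not part of clj_fdb, and plain UTF-8 bytes stand in for the nippy-frozen buffers (whose own framing is not modeled).

```clojure
(defn encode-bytes-element
  "Hypothetical helper: tuple-layer encoding of a single byte-string element."
  [^bytes bs]
  (byte-array
   (concat [(unchecked-byte 0x01)]                        ; type code for byte strings
           (mapcat (fn [b]
                     (if (zero? b)
                       [(byte 0) (unchecked-byte 0xFF)]   ; escape embedded nulls
                       [b]))
                   bs)
           [(byte 0)])))                                  ; terminator

(defn prefix-of?
  "True when byte array `prefix` is a leading slice of byte array `k`."
  [^bytes prefix ^bytes k]
  (let [n (alength prefix)]
    (and (<= n (alength k))
         (= (vec prefix) (vec (take n k))))))

(def raw-foo  (.getBytes "foo" "UTF-8"))
(def raw-foo0 (.getBytes "foo0" "UTF-8"))

;; The raw bytes share a prefix ...
(prefix-of? raw-foo raw-foo0)
;; ... but the encoded elements do not, because of the trailing 0x00.
(prefix-of? (encode-bytes-element raw-foo)
            (encode-bytes-element raw-foo0))
```

If this is what is happening here, a raw prefix scan would need range bounds built from the unterminated bytes rather than from a fully packed tuple.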

FiV0 commented 1 year ago

I read up some more on FDB, and what I was missing was KeySelectors. Your get-range currently does not support those. I will try to add them and also look into adding the limit option mentioned in #28.
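
For context, a prefix scan in FoundationDB is a range read between two boundary keys: the prefix itself and the first key that no longer starts with it. The Java bindings provide `ByteArrayUtil.strinc` and `Range.startsWith` for this, and a key selector such as `KeySelector.firstGreaterOrEqual` resolves each boundary to an actual key. The sketch below re-implements the boundary computation in plain Clojure to show which keys fall inside such a range. `strinc`, `compare-unsigned`, and `in-prefix-range?` are illustrative helpers defined here, not clj_fdb functions.

```clojure
(defn strinc
  "First byte string that sorts after every key starting with `prefix`:
   drop trailing 0xFF bytes, then increment the last remaining byte."
  [^bytes prefix]
  (let [trimmed (->> (reverse prefix)
                     (drop-while #(= -1 %))   ; 0xFF read as a signed byte is -1
                     reverse
                     vec)]
    (when (empty? trimmed)
      (throw (IllegalArgumentException.
              "prefix must contain at least one byte other than 0xFF")))
    (byte-array (conj (pop trimmed)
                      (unchecked-byte (inc (peek trimmed)))))))

(defn compare-unsigned
  "Lexicographic comparison of byte arrays, treating bytes as unsigned."
  [^bytes a ^bytes b]
  (let [n (min (alength a) (alength b))]
    (loop [i 0]
      (if (= i n)
        (compare (alength a) (alength b))
        (let [c (compare (bit-and (aget a i) 0xFF)
                         (bit-and (aget b i) 0xFF))]
          (if (zero? c) (recur (inc i)) c))))))

(defn in-prefix-range?
  "True when prefix <= k < (strinc prefix)."
  [^bytes prefix ^bytes k]
  (and (<= (compare-unsigned prefix k) 0)
       (neg? (compare-unsigned k (strinc prefix)))))

(def p (.getBytes "foo" "UTF-8"))

(in-prefix-range? p (.getBytes "foo0" "UTF-8")) ; inside the range
(in-prefix-range? p (.getBytes "fop" "UTF-8"))  ; equals the end key, so excluded
```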

FiV0 commented 1 year ago

Would you also be interested in a PR with a version of get-range that preserves the order of the keys? That is essentially my use case. It could be a new function, or an option for get-range that accumulates into a sequential structure instead of a map.

vedang commented 1 year ago

I'd happily review a PR which adds support for KeySelectors and limit / skip / cursors to get-range. (Please make it multiple PRs for easy review.)

Re: returning keys in order, that is the default behaviour of FDB if I understand correctly (sorted by byte representation / packing), so it should be happening already.

FiV0 commented 1 year ago

ReRe: returning keys in order.

You are currently accumulating the result of ftr/get-range into a map here: https://github.com/vedang/clj_fdb/blob/master/src/me/vedang/clj_fdb/core.clj#L152-L157. That essentially mangles the order returned by ftr/get-range.
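
The effect can be reproduced without FDB. A Clojure map literal starts out as an array map, which keeps insertion order, but `into {}` switches to a hash map once it holds more than eight entries, and a hash map makes no ordering promise. The sketch below uses made-up key/value pairs that stand in for an ordered range result; it does not call clj_fdb.

```clojure
;; Ten [key value] pairs, already in sorted key order,
;; standing in for what an ordered range read would yield.
(def pairs
  (mapv (fn [i] [(str "foo" i) []]) (range 10)))

(def as-map (into {} pairs))   ; more than 8 entries: becomes a hash map
(def as-vec (into [] pairs))   ; a vector keeps the incoming order

(type as-map)        ; the concrete map type, no longer an array map
(map first as-vec)   ; keys in the order they were read
(keys as-map)        ; keys in hash order, which need not match
```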

vedang commented 1 year ago

Oops, sorry. I replied without looking at the code.

Yes, I don't mind a new function which returns tuples in the correct order, instead of returning a key->value map.

Thinking out loud about how to return the range in the correct order:

  1. It might be useful to add it as a different implementation of fc/get-range, controlled by default-opts passed into the function. This would let the user decide whether they want a map back or a vector of [k, v] tuples, and the meaning would be self-explanatory.
  2. Another approach could be to use flatland/ordered-map to ensure that the map is ordered correctly.

Let me think through the preferred implementation. Happy to hear your thoughts as well.
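
A minimal sketch of the first option, assuming the range result is available as a sequence of `[k v]` pairs in key order. `collect-range` and its `:as` option are hypothetical names chosen for illustration; they are not part of clj_fdb's current API, and the real option name and defaults would be up to the maintainers.

```clojure
(defn collect-range
  "Hypothetical accumulator: turns an ordered seq of [k v] pairs into
   either a map (the current behaviour) or an order-preserving vector."
  ([kvs] (collect-range kvs {}))
  ([kvs {:keys [as] :or {as :map}}]
   (case as
     :map    (into {} kvs)      ; key->value lookup, order not guaranteed
     :vector (into [] kvs))))   ; [k v] tuples in the order they arrived

(def sample-kvs
  (map (fn [i] [["the-store" (str "foo" i)] []]) (range 10)))

(collect-range sample-kvs)                 ; map, as get-range returns today
(collect-range sample-kvs {:as :vector})   ; ordered vector of [k v] tuples
```

Defaulting to `:map` would keep existing callers unaffected, while callers that care about order opt in explicitly.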