Closed brosenan closed 5 years ago
scan will not actually scan the whole table. It returns a (strict) array of elements, with some metadata. One of the metadata keys is :last-prim-kvs. If this is not nil, the scan isn't complete and you'll have to call scan again, supplying this :last-prim-kvs, to continue scanning.
I have code that looks a bit like this:
(defn scan-table
  [callback]
  (loop [kvs :none]
    (let [query    (if (= :none kvs) {} {:last-prim-kvs kvs})
          results  (faraday/scan opts table-name query)
          next-kvs (:last-prim-kvs (meta results))]
      (callback results)
      (when-not (nil? next-kvs) (recur next-kvs)))))
Don't forget to eventually add a :limit option and use a rate limiter to avoid blowing through your provisioned capacity.
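For what it's worth, here is a minimal sketch of what adding a :limit and some throttling could look like. It is not a real rate limiter, just a fixed pause between pages. The names scan-pages, fetch-page and pause-ms are made up for illustration; fetch-page is assumed to behave like #(faraday/scan client-opts table %), i.e. it returns one batch of results carrying :last-prim-kvs in its metadata when more data remains.

```clojure
(defn scan-pages
  "Calls (fetch-page query) repeatedly, handing each batch to callback and
  sleeping pause-ms milliseconds between requests.
  fetch-page is a hypothetical 1-arg function, assumed to behave like
  #(faraday/scan client-opts table %)."
  [fetch-page callback {:keys [limit pause-ms] :or {limit 100 pause-ms 200}}]
  (loop [query {:limit limit}]
    (let [results  (fetch-page query)
          next-kvs (:last-prim-kvs (meta results))]
      (callback results)
      (when next-kvs
        ;; crude throttle: a fixed pause, not a capacity-aware rate limiter
        (Thread/sleep (long pause-ms))
        (recur (assoc query :last-prim-kvs next-kvs))))))
```

Passing the fetch function in as an argument keeps the loop easy to try out against canned pages before pointing it at a real table.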
Here's an approach similar to @geekingfrog's, but operating semi-lazily:
(defn lazy-scan
  ([client-opts table] (lazy-scan client-opts table nil))
  ([client-opts table opts]
   (lazy-seq
     (let [results  (faraday/scan client-opts table opts)
           next-kvs (:last-prim-kvs (meta results))]
       (if next-kvs
         (lazy-cat results (lazy-scan client-opts table (assoc opts :last-prim-kvs next-kvs)))
         results)))))
I say semi-lazily, because production will stay ahead of consumption, but in chunks related somewhat to whatever initial :limit value is specified in opts.
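To make the chunking concrete, here is a small self-contained sketch that swaps faraday/scan for a fake paged source and counts requests. Everything here (fake-scan, lazy-pages, the requests atom, pages of 3 numbers) is hypothetical and only mimics the :last-prim-kvs metadata convention; it is not Faraday code.

```clojure
(def requests (atom 0))

(defn fake-scan
  "Hypothetical stand-in for faraday/scan: serves the numbers 0..8 in pages
  of 3 and counts how many times it has been called."
  [opts]
  (swap! requests inc)
  (let [start (or (:last-prim-kvs opts) 0)
        nxt   (+ start 3)
        page  (vec (range start nxt))]
    ;; attach :last-prim-kvs only while more pages remain
    (with-meta page (when (< nxt 9) {:last-prim-kvs nxt}))))

(defn lazy-pages
  "Same shape as the lazy-scan above, but reading from fake-scan."
  [opts]
  (lazy-seq
    (let [results  (fake-scan opts)
          next-kvs (:last-prim-kvs (meta results))]
      (if next-kvs
        (lazy-cat results (lazy-pages (assoc opts :last-prim-kvs next-kvs)))
        results))))

(comment
  (def s (lazy-pages {}))
  @requests   ; 0, nothing has been fetched yet
  (first s)   ; 0
  @requests   ; 1, the whole first page was fetched for one element
  (count s)   ; 9
  @requests   ; 3
  )
```

So a whole page is fetched as soon as its first element is needed, and nothing is fetched for pages that are never reached.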
@brosenan and @rwilson are we satisfied here? Can this one be closed? Do you suggest something be added to the docs around scan?
I think it can be closed; it's reasonably well documented already via the AWS docs and the :limit parameter.
Sounds good @rwilson
I think that the original poster's question has been misunderstood. He went to read the source code, found the function merge-more, expressed concerns about what it does, but nobody answered about that function w.r.t. laziness.
@brosenan To get just a piece of the query's result, you need to specify {:limit n, :span-reqs {:max m}} in the options. You will get up to (* n m) items in your results, i.e. m requests of up to n items concatenated together in the result.
By default, (-> options :span-reqs :max) is set to 5 in the current version of Faraday (1.9.0).
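As a concrete sketch of those options: the table name :my-table and the client-opts values below are placeholders, not anything from this thread, and the call needs a reachable DynamoDB endpoint to actually run.

```clojure
(require '[taoensso.faraday :as far])

;; Placeholder credentials and endpoint, e.g. for a local DynamoDB instance.
(def client-opts
  {:access-key "<access-key>"
   :secret-key "<secret-key>"
   :endpoint   "http://localhost:8000"})

;; At most 2 requests of up to 100 items each, so up to (* 100 2) = 200 items.
(def results
  (far/scan client-opts :my-table {:limit 100 :span-reqs {:max 2}}))

;; If this is non-nil, more data remains; pass it back as :last-prim-kvs
;; in the options of the next call to continue from there.
(:last-prim-kvs (meta results))
```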
Is the scan function safe to use with extremely large tables? I see from the code that it handles pagination by concatenating paginated results into a single sequence using merge-more, but I couldn't figure out from reading the code whether this sequence is lazy. If this sequence is not lazy there are of course two major drawbacks when using it with large tables:
I believe the sequence is lazy, just wanted to make sure.
Thanks, Boaz