mpenet / alia

High performance Cassandra client for clojure
https://mpenet.github.io/alia/qbits.alia.html#docs

Reading entire table lazily #91

Closed smoes closed 6 years ago

smoes commented 6 years ago

I saw the lazy-query function but I'm not really sure how to use it. The example given is quite specialized. As I understand it, I'd have to merge or refine the query on each iteration. I have a huge database, and if I alter the query by adding constraints each time, the query will become quite large. Not sure if I got that wrong.

What is the idiomatic way to execute an (hayt/select :foo) (i.e. select * from foo) in a lazy/paginated/chunked manner?

mpenet commented 6 years ago

If you just want a lazy stream over a table, you can simply use qbits.alia/execute with "select * from foo": the result is chunked by default, and the driver fetches it in a streaming fashion with decent defaults. You can also pass :fetch-size to qbits.alia/execute to control how many rows the driver requests from the server at a time as you consume the sequence.
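A minimal sketch of that approach, assuming an already-open session bound to `session` and a table named `foo` (both placeholders); `process` is likewise a hypothetical handler, not part of alia:

```clojure
(require '[qbits.alia :as alia]
         '[qbits.hayt :as hayt])

;; execute returns a chunked lazy seq over the result set.
;; :fetch-size controls how many rows the driver requests from
;; Cassandra per page as the seq is consumed.
(def rows
  (alia/execute session
                (hayt/select :foo)
                {:fetch-size 1000}))

;; Pages are fetched on demand while you walk the seq, so the
;; whole table is never held in memory at once.
(doseq [row rows]
  (process row)) ; `process` stands in for your own row handler
```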

If you need super fine-grained control and want to pair that with core.async or manifold, there's also qbits.alia.async/execute-chan-buffered, which streams rows one by one into a channel with proper back-pressure: you control consumption via core.async takes, and the underlying page size via either the core.async buffer size or the :fetch-size option. More or less the same applies to manifold via qbits.alia.manifold/execute-buffered.
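A sketch of the channel-based variant, again assuming `session` is an open alia session and `foo` exists (both placeholders):

```clojure
(require '[qbits.alia.async :as alia-async]
         '[clojure.core.async :as async])

;; execute-chan-buffered returns a channel that receives one row at
;; a time. Back-pressure is driven by how fast you take from the
;; channel; :fetch-size bounds the page size fetched from the server.
(let [ch (alia-async/execute-chan-buffered
          session
          "select * from foo"
          {:fetch-size 500})]
  (async/go-loop []
    (when-let [row (async/<! ch)]
      (println row) ; replace with real per-row work
      (recur))))
```

Because the go-loop only takes one row at a time, the driver never fetches far ahead of the consumer.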

mpenet commented 6 years ago

lazy-query is more for when you want to run a query, infer something from its result to build the next query, and so on, all lazily. It used to be quite useful, but to be honest the use case is quite narrow. I actually considered removing it recently; it's still here for backward compatibility.

mpenet commented 6 years ago

Lastly, if you want the opposite, i.e. an eager result, potentially running an xform on the rows as they are consumed, you can use the :result-set-fn option on execute.

To quote the doc:

:result-set-fn : Defaults to clojure.core/seq. By default a result-set is a chunked lazy seq; you can control this using this option. If you pass a function that supports IReduceInit you get full control over how the result set is formed (chunked, unchunked, eager or not, etc.). A common use is to pass #(into [] %) as :result-set-fn; you then get an eager value, with minimal copies, no intermediary seq, and potentially better performance. This can be very powerful when used right (for instance with transducers: #(into [] xform %)).
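A sketch of that option combined with a transducer; `session`, the table `foo`, and the `:id` column are all assumptions for illustration:

```clojure
(require '[qbits.alia :as alia])

;; :result-set-fn receives the reducible result set. Passing
;; #(into [] xform %) applies the transducer as rows are realized
;; and produces an eager vector with no intermediate lazy seq.
(def user-ids
  (alia/execute session
                "select * from foo"
                {:result-set-fn #(into [] (map :id) %)}))
```

Here the whole result is materialized eagerly, so this fits bounded result sets rather than full-table scans.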

smoes commented 6 years ago

Oh wow, that's a quick answer. Thank you very much! I didn't notice that it's already chunked. That means I'm basically done with my task and can go straight to testing :) What is the default :fetch-size? I assume it doesn't matter whether I use a plain query string or hayt?

Ahh, and now that you describe it, lazy-query actually makes sense. It just wasn't intuitive for what I was trying to accomplish ;)

Good work, keep it up! :+1:

mpenet commented 6 years ago

The default fetch-size is 5k if I recall correctly.