SBlechmann opened 3 years ago
Hello @SBlechmann, thanks for using QuantumLeap :-)
Did you try using the `offset` parameter? In principle you should be able to paginate results, so that you can fetch e.g. pages of 10,000 results at a time, as in:
...?offset=0&limit=10000      # fetch the first 10,000 query results
...?offset=10001&limit=20000  # fetch the next 10,000
...
One thing to keep in mind with pagination, though, is that QuantumLeap doesn't snapshot query pages, so there's no guarantee the results fetched by a page will be consistent w/r/t the initial query. In more detail, call `q` the query predicate and `D` the device table it targets, so that the query result set at the initial time `t0`, when you first run the query, is

R(t0) = { r ∈ D at time t0 | q(r) } = { r[0], r[1], ..., r[n] }

Now a "page" of R(t0) is a subset

P(t0, k, k+m+1) = { r[k], r[k+1], ..., r[k+m] }    # k = offset, k+m+1 = limit
So far so good. Except this isn't an accurate model of what QuantumLeap actually does. In fact, every time you fetch a page, QuantumLeap runs `q` against `D` again, recomputing the result set. So if t1 > t0, there's no guarantee that

P(t0, k, k+m+1) = P(t1, k, k+m+1)
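To make the inconsistency concrete, here's a minimal Python simulation; the in-memory `db` list, the `fetch_page` helper, and the predicate are hypothetical stand-ins for the device table and query machinery, not QuantumLeap code:

```python
# Simulate QuantumLeap re-running the query on every page fetch.

def fetch_page(db, predicate, offset, limit):
    """Re-run the query against the current table and slice out one page."""
    results = [r for r in db if predicate(r)]
    return results[offset:offset + limit]

db = list(range(10))          # the table D at time t0
q = lambda r: r % 2 == 0      # the query predicate

page_t0 = fetch_page(db, q, offset=2, limit=2)   # [4, 6]

db.insert(0, -2)              # a new matching record arrives before t1

page_t1 = fetch_page(db, q, offset=2, limit=2)   # [2, 4]

assert page_t0 != page_t1     # same offset/limit, different rows
```

Same page coordinates, different rows, purely because the underlying table changed between the two fetches.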
Is this a train smash in practice? In my opinion it isn't, as long as you use a `toDate` query parameter to make `q` likely to return the same result set every time it gets run:

...?toDate=2021-01-01T00:00:00+00:00&offset=10&limit=100

Under the (fair?) assumption that you seldom delete records (this is a historical record set, after all), pagination should be consistent across time, i.e. if t < u are two points in time, then P(t, ...) = P(u, ...).
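As a sketch, a client could pin `toDate` and step through pages like this; the base URL, entity path, and page size below are made-up examples, and `offset`/`limit` are assumed to behave like SQL OFFSET/LIMIT (a row skip count and a page size):

```python
# Build paginated query URLs with a pinned toDate so every page is
# computed against the same (hopefully unchanging) result set.
from urllib.parse import urlencode

# Hypothetical endpoint and entity path, for illustration only.
BASE = "http://quantumleap:8668/v2/entities/urn:ngsi:Room:1/attrs/temperature"
PAGE_SIZE = 10000
TO_DATE = "2021-01-01T00:00:00+00:00"

def page_url(page):
    """URL for the given zero-based page of a toDate-pinned query."""
    params = urlencode({"toDate": TO_DATE,
                        "offset": page * PAGE_SIZE,
                        "limit": PAGE_SIZE})
    return f"{BASE}?{params}"
```

Note that only `offset` grows from page to page; `limit` stays fixed at the page size.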
One last word of caution. We still need to work on improving the query subsystem. See e.g.
Hey @c0c0n3,
sorry for the late response, and thanks for your fast answer and your suggestion for the well-explained problem :) I just found this issue where you explained the behavior of the limit parameter. Maybe we could add this to the docs, or was I just too blind to see it in the first place?
Yes, I pretty much think the toDate parameter together with offset (which I didn't know about until now) solves the issue, thanks! Still, is there a way to check how many entries there actually are, so I can tell how often I have to query with the suggested solution? Otherwise I'd just check whether the length of the index array is <= 10,000.
Hey @SBlechmann
Maybe we could add this to the docs, or was I just too blind to see it in the first place?
No, you're dead right Sir :-) thanks so much for pointing it out. I've opened an issue about it: #419
@c0c0n3 @SBlechmann Unfortunately, the response in this issue is incorrect and does not describe the actual behavior of the API.
The limit parameter only restricts the number of records returned in one request. It corresponds to the LIMIT clause in SQL.
The offset parameter only shifts the starting record of the selection; it does not grow the page. Important to note that records are usually ordered from oldest to newest. In SQL this would be:
SELECT * FROM <yourTable> ORDER BY `timeIndex` ASC OFFSET <yourOffset>;
Hence, combining both, we only need to raise the offset, not the limit:
SELECT * FROM <yourTable> ORDER BY `timeIndex` ASC OFFSET <yourOffset> LIMIT <yourLimit>;
Be careful if lastN is used, because it reverses the ORDER BY direction:
SELECT * FROM <yourTable> ORDER BY `timeIndex` DESC OFFSET <yourOffset> LIMIT <yourLimit>;
OFFSET skips everything before it. Hence, in the example above it should be raised in steps of 10,000.
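The OFFSET/LIMIT behavior described above can be checked with an in-memory SQLite table; the table and column names are illustrative, not QuantumLeap's actual schema:

```python
# A tiny sqlite3 check of SQL OFFSET/LIMIT semantics.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (time_index INTEGER, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?)",
                [(i, i * 0.5) for i in range(10)])

# Third page of size 3, oldest first: OFFSET skips the first 6 rows.
rows = con.execute(
    "SELECT time_index FROM readings ORDER BY time_index ASC "
    "LIMIT 3 OFFSET 6").fetchall()
# rows == [(6,), (7,), (8,)]

# With lastN-style ordering the direction flips to DESC,
# so the same OFFSET/LIMIT selects different records.
rows_desc = con.execute(
    "SELECT time_index FROM readings ORDER BY time_index DESC "
    "LIMIT 3 OFFSET 6").fetchall()
# rows_desc == [(3,), (2,), (1,)]
```

The same offset and limit pick out different rows depending on the sort direction, which is exactly the lastN caveat above.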
It took me hours to figure this out :/
Here is a helpful link to understand the actual behavior: https://www.sqltutorial.org/sql-limit/
Therefore, I would suggest reopening this issue!
Cheers
@tstorek thanks so much for this. I'm reopening the issue so we can look at it again. Can you give us an example of where the API is broken? No need to use 10,000 records; 2 per page would do, I guess? Thanks a lot!!
Describe the bug
When querying QL's API, e.g. for an attribute of an entity: according to the docs (well, I refer to these since the link in the docs doesn't seem to work), the limit parameter can be set to limit the output. The default value is 10,000. So far so good: if I query without any query parameter I get up to 10,000 values, and if I query "[...]?limit=10" I get only 10 values, but 10,000 seems to be the maximum value handled by QL. "[...]?limit=10001" returns the same amount of data as with 10,000.
To Reproduce
Steps to reproduce the behavior: push more than 10,000 values for one attribute to CrateDB. Query QL for that attribute with and without the limit parameter.
Expected behavior
Since I would like to query for more than 10,000 values (e.g. 100,000 or even all saved values), I expect the limit parameter to work for values greater than 10,000.
Environment (please complete the following information):