Open arj03 opened 2 years ago
Part | Speed | Heap Change | Samples |
---|---|---|---|
Count 1 big index (3rd run) | 0.45ms ± 0.2ms | 18.33 kB ± 19.81 kB | 50 |
Create an index twice concurrently | 593.75ms ± 6.74ms | -33.13 kB ± 56.61 kB | 92 |
Load core indexes | 1.32ms ± 0.02ms | 97.3 B ± 224.57 B | 6960 |
Load two indexes concurrently | 730.78ms ± 14.76ms | -23.32 kB ± 266.25 kB | 17 |
Paginate 10 results | 24.92ms ± 1.02ms | 19.38 kB ± 13.07 kB | 27 |
Paginate 20000 msgs with pageSize=5 | 5916.4ms ± 150.89ms | 2.61 MB ± 1.91 MB | 5 |
Paginate 20000 msgs with pageSize=500 | 556.99ms ± 29.87ms | -523.32 kB ± 796.37 kB | 20 |
Query 1 big index (1st run) | 824.39ms ± 9.06ms | 26.18 kB ± 58.09 kB | 66 |
Query 1 big index (2nd run) | 257.78ms ± 0.85ms | 67.36 kB ± 51.14 kB | 52 |
Query 3 indexes (1st run) | 870.22ms ± 8.33ms | 24.72 kB ± 78.52 kB | 63 |
Query 3 indexes (2nd run) | 210.77ms ± 1.2ms | 36.53 kB ± 168.67 kB | 54 |
Query a prefix map (1st run) | 255.5ms ± 31.03ms | -16.3 kB ± 248.32 kB | 24 |
Query a prefix map (2nd run) | 11.68ms ± 1.01ms | 4.39 kB ± 24.52 kB | 21 |
I don't quite understand the need for this, but I trust you.
To be clear, the way I get the 5 most recently updated e.g. podcasts is: pull for type podcast (getting updates and roots), then run this through crut.read and throw out the duds. Once I have 5 good ones, kill the stream.
Oh, and we have a pull.unique in there too. Is this what groupBy would be helping with?
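The manual flow described above can be sketched in plain JS (without pull-stream, and with a simplified message shape that is an assumption, not the real crut record format):

```javascript
// Sketch of the manual approach: filter by type, dedupe on the root
// (the role pull.unique plays), discard duds (the crut.read step),
// and stop once 5 good results are collected (killing the stream).
function latestFive(messages) {
  const seen = new Set()
  const results = []
  for (const msg of messages) {
    if (msg.type !== 'podcast') continue // pull for type podcast
    const root = msg.root || msg.key // updates point at their root
    if (seen.has(root)) continue // pull.unique on the root
    seen.add(root)
    if (!msg.valid) continue // crut.read throwing out the duds
    results.push(msg)
    if (results.length === 5) break // enough good ones: kill the stream
  }
  return results
}
```

Note that every message still has to be pulled and inspected until 5 good ones are found, which is the cost a groupBy operator is meant to reduce.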
It is the same algorithm, now that you explain it. So right now this is probably only marginally faster, though it does maybe improve the discoverability of these kinds of things.
I do have an idea to use an index for this, which should make it faster. So challenge accepted :-) Changing this to draft.
Background
@mixmix was working on adding an option to crut list where you would get results ordered by updateTime. We would like to avoid running through all the results (db1 has to do this) just to display, say, the latest 5 buttcasts. What we really need for this is a group-by operator similar to what you have in a normal database. This is quite general: you could also use it to show the latest post for each feed.

I tried modelling this similarly to the seek function. This means we still have to run through results to extract what we need, but since we are working directly on the bipf data, it's not really that bad. It can still be combined with regular where filtering, and when used with limit it will stop once enough results are found.
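A rough self-contained sketch of the behaviour described above, in plain JS over plain objects (the real operator reads bipf buffers directly; `keyFn` here stands in for the seek-style extractor, and the names are illustrative assumptions, not the actual API):

```javascript
// groupBy combined with limit: records are scanned newest-first, keyFn
// extracts the grouping value from each record (the role the seek
// function plays on bipf data), only the first record per group is
// kept, and the scan stops once `limit` distinct groups are found.
function groupByFirst(records, keyFn, limit) {
  const groups = new Map() // group key -> newest record for that group
  for (const rec of records) {
    const key = keyFn(rec)
    if (groups.has(key)) continue // already have the newest for this group
    groups.set(key, rec)
    if (groups.size >= limit) break // limit: stop once enough results found
  }
  return [...groups.values()]
}
```

For example, "latest post for each feed" would be something like `groupByFirst(msgs, m => m.author, 5)`, assuming `msgs` is already ordered newest-first.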