Closed deffi420 closed 8 years ago
this change makes sdb a bit slower
in percentages:
For some cases it will be faster to precalculate all the n_i
s and keep it in memory until the sdb instance is gone. But this seems to be a fair exchange between dbsize and performance
The running times are pretty much the same for me (CFLAGS="-O2"). Nonetheless, I'll try to optimize the n_i computation tomorrow.
For 100K rows , the new implementation takes 0.370s vs 0.447s to generate the database. so it means that syncing is faster, which is very important operation because its never updating the disk file. (18% faster)
and with 100K rows, the database is much smaller. 4.1MB vs 3.3MB.
We are missing a test for retrieving, i’ll add it now, so i think the benchmarks clearly show that this thing must be merged :)
Send those optimizations in a separate PR.
On 24 Jan 2016, at 00:51, deffi420 notifications@github.com wrote:
https://cloud.githubusercontent.com/assets/5674089/12533570/5c7bf15a-c23c-11e5-820e-e9f53d093eaa.png The running times are pretty much the same for me (CFLAGS="-O2"). Nonetheless, I'll try to optimize the n_i computation tomorrow.
— Reply to this email directly or view it on GitHub https://github.com/radare/sdb/pull/92#issuecomment-174235228.
It’s obvious, but worthless to say that this change makes databases built with the previous code are no longer accessible. this will break for example the information in projects in r2.
as long as sdb deosnt puts any header in the database, there’s no way to specify a version number or filetype and we cant handle this. But i think we can go forward, and just warn users about this breakage, but i dont think its gonna be a huge problem
On 24 Jan 2016, at 00:51, deffi420 notifications@github.com wrote:
https://cloud.githubusercontent.com/assets/5674089/12533570/5c7bf15a-c23c-11e5-820e-e9f53d093eaa.png The running times are pretty much the same for me (CFLAGS="-O2"). Nonetheless, I'll try to optimize the n_i computation tomorrow.
— Reply to this email directly or view it on GitHub https://github.com/radare/sdb/pull/92#issuecomment-174235228.
I'm afraid there's something wrong. The database should be 1024 bytes smaller, not 15%...
For some reason the new CDB hashmap is 100% full (i.e., collisions occur all the time). We need some empty slots to speed up lookups.
Ok, the problem is here https://github.com/radare/sdb/commit/90a6981271b611dc880ed06d4f913e04476af285#diff-707f52f9224c9ce39cb59909122d1581R150
count is the number of reserved slots in a subtable, and len is the total length of the subtable. We do not want these to be equal because then there is no empty slots and hash collisions occur 100% of the time. So we should revert the change back to: len = count << 1.
My bad...
The trick is to compute the subtable length n_i, 0 <= i <= 255, using the subtable pointers pi and p{i+1}. Observe that
So let's not waste 1024 bytes of space to encode n_i.