Open elee1766 opened 1 year ago
It's maybe a big task. The following needs to be done:
* Can store `empty value` in: value, key, value of DupSort table.
* Ensure that `empty value` is not `nil`. All APIs (Seek, SeekBothRange, Get, Next, First, Last, etc.): if you put `empty value`, they return `empty value`, not `nil`. Likely need to write tests for this special case, because it's easy to break.
* etl has 3 collectors, all of them need to support `empty value`.
* etl: `nil`-value means delete this key. Need to keep this feature. etl doesn't support deletes from DupSort tables.
* `rawdb` package: maybe there are checks like `if len(k) == 0`: `nil` and `empty value` will both pass this check.

Hm, I will take one step at a time and update this post as I investigate.
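To make the `if len(k) == 0` concern concrete, here is a minimal Go sketch of why a length check alone cannot separate `nil` from `empty value`. The `classify` helper is hypothetical and only for illustration; it is not part of erigon-lib or `rawdb`.

```go
package main

import "fmt"

// classify is a hypothetical helper that applies the rules discussed in this
// issue: nil means "no value / delete", a non-nil zero-length slice means
// "empty value", anything else is ordinary data.
func classify(v []byte) string {
	switch {
	case v == nil:
		return "nil"
	case len(v) == 0:
		return "empty"
	default:
		return "data"
	}
}

func main() {
	var missing []byte // nil slice
	empty := []byte{}  // non-nil slice of length 0

	// Both satisfy a bare length check, so such a check conflates them.
	fmt.Println(len(missing) == 0, len(empty) == 0)

	// An explicit nil comparison keeps them apart.
	fmt.Println(classify(missing), classify(empty), classify([]byte{1}))
}
```

Any check that is meant to detect only "missing" would need the explicit `v == nil` comparison rather than `len(v) == 0`.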
* check that mdbx-go bindings support empty values - maybe not (because LMDB didn't support them). Sub-tasks: can store `empty value` in: value, key, value of DupSort table.
It seems https://github.com/torquem-ch/mdbx-go/blob/master/mdbx/txn.go#L548 does not support empty values: it replaces a 0-length slice with `[]byte{0}`, then passes that to Put. In MDBX itself, `reserve.iov_len = (data ? data->iov_len : 0) + sizeof(mdbx_attr_t);` is used, implying that passing NULL as data would be the way to set an empty value. I'm assuming the key is handled the same way.
* check that kv_mdbx.go supports empty values
Values seem to pass through https://github.com/ledgerwatch/erigon-lib/blob/main/kv/mdbx/kv_mdbx.go#L1109 and https://github.com/ledgerwatch/erigon-lib/blob/main/kv/mdbx/kv_mdbx.go#L1374 all the way down, so it does support empty values.
As for empty keys, it seems to handle them properly, at least here: https://github.com/ledgerwatch/erigon-lib/blob/main/kv/mdbx/kv_mdbx.go#L535. Tests will need to be written.
TODO:
* check that kv_remote and remotedbserver support empty values
* ensure that `empty value` is not `nil`. All APIs (Seek, SeekBothRange, Get, Next, First, Last, etc.): if you put `empty value`, they return `empty value`, not `nil`. Likely need to write tests for this special case, because it's easy to break.
* etl has 3 collectors, all of them need to support `empty value`
* etl is currently implemented this way: `nil`-value means delete this key. Need to keep this feature. etl doesn't support deletes from DupSort tables.
* take a look into the `rawdb` package - maybe there are checks like `if len(k) == 0`: `nil` and `empty value` will both pass this check.
FYI: you can ignore the `b.AutoDupSortKeysConversion == true` case.
First time using etl just now. I was scratching my head over why Load was not working: it was because I was trying to load empty values. That is possible with a direct transaction, but it seems that in the ETL framework an empty value is currently used as the delete identifier.
https://github.com/ledgerwatch/erigon-lib/blob/main/etl/collector.go#L239
One possible solution: add a TransformArg "AllowEmptyValues" to skip this check and therefore allow "zero values". This is something supported by MDBX. Happy to make a PR if it seems like a good idea.
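A rough sketch of how such a flag could change the load-time decision, assuming the current check is length-based. Everything here is hypothetical: `TransformArgs` is a stand-in struct, `AllowEmptyValues` is the proposed field rather than an existing one, and `decide` is not a function in etl.

```go
package main

import "fmt"

// TransformArgs is a hypothetical stand-in for etl's options struct.
// AllowEmptyValues is the flag proposed above, not an existing field.
type TransformArgs struct {
	AllowEmptyValues bool
}

// op is the action a loader would take for one collected key/value pair.
type op string

const (
	opPut    op = "put"
	opDelete op = "delete"
)

// decide sketches the check: nil always means delete; a non-nil
// zero-length value means delete unless AllowEmptyValues is set.
func decide(v []byte, args TransformArgs) op {
	if v == nil {
		return opDelete
	}
	if len(v) == 0 && !args.AllowEmptyValues {
		return opDelete
	}
	return opPut
}

func main() {
	empty := []byte{}
	fmt.Println(decide(empty, TransformArgs{}))                       // default behaviour
	fmt.Println(decide(empty, TransformArgs{AllowEmptyValues: true})) // with the proposed flag
	fmt.Println(decide(nil, TransformArgs{AllowEmptyValues: true}))   // nil still deletes
}
```

This keeps the existing `nil`-means-delete feature intact. It does not address whether the `nil` / empty distinction survives the collector's intermediate buffering, which would need to be checked separately.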
Example use case: filling a database with a large number of random empty keys for a performance benchmark: