Open maxux opened 10 months ago
Implementation started on branch development-v2-data-segment. There is already a working version which can export and import part of the data, on a local database.
This feature allowed me to clone a full namespace of 31G (locally) in 1 min 58 seconds without any external tools.
There are two new commands (only available to administrators):
DATA EXPORT [dataid] [offset]
DATA IMPORT [dataid] [offset] [data]
When doing EXPORT, zdb sends a 4 MB chunk starting at data_id:data_offset to the client in one shot. The client can't choose the chunk size: 4 MB is a hardcoded size which seems good because it avoids locking zdb, makes good use of line bandwidth, is below any hard limit set at the redis protocol level, and doesn't consume a lot of memory.
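To put the 4 MB chunk size in perspective, here is a rough back-of-the-envelope calculation using the clone figures quoted earlier (31G in 1 min 58 seconds). It assumes binary units (31 GiB, 4 MiB) and that every chunk is full, so the chunk count is only a lower bound; these are my assumptions, not measurements.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

namespace_size = 31 * GIB      # "31G", assumed to mean GiB
chunk_size = 4 * MIB           # hardcoded EXPORT chunk, assumed to mean MiB
elapsed = 1 * 60 + 58          # 1 min 58 seconds

# Ceiling division: minimum number of EXPORT/IMPORT round trips.
chunks = -(-namespace_size // chunk_size)

# Average throughput over the whole clone, in MiB/s.
throughput = namespace_size / elapsed / MIB

print(f"at least {chunks} chunks, about {throughput:.0f} MiB/s")
```

So the whole clone is a few thousand round trips rather than one per entry, which is the point of sending fixed large chunks.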
Import works the same way, except that you can only import into the current (last) data_id; you can't import into an already closed (immutable) datafile. In addition, this feature is only allowed on a frozen namespace, to avoid any side changes. It is designed to clone a namespace from scratch and can't be used to sync a similar namespace whose data are not exactly the same.
Workflow when importing:
- Freeze the namespace (NSSET freeze)
- Get the current data_id and data_offset with NSINFO and fetch that from master
- When EOF is reached, jump to the next file with NSJUMP and keep cloning
- When EOF is reached and data_id and data_offset are the same as on master, data are in sync.

There is a script which does that already in place: tools/export-import/eximport.py
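The workflow above can be sketched as a small loop. This is not the eximport.py script: it is a minimal illustration against a hypothetical in-memory stand-in (FakeNamespace) rather than a real zdb connection. The method names, the use of an empty result to signal EOF, and reading the position as a (data_id, data_offset) pair are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNamespace:
    """Hypothetical in-memory stand-in for a namespace: a list of data files."""
    files: list = field(default_factory=lambda: [bytearray()])
    frozen: bool = False
    chunk: int = 4 * 1024 * 1024  # stand-in for the hardcoded 4 MB

    def position(self):
        # Stand-in for reading data_id / data_offset from NSINFO.
        return len(self.files) - 1, len(self.files[-1])

    def export(self, data_id, offset):
        # Stand-in for DATA EXPORT; an empty result models EOF (assumption).
        return bytes(self.files[data_id][offset:offset + self.chunk])

    def import_(self, data_id, offset, data):
        # Stand-in for DATA IMPORT: frozen namespace, current (last) file only.
        if not self.frozen or (data_id, offset) != self.position():
            raise ValueError("import refused")
        self.files[-1] += data

    def jump(self):
        # Stand-in for NSJUMP: close the current file and open the next one.
        self.files.append(bytearray())


def clone(master, replica):
    replica.frozen = True                       # NSSET freeze
    master_pos = master.position()
    while True:
        data_id, offset = replica.position()    # NSINFO on the replica
        data = master.export(data_id, offset)   # fetch that from master
        if data:
            replica.import_(data_id, offset, data)
        elif (data_id, offset) == master_pos:
            return                              # EOF at master's position: in sync
        else:
            replica.jump()                      # EOF on a closed file: next file


# Tiny demo with a 4-byte chunk so several round trips happen per file.
master = FakeNamespace(files=[bytearray(b"a" * 10), bytearray(b"b" * 7)], chunk=4)
replica = FakeNamespace(chunk=4)
clone(master, replica)
```

The replica always asks for exactly its own current position, which is why import never needs to touch a closed datafile.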
Next step is getting the index ready. The best solution in my opinion is to implement an INDEX REBUILD based on data files, so the index can be created from scratch from the data files. There is already an issue talking about that, which would be nice: #160.
In order to quickly clone one namespace to another instance of zdb, a specific command which would transfer a full chunk of a data file in one shot would be really important to benefit from full line speed.
There is already a DATA RAW command which fetches a specific entry based on offset, but that's inefficient when there are a lot of small entries. This command would be only for administrators obviously, since it could leak data and slow down the process.