Experimental implementation of a client/server model for blk-archive, which adds the ability to send (pack), receive (unpack), and list streams from a remote archive over TCP/IP.
Example:
$ blk-archive create -a the_archive_dir
$ blk-archive server -a the_archive_dir > /dev/null 2>&1 &
[1] 3701360
$ blk-archive send -s 127.0.0.1:9876 /tmp/some_file.bin
Packing /tmp/some_file.bin ... [ ] Remaining 0s
Client is connecting to server using 127.0.0.1:9876
elapsed : 2.412
stream id : e767a7a94a70fd3b
file size : 976.56M
mapped size : 976.56M
total read : 976.56M
fills size : 0
duplicate data : 0
data written : 976.56M
hashes written : 4.44M
stream written : 447
ratio : 1.00
speed : 404.88M/s
$ blk-archive list -s 127.0.0.1:9876
e767a7a94a70fd3b 1024000000 Nov 07 24 15:44 some_file.bin
$ blk-archive receive -s 127.0.0.1:9876 --stream e767a7a94a70fd3b --create /tmp/deleteme.bin
speed : 444.50M/s
Implementation details

Introduce a stream orderer, which creates order out of a potentially unordered list of map entries during pack/unpack operations. It allows the chunking/unpacking stages to run ahead of the transfer of the data and information they need from the server side.
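A minimal sketch of the idea (types and names here are hypothetical, not the actual implementation): entries arrive tagged with a sequence number and are buffered until they can be released strictly in order.

use std::collections::BTreeMap;

// Buffers out-of-order entries and releases them in sequence order.
struct StreamOrder<T> {
    next: u64,
    pending: BTreeMap<u64, T>,
}

impl<T> StreamOrder<T> {
    fn new() -> Self {
        StreamOrder { next: 0, pending: BTreeMap::new() }
    }

    // Accept an entry with its sequence number; return every entry that
    // is now contiguous with what has already been released.
    fn push(&mut self, seq: u64, entry: T) -> Vec<T> {
        self.pending.insert(seq, entry);
        let mut ready = Vec::new();
        while let Some(e) = self.pending.remove(&self.next) {
            ready.push(e);
            self.next += 1;
        }
        ready
    }
}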
The server side uses non-blocking socket IO. It is single threaded, except for a compression thread pool for slab writes. This simplifies the locking model, but it may need to be changed to a threaded model to make use of more cores.
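For illustration, the rough shape of a single-threaded, non-blocking accept loop (a sketch only; the real server also multiplexes reads and writes, and would block in poll/epoll rather than spin):

use std::io::ErrorKind;
use std::net::TcpListener;

fn serve(addr: &str) -> std::io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    listener.set_nonblocking(true)?;
    let mut conns = Vec::new();
    loop {
        match listener.accept() {
            Ok((sock, _peer)) => {
                sock.set_nonblocking(true)?;
                conns.push(sock);
            }
            // No pending connection; fall through and service existing ones.
            Err(e) if e.kind() == ErrorKind::WouldBlock => {}
            Err(e) => return Err(e),
        }
        // ... service each socket in `conns`, handing compression work for
        // slab writes off to the thread pool.
    }
}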
Remote procedure calls are represented by an enum and serialized with the bincode crate; the binary protocol is little endian.
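For example (the variant names below are made up for illustration, not the real wire protocol; bincode's default encoding is already little endian, but it can be pinned explicitly):

use bincode::Options;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
enum Rpc {
    StreamList,
    HaveHashes(Vec<[u8; 32]>), // batched "which of these do you have?"
    PackData(Vec<u8>),
}

fn encode(msg: &Rpc) -> bincode::Result<Vec<u8>> {
    bincode::options()
        .with_little_endian()
        .with_fixint_encoding()
        .serialize(msg)
}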
Created a Db interface which encapsulates slab handling for both packing and unpacking. Unpack now uses an LRU cache instead of a btree.
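Roughly like this (method and type names are illustrative): recently used slabs stay in an LRU cache so repeated map entries don't re-read and decompress the same slab.

use std::num::NonZeroUsize;
use lru::LruCache;

struct SlabDb {
    cache: LruCache<u32, Vec<u8>>, // slab id -> decompressed slab data
}

impl SlabDb {
    fn new(capacity: usize) -> Self {
        SlabDb { cache: LruCache::new(NonZeroUsize::new(capacity).unwrap()) }
    }

    fn slab(&mut self, id: u32) -> &[u8] {
        if !self.cache.contains(&id) {
            let data = read_and_decompress(id); // hypothetical loader
            self.cache.put(id, data);
        }
        self.cache.get(&id).unwrap()
    }
}

fn read_and_decompress(_id: u32) -> Vec<u8> {
    unimplemented!("stand-in for the real slab read + decompress path")
}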
The packing operation uses a separate thread for chunking, which runs independently of the code that sends data missing from the archive. Hashes are batched.
The unpacking operation uses separate threads to unpack the stream and to fetch the data needed by its map entries.
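Both pipelines share a channel-based shape; the sketch below shows the pack side, with a chunker thread feeding the main thread, which batches hashes before talking to the server. All helper names are hypothetical.

use std::sync::mpsc;
use std::thread;

fn pack(chunks_in: Vec<Vec<u8>>) {
    // Unbounded channel, matching the "no limits" caveat noted under
    // known issues; the chunker can run arbitrarily far ahead.
    let (tx, rx) = mpsc::channel::<(Vec<u8>, [u8; 32])>();

    let chunker = thread::spawn(move || {
        for chunk in chunks_in {
            let hash = hash_chunk(&chunk);
            tx.send((chunk, hash)).unwrap();
        }
        // Dropping tx closes the channel, ending the loop below.
    });

    let mut batch = Vec::new();
    for item in rx {
        batch.push(item);
        if batch.len() == 64 {
            send_missing(&mut batch); // one RPC per batch of hashes
        }
    }
    if !batch.is_empty() {
        send_missing(&mut batch);
    }
    chunker.join().unwrap();
}

fn hash_chunk(_data: &[u8]) -> [u8; 32] {
    [0; 32] // stand-in for the real content hash
}

fn send_missing(batch: &mut Vec<(Vec<u8>, [u8; 32])>) {
    // Stand-in: ask the server which hashes it lacks, send those chunks.
    batch.clear();
}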
Complexity and work are concentrated on the client side; the server side is intentionally simple to reduce demands on it.
During the pack operation, the stream is built on the client side until complete, then transferred to the server side and added to the archive. If at any point it fails to complete, the stream is discarded.
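In outline (helper names are hypothetical): the serialized stream accumulates client side and is committed with a single final operation, so an aborted pack leaves nothing partial in the archive.

fn pack_stream(chunks: impl Iterator<Item = Vec<u8>>) -> std::io::Result<()> {
    let mut stream = Vec::new(); // serialized map entries, built client side
    for c in chunks {
        stream.extend(pack_chunk(&c)?); // may push missing chunk data to the server
    }
    // Single commit at the end; if we error out before this, the stream
    // is simply dropped and the archive is untouched.
    commit_stream(&stream)
}

fn pack_chunk(_c: &[u8]) -> std::io::Result<Vec<u8>> {
    Ok(Vec::new()) // stand-in: emit this chunk's map entry bytes
}

fn commit_stream(_s: &[u8]) -> std::io::Result<()> {
    Ok(()) // stand-in: the final "add stream" operation on the server
}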
During the unpack operation, the stream is fully transferred to the client and then processed. Having the server process the stream and send chunks seems very inefficient, especially if the stream has lots of redundant data, duplicates, unmapped regions, etc.
The cuckoo filter size calculations have been replaced with code that automatically doubles its size when capacity is exceeded. Not sure how best to handle this in a multi-client environment.
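The doubling idea, sketched with the cuckoofilter crate purely for illustration (blk-archive carries its own filter): when an insert fails for lack of space, rebuild at twice the capacity from the authoritative hash index and retry.

use std::collections::hash_map::DefaultHasher;
use cuckoofilter::CuckooFilter;

struct GrowingFilter {
    filter: CuckooFilter<DefaultHasher>,
    capacity: usize,
}

impl GrowingFilter {
    fn new(capacity: usize) -> Self {
        GrowingFilter { filter: CuckooFilter::with_capacity(capacity), capacity }
    }

    fn insert(&mut self, hash: &[u8; 32], all_hashes: &[[u8; 32]]) {
        if self.filter.add(hash).is_err() {
            self.capacity *= 2;
            let mut bigger = CuckooFilter::with_capacity(self.capacity);
            for h in all_hashes {
                let _ = bigger.add(h); // repopulate from the hash index
            }
            let _ = bigger.add(hash);
            self.filter = bigger;
        }
    }
}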
Performance of local operations should be very close to the existing code; remote operations should be able to achieve line speed.
No compression or CRC checks have been added to the transport, but the packet header can accommodate them.
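One possible header shape with that room reserved (an assumption about layout, not the actual header); little endian, matching the rest of the protocol:

#[repr(C)]
struct PacketHeader {
    magic: u32,       // protocol identifier / sanity check
    payload_len: u32, // bytes of bincode payload that follow
    rpc: u16,         // which Rpc variant the payload decodes to
    flags: u16,       // bit 0 could later mean "payload compressed"
    crc32: u32,       // currently unused / zero
}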
Known issues/limitations
There is a race condition in the server between adding a chunk and fetching a chunk. It revolves around the time it takes data to make its way through the thread pool and get written to a slab, or the case where the chunk hasn't yet been written to the slab because we are waiting for the slab segment to exceed the threshold.
Delta packing is commented out; not sure how to handle this yet.
No limits are in place for chunking and unpacking, thus memory consumption can
be very large.
Misc.

Bug found with the CDC chunker; a separate bug report will follow.
More testing and robustness are needed in the networking handling, or it could simply be replaced with an appropriate crate.