Open bignaux opened 3 years ago
I have done some playing around and concluded that to 'fix' this would require changing the code to avoid formatting each line until it is determined that the line should be written to the output. That change would make xxd about 2 times slower than hexdump for this particular case.
My personal view is that this use-case is too special to warrant changing xxd, but I am willing to make the changes if others feel that we should, in order to be able to close this issue.
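For readers following along, the idea of deciding on the raw bytes first and formatting only the lines that are actually written could be sketched roughly as below. This is an illustrative Python sketch, not xxd's C code: the names format_line and dump_autoskip are made up for the example, and it simplifies autoskip by always emitting '*' for the second NUL line of a run.

```python
def format_line(offset: int, chunk: bytes, cols: int = 16) -> str:
    """Format one dump line roughly in xxd style (hypothetical helper)."""
    hexed = chunk.hex()
    # Group the hex digits two bytes (four digits) at a time.
    groups = " ".join(hexed[i:i + 4] for i in range(0, len(hexed), 4))
    # Width of a full hex column, so short last lines stay aligned.
    width = 2 * cols + (cols + 1) // 2 - 1
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{offset:08x}: {groups:<{width}}  {text}"


def dump_autoskip(data: bytes, cols: int = 16):
    """Yield dump lines, collapsing runs of all-zero chunks into one '*'.

    The skip decision is made on the raw chunk, so skipped chunks are
    never formatted at all.
    """
    zero_chunk = bytes(cols)
    in_zero_run = False   # first NUL line of the current run was emitted
    star_emitted = False  # '*' was already emitted for the current run
    for offset in range(0, len(data), cols):
        chunk = data[offset:offset + cols]
        is_last = offset + cols >= len(data)
        if chunk == zero_chunk and not is_last:
            if not in_zero_run:
                in_zero_run = True
                star_emitted = False
                yield format_line(offset, chunk, cols)
            elif not star_emitted:
                star_emitted = True
                yield "*"
            continue  # cheap path: no formatting work for skipped chunks
        in_zero_run = False
        # The last line is always written so the total length is recoverable.
        yield format_line(offset, chunk, cols)
```

For example, list(dump_autoskip(b"A" * 16 + bytes(64) + b"B" * 16)) gives four lines: the A line, one NUL line, a '*', and the B line at offset 0x50.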
Running into slowness problems with reverse mode (-r) using xxd for a niche case.
I'm trying to send a BitLocker-encrypted Windows 10 NVMe block device to another machine over ssh with a 1 Gbps link between them. Naturally the BitLocker partition is incompressible, and as such using gzip pipes anywhere in the chain or ssh -C (which is also gzip-powered) causes a major slowdown (>30-40MB/s) due to the overhead of trying to compress the incompressible data in flight and getting no returns.
But there's a lot of zeroed/trimmed unused space in various sections of the disk, especially towards the end of the source NVMe drive where the space has never been used. So I went looking for a zero-length encoding (run-length encoding) solution, which doesn't seem to readily exist as a Linux command for binary data, and xxd -a provides this solution by skipping repeating chunks in its output. This output can also be achieved with hexdump, but xxd -a is faster when dealing with both binary data and repeating data in the same source.
This solution fits perfectly for run-length encoding, allowing me to send the incompressible binary stream to a remote while still truncating repeating blocks of zeros throughout the disk. xxd -a is fast enough to saturate a 1 Gbps link, getting about 250MB/s in hex-dumping speed.
But xxd -r lets me down with a significantly slower conversion of the hexdumped data back to binary on the receiving side, at a speed closer to ~30MB/s. This can be quickly demonstrated with pv:
xxd -a /dev/urandom | xxd -r | pv >/dev/null
37.5MiB 0:00:01 [37.6MiB/s]
I had a suspicion this may be the result of line-buffered processing in xxd -r, and that may be true to an extent, as changing -c to the maximum value of 256 seems to increase xxd -r's speed to about ~60MB/s. But xxd cannot exceed 256 bytes per line by default.
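To make the per-line work in reverse mode concrete, here is a rough Python sketch of what undoing such a dump involves: parse the offset, decode the hex column, and zero-fill any gap left by a skipped region (the xxd man page says gaps are filled with null bytes when the output is not seekable). This is only an illustration under those assumptions, not xxd's actual parser, and the name reverse_dump is hypothetical.

```python
def reverse_dump(lines) -> bytes:
    """Rebuild binary data from xxd-style dump lines (illustrative only)."""
    out = bytearray()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line == "*":
            # '*' carries no data; the next offset tells us how big the gap is.
            continue
        addr, sep, rest = line.partition(": ")
        if not sep:
            continue  # not a dump line
        offset = int(addr, 16)
        # The hex column ends at the first double space before the ASCII column.
        hex_part = rest.partition("  ")[0].replace(" ", "")
        data = bytes.fromhex(hex_part)
        if offset > len(out):
            out.extend(bytes(offset - len(out)))  # zero-fill the skipped region
        out[offset:offset + len(data)] = data
    return bytes(out)
```

Every line pays for one offset parse and one hex decode, so wider lines (a larger -c) mean fewer of these fixed per-line steps for the same amount of data, which would be consistent with the speedup seen at -c 256.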
I'm using xxd to write a big image and run regression tests on a few programs (pfsshell & hdl-dump). The 8GB image is created with this script: https://github.com/bignaux/pfsshell/blob/mkpart/tests/pfsshelltest.tcl . Basically, it's just a few APA partitions and zeros.
So here, hexdump takes 20 seconds, while xxd takes 20 minutes for the same thing (though the two output formats are incompatible). I think that qualifies as an issue.
To Reproduce
wget https://gist.githubusercontent.com/bignaux/895e658101e7f26bc3ef1238077661ec/raw/5140a62e292df5d07cd0567ed8179033c2cebb42/test.hex
perform a reverse xxd (fast): time xxd -r test.hex test.img
then you can xxd -a test.img test.xxd
Expected behavior: execution time of the same order as hexdump, so that xxd can be used in a GitHub Action, for example.
versions: xxd V1.10 27oct98 by Juergen Weigert vim-8.2.1522 nixos 20.09