pete4abw / lrzip-next

Long Range Zip. Updated and Enhanced version of ckolivas' lrzip project. Lots of new features. Better compression. Actively maintained.
https://github.com/pete4abw/lrzip-next
GNU General Public License v2.0
50 stars 10 forks source link

:lady_beetle: BZIP3 Test | Decompression fails with segfault #86

Closed pete4abw closed 2 years ago

pete4abw commented 2 years ago

lrzip-next Version

9.2+ (bzip3_poc branch)

lrzip-next command line

lrzip-next -t | d file.lrz

What happened?

When compressing with -B file compresses fine, but fails on test | decompression

What was expected behavior?

File passes test or decompressess

Steps to reproduce

Obvious.

Relevant log output

$ lrzip-next -vvt file.lrz
Using configuration file /home/peter/.lrzip/lrzip.conf
The following options are in effect for this INTEGRITY TEST.
Threading is ENABLED. Number of CPUs detected: 8
Detected 16,558,223,360 bytes ram
Nice Value: 19
Show Progress
Max Verbose
Test file integrity
Temporary Directory set as: /tmp/
Malloced 5,519,405,056 for tmp_outbuf
Detected lrzip version 0.9 file.
SHA256 being used for integrity testing.
Validating file for consistency...[OK]
Detected lrzip version 0.9 file.
Decompressing...
Reading chunk_bytes at 20
Expected size: 2,063,063,040
Chunk byte width: 4
Reading eof flag at 21
EOF: 1
Reading expected chunksize at 22
Chunk size: 2,063,063,040
Reading stream 0 header at 27
Reading stream 1 header at 40
Reading ucomp header at 180,964,164
Fill_buffer stream 0 c_len 7,163,384 u_len 14,178,163 last_head 0
Starting thread 0 to decompress 7,163,384 bytes from stream 0
Thread 0 decompressed 14,178,163 bytes from stream 0
Taking decompressed data from thread 0
Reading ucomp header at 53
Fill_buffer stream 1 c_len 6,481,206 u_len 33,550,336 last_head 6,481,245
Starting thread 1 to decompress 6,481,206 bytes from stream 1
Reading ucomp header at 6,481,272
Fill_buffer stream 1 c_len 9,253,807 u_len 33,550,336 last_head 15,735,065
Starting thread 2 to decompress 9,253,807 bytes from stream 1
Reading ucomp header at 15,735,092
Fill_buffer stream 1 c_len 6,955,531 u_len 33,550,336 last_head 22,690,609
Starting thread 3 to decompress 6,955,531 bytes from stream 1
Reading ucomp header at 22,690,636
Fill_buffer stream 1 c_len 6,890,209 u_len 33,550,336 last_head 29,580,831
Starting thread 4 to decompress 6,890,209 bytes from stream 1
Reading ucomp header at 29,580,858
Fill_buffer stream 1 c_len 5,434,365 u_len 33,550,336 last_head 35,015,209
Starting thread 5 to decompress 5,434,365 bytes from stream 1
Reading ucomp header at 35,015,236
Fill_buffer stream 1 c_len 5,287,005 u_len 33,550,336 last_head 40,302,227
Starting thread 6 to decompress 5,287,005 bytes from stream 1
Reading ucomp header at 40,302,254
Fill_buffer stream 1 c_len 6,031,949 u_len 33,550,336 last_head 46,334,189
Starting thread 7 to decompress 6,031,949 bytes from stream 1
Reading ucomp header at 46,334,216
Fill_buffer stream 1 c_len 6,107,371 u_len 33,550,336 last_head 52,441,573
Starting thread 8 to decompress 6,107,371 bytes from stream 1
Reading ucomp header at 52,441,600
Fill_buffer stream 1 c_len 5,147,508 u_len 33,550,336 last_head 57,589,094
Starting thread 9 to decompress 5,147,508 bytes from stream 1
Segmentation fault

### Please provide system details

OS Distro: Slackware x64 Current
Kernel Version (uname -a): 6.0.0
System ram (free -h): 
           total        used        free      shared  buff/cache   available

Mem: 15Gi 1.4Gi 4.0Gi 477Mi 10Gi 13Gi Swap: 15Gi 34Mi 15Gi



### Additional Context

Includes contribution by Kamila Szewczyk (@kspalaiologos) for bzip3 integration.
kspalaiologos commented 2 years ago

Does https://github.com/pete4abw/lrzip-next/pull/85/commits/a3382bb8255a7322d99a4c38e264c09dd72185c7 fix the issue?

pete4abw commented 2 years ago

Does a3382bb fix the issue?

No. New error, and decompression continues to fail..

Fill_buffer stream 1 c_len 5,147,508 u_len 33,550,336 last_head 57,589,094
Starting thread 9 to decompress 5,147,508 bytes from stream 1
internal error: out of thread states.Fatal error - exiting
kspalaiologos commented 2 years ago

Can't reproduce:

 0 [19:55] Desktop/workspace/lrzip-next@lzma-22.01 % src/lrzip-next -B -L1 -vv lrzn.tar
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 16
Detected 15,534,166,016 bytes ram
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: /tmp/
Compression mode is: BZIP3. LZ4 Compressibility testing enabled
Compression level 1
RZIP Compression level 1
BZIP3 Compression Block Size: 5
MD5 Hashing Used
Heuristically Computed Compression Window: 98 = 9,800MB
Storage time in seconds 1,386,671,352
File size: 31,764,480
Succeeded in testing 31,764,480 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 31,764,480
Byte width: 4
Threads reduced to 12
Per Thread Memory Overhead is 201,326,592
Succeeded in testing 2,447,683,584 sized malloc for back end compression
Using up to 12 threads to compress up to 10,485,760 bytes each.
Beginning rzip pre-processing phase
hashsize = 131,072.  bits = 17. 2MB
Total:  8%  Chunk:  8%
Starting sweep for mask 15
Total:  9%  Chunk:  9%
Starting sweep for mask 31
Total: 28%  Chunk: 28%
Starting sweep for mask 63
Total: 73%  Chunk: 73%
Starting thread 0 to compress 10,485,760 bytes from stream 1
Total: 77%  Chunk: 77%
lz4 testing OK for chunk 10,485,760. Compressed size = 63.20% of test size 10,485,760, 1 Passes
Total: 78%  Chunk: 78%
Starting bzip3 (bs=5) backend...
Total: 79%  Chunk: 79%
Starting sweep for mask 127
Total: 99%  Chunk: 99%
87,380 total hashes -- 881 in primary bucket (1.008%)
Malloced 5,178,052,608 for checksum ckbuf
Starting thread 1 to compress 119,370 bytes from stream 0
Starting thread 2 to compress 6,437,089 bytes from stream 1
lz4 testing OK for chunk 119,370. Compressed size = 81.35% of test size 119,370, 1 Passes
Starting bzip3 (bs=5) backend...
lz4 testing OK for chunk 6,437,089. Compressed size = 45.58% of test size 6,437,089, 1 Passes
Starting bzip3 (bs=5) backend...
Writing initial chunk bytes value 4 at 20
Writing EOF flag as 1
Writing initial header at 26
Compthread 0 seeking to 22 to store length 4
Compthread 0 seeking to 26 to write header
Thread 0 writing 5,751,546 compressed bytes from stream 1
Compthread 0 writing data at 39
Compthread 1 seeking to 9 to store length 4
Compthread 1 seeking to 5,751,585 to write header
Thread 1 writing 58,562 compressed bytes from stream 0
Compthread 1 writing data at 5,751,598
Compthread 2 seeking to 35 to store length 4
Compthread 2 seeking to 5,810,160 to write header
Thread 2 writing 1,885,727 compressed bytes from stream 1
Compthread 2 writing data at 5,810,173
MD5: f6b1057e006a7f4863409a0231e2783c

matches=12,284 match_bytes=14,841,631
literals=11,126 literal_bytes=16,922,849
true_tag_positives=25,528 false_tag_positives=68,539
inserts=306,596 match 0.877
lrzn.tar - Compression Ratio: 4.127. bpb: 1.938. Average Compression Speed: 30.000MB/s.
Total time: 00:00:01.53
 0 [19:56] Desktop/workspace/lrzip-next@lzma-22.01 % src/lrzip-next -d lrzn.tar.lrz -o mtp
Output filename is: mtp
Validating file for consistency...[OK]
100%      30.29 /     30.29 MB0 /     30.29 MB
Average DeCompression Speed: 15.000MB/s
MD5:f6b1057e006a7f4863409a0231e2783c
Output filename is: mtp: [OK] - 31,764,480 bytes
Total time: 00:00:01.41
kspalaiologos commented 2 years ago

The internal error that you have posted occurs only if more threads are launched than control->threads. This should not happen, I assume.

pete4abw commented 2 years ago

The internal error that you have posted occurs only if more threads are launched than control->threads. This should not happen, I assume.

The issue was you did not account for the thread used by rzip. Small changes to setup_states lock_state unlock_state

in for loops.

index 3cd06ec..6460e7a 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -165,12 +165,12 @@ static int * statequeue = NULL;

 static void setup_states(rzip_control * control) {
        int i;
-       states = malloc(sizeof(struct bz3_state *) * control->threads);
-       statequeue = malloc(sizeof(int) * control->threads);
-       memset(statequeue, 0, sizeof(int) * control->threads);
+       states = malloc(sizeof(struct bz3_state *) * control->threads + 1);
+       statequeue = malloc(sizeof(int) * control->threads + 1);
+       memset(statequeue, 0, sizeof(int) * control->threads + 1);
        if(!states)
                fatal("Failed to allocate memory for bzip3 states\n");
-       for (i = 0; i < control->threads; i++) {
+       for (i = 0; i < control->threads + 1; i++) {
                states[i] = bz3_new((1 << control->bzip3_bs) * ONE_MB);
                if(!states[i])
                        fatal("Failed to allocate %dMB bzip3 state #%d.\n", (1 << control->bzip3_bs), i);
@@ -180,7 +180,7 @@ static void setup_states(rzip_control * control) {
 static struct bz3_state * lock_state(rzip_control * control) {
        lock_mutex(control, &bz3_statemutex);
        int i;
-       for (i = 0; i < control->threads; i++) {
+       for (i = 0; i < control->threads + 1; i++) {
                if (!statequeue[i]) {
                        statequeue[i] = 1;
                        unlock_mutex(control, &bz3_statemutex);
@@ -195,7 +195,7 @@ static struct bz3_state * lock_state(rzip_control * control) {
 static void unlock_state(rzip_control * control, struct bz3_state * state) {
        lock_mutex(control, &bz3_statemutex);
        int i;
-       for (i = 0; i < control->threads; i++) {
+       for (i = 0; i < control->threads + 1; i++) {
                if (states[i] == state) {
                        statequeue[i] = 0;
                        unlock_mutex(control, &bz3_statemutex);

Now we can move forward. However, I do not understand why you have to lock states for bzip3 when each thread is already locked. Can't the functions bzip3_compress_buf and bzip3_decompress_buf have their own local state structures - one for each thread? This would greatly simplify the codebase and keep each compress and decompress function similar.

kspalaiologos commented 2 years ago

@pete4abw This fix is not correct. Consider parenthesising (control->threads + 1), otherwise multiplication will be performed first, leading to UB on access to the last element.

pete4abw commented 2 years ago

@pete4abw This fix is not correct. Consider parenthesising (control->threads + 1), otherwise multiplication will be performed first, leading to UB on access to the last element.

Yup. Fixed. However, I did not notice any problem without the () which I find odd. Anyway, as I wrote earlier, it would be better if we could preserve the handling of all the _compress_buf and _decompress_buf without the bzip3 helper functions.

Will keep testing.

kspalaiologos commented 2 years ago

According to the C standard, the chunk of memory allocated by malloc() is only guaranteed to be as many bytes as you ask for, but in practice most implementations give you a little extra for padding or the purposes of the memory manager.