Closed pete4abw closed 2 years ago
Does a3382bb fix the issue?
No. New error, and decompression continues to fail..
Fill_buffer stream 1 c_len 5,147,508 u_len 33,550,336 last_head 57,589,094
Starting thread 9 to decompress 5,147,508 bytes from stream 1
internal error: out of thread states.Fatal error - exiting
Can't reproduce:
0 [19:55] Desktop/workspace/lrzip-next@lzma-22.01 % src/lrzip-next -B -L1 -vv lrzn.tar
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 16
Detected 15,534,166,016 bytes ram
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: /tmp/
Compression mode is: BZIP3. LZ4 Compressibility testing enabled
Compression level 1
RZIP Compression level 1
BZIP3 Compression Block Size: 5
MD5 Hashing Used
Heuristically Computed Compression Window: 98 = 9,800MB
Storage time in seconds 1,386,671,352
File size: 31,764,480
Succeeded in testing 31,764,480 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 31,764,480
Byte width: 4
Threads reduced to 12
Per Thread Memory Overhead is 201,326,592
Succeeded in testing 2,447,683,584 sized malloc for back end compression
Using up to 12 threads to compress up to 10,485,760 bytes each.
Beginning rzip pre-processing phase
hashsize = 131,072. bits = 17. 2MB
Total: 8% Chunk: 8%
Starting sweep for mask 15
Total: 9% Chunk: 9%
Starting sweep for mask 31
Total: 28% Chunk: 28%
Starting sweep for mask 63
Total: 73% Chunk: 73%
Starting thread 0 to compress 10,485,760 bytes from stream 1
Total: 77% Chunk: 77%
lz4 testing OK for chunk 10,485,760. Compressed size = 63.20% of test size 10,485,760, 1 Passes
Total: 78% Chunk: 78%
Starting bzip3 (bs=5) backend...
Total: 79% Chunk: 79%
Starting sweep for mask 127
Total: 99% Chunk: 99%
87,380 total hashes -- 881 in primary bucket (1.008%)
Malloced 5,178,052,608 for checksum ckbuf
Starting thread 1 to compress 119,370 bytes from stream 0
Starting thread 2 to compress 6,437,089 bytes from stream 1
lz4 testing OK for chunk 119,370. Compressed size = 81.35% of test size 119,370, 1 Passes
Starting bzip3 (bs=5) backend...
lz4 testing OK for chunk 6,437,089. Compressed size = 45.58% of test size 6,437,089, 1 Passes
Starting bzip3 (bs=5) backend...
Writing initial chunk bytes value 4 at 20
Writing EOF flag as 1
Writing initial header at 26
Compthread 0 seeking to 22 to store length 4
Compthread 0 seeking to 26 to write header
Thread 0 writing 5,751,546 compressed bytes from stream 1
Compthread 0 writing data at 39
Compthread 1 seeking to 9 to store length 4
Compthread 1 seeking to 5,751,585 to write header
Thread 1 writing 58,562 compressed bytes from stream 0
Compthread 1 writing data at 5,751,598
Compthread 2 seeking to 35 to store length 4
Compthread 2 seeking to 5,810,160 to write header
Thread 2 writing 1,885,727 compressed bytes from stream 1
Compthread 2 writing data at 5,810,173
MD5: f6b1057e006a7f4863409a0231e2783c
matches=12,284 match_bytes=14,841,631
literals=11,126 literal_bytes=16,922,849
true_tag_positives=25,528 false_tag_positives=68,539
inserts=306,596 match 0.877
lrzn.tar - Compression Ratio: 4.127. bpb: 1.938. Average Compression Speed: 30.000MB/s.
Total time: 00:00:01.53
0 [19:56] Desktop/workspace/lrzip-next@lzma-22.01 % src/lrzip-next -d lrzn.tar.lrz -o mtp
Output filename is: mtp
Validating file for consistency...[OK]
100% 30.29 / 30.29 MB0 / 30.29 MB
Average DeCompression Speed: 15.000MB/s
MD5:f6b1057e006a7f4863409a0231e2783c
Output filename is: mtp: [OK] - 31,764,480 bytes
Total time: 00:00:01.41
The internal error that you have posted occurs only if more threads are launched than control->threads
. This should not happen, I assume.
The internal error that you have posted occurs only if more threads are launched than
control->threads
. This should not happen, I assume.
The issue was you did not account for the thread used by rzip. Small changes to setup_states lock_state unlock_state
in for loops.
index 3cd06ec..6460e7a 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -165,12 +165,12 @@ static int * statequeue = NULL;
static void setup_states(rzip_control * control) {
int i;
- states = malloc(sizeof(struct bz3_state *) * control->threads);
- statequeue = malloc(sizeof(int) * control->threads);
- memset(statequeue, 0, sizeof(int) * control->threads);
+ states = malloc(sizeof(struct bz3_state *) * control->threads + 1);
+ statequeue = malloc(sizeof(int) * control->threads + 1);
+ memset(statequeue, 0, sizeof(int) * control->threads + 1);
if(!states)
fatal("Failed to allocate memory for bzip3 states\n");
- for (i = 0; i < control->threads; i++) {
+ for (i = 0; i < control->threads + 1; i++) {
states[i] = bz3_new((1 << control->bzip3_bs) * ONE_MB);
if(!states[i])
fatal("Failed to allocate %dMB bzip3 state #%d.\n", (1 << control->bzip3_bs), i);
@@ -180,7 +180,7 @@ static void setup_states(rzip_control * control) {
static struct bz3_state * lock_state(rzip_control * control) {
lock_mutex(control, &bz3_statemutex);
int i;
- for (i = 0; i < control->threads; i++) {
+ for (i = 0; i < control->threads + 1; i++) {
if (!statequeue[i]) {
statequeue[i] = 1;
unlock_mutex(control, &bz3_statemutex);
@@ -195,7 +195,7 @@ static struct bz3_state * lock_state(rzip_control * control) {
static void unlock_state(rzip_control * control, struct bz3_state * state) {
lock_mutex(control, &bz3_statemutex);
int i;
- for (i = 0; i < control->threads; i++) {
+ for (i = 0; i < control->threads + 1; i++) {
if (states[i] == state) {
statequeue[i] = 0;
unlock_mutex(control, &bz3_statemutex);
Now we can move forward. However, I do not understand why you have to lock states for bzip3 when each thread is already locked. Can't the functions bzip3_compress_buf
and bzip3_decompress_buf
have their own local state structures - one for each thread? This would greatly simplify the codebase and keep each compress
and decompress
function similar.
@pete4abw This fix is not correct. Consider parenthesising (control->threads + 1)
, otherwise multiplication will be performed first, leading to UB on access to the last element.
@pete4abw This fix is not correct. Consider parenthesising
(control->threads + 1)
, otherwise multiplication will be performed first, leading to UB on access to the last element.
Yup. Fixed. However, I did not notice any problem without the () which I find odd. Anyway, as I wrote earlier, it would be better if we could preserve the handling of all the _compress_buf
and _decompress_buf
without the bzip3 helper functions.
Will keep testing.
According to the C standard, the chunk of memory allocated by malloc()
is only guaranteed to be as many bytes as you ask for, but in practice most implementations give you a little extra for padding or the purposes of the memory manager.
lrzip-next Version
9.2+ (bzip3_poc branch)
lrzip-next command line
lrzip-next -t | d file.lrz
What happened?
When compressing with
-B
file compresses fine, but fails on test | decompressionWhat was expected behavior?
File passes test or decompressess
Steps to reproduce
Obvious.
Relevant log output
Mem: 15Gi 1.4Gi 4.0Gi 477Mi 10Gi 13Gi Swap: 15Gi 34Mi 15Gi