pete4abw / lrzip-next

Long Range Zip. Updated and Enhanced version of ckolivas' lrzip project. Lots of new features. Better compression. Actively maintained.
https://github.com/pete4abw/lrzip-next
GNU General Public License v2.0
50 stars 10 forks source link

:lady_beetle: bzip3_poc possible data type mismatch #96

Closed pete4abw closed 1 year ago

pete4abw commented 1 year ago

lrzip-next Version

0.9.3

lrzip-next command line

N/A

What happened?

bzip3 expects size data to be compressed and returns size of compressed to be int32_t yet lrzip-next cthread structure stores compressed and uncompressed data sizes as int64_t. While unlikely that the size of a single block will exceed 2GB, it is not impossible, lrzip-next supports only 64-bit processors now. For bzip3 to work consistently, either type casting or error checking has to be introduced. It appears bzip2 compression/decompression may also be impacted, although it appears to use uint32_t. Some re-evaluation of the use of int64_t in compression/decompression may need to be done.

  74 static struct compress_thread {
  75         uchar *s_buf;   /* Uncompressed buffer -> Compressed buffer */
  76         uchar c_type;   /* Compression type */
  77         i64 s_len;      /* Data length uncompressed */
  78         i64 c_len;      /* Data length compressed */
  79         cksem_t cksem;  /* This thread's semaphore */
  80         struct stream_info *sinfo;
  81         int streamno;   
  82         uchar salt[SALT_LEN];
  83 } *cthreads;
109 /* ** LOW LEVEL APIs ** */                                                                                                                       
110 
111 /**
112  * @brief Encode a single block. Returns the amount of bytes written to `buffer'.
113  * `buffer' must be able to hold at least `bz3_bound(size)' bytes. The size must not
114  * exceed the block size associated with the state.
115  */
116 BZIP3_API int32_t bz3_encode_block(struct bz3_state * state, uint8_t * buffer, int32_t size);
117 
118 /**
119  * @brief Decode a single block.
120  * `buffer' must be able to hold at least `orig_size' bytes. The size must not exceed the block size
121  * associated with the state.
122  * @param size The size of the compressed data in `buffer'
123  * @param orig_size The original size of the data before compression.
124  */
125 BZIP3_API int32_t bz3_decode_block(struct bz3_state * state, uint8_t * buffer, int32_t size, int32_t orig_size);

What was expected behavior?

No data type mismatches

Steps to reproduce

Review source in lrzip_private.h, stream.c.

Relevant log output

No response

Please provide system details

64 bit systems

Additional Context

Type matching is important to prevent possible data overflows.

kspalaiologos commented 1 year ago

The following APIs let you compress buffers of size greater than 2GB:

/**
 * @brief Compress a block of data. This function does not support parallelism
 * by itself, consider using the low level `bz3_encode_blocks()` function instead.
 * Using the low level API might provide better performance.
 * Returns a bzip3 error code; BZ3_OK when the operation is successful.
 * Make sure to set out_size to the size of the output buffer before the operation;
 * out_size must be at least equal to `bz3_bound(in_size)'.
 */
BZIP3_API int bz3_compress(uint32_t block_size, const uint8_t * in, uint8_t * out, size_t in_size, size_t * out_size);

/**
 * @brief Decompress a block of data. This function does not support parallelism
 * by itself, consider using the low level `bz3_decode_blocks()` function instad.
 * Using the low level API might provide better performance.
 * Returns a bzip3 error code; BZ3_OK when the operation is successful.
 * Make sure to set out_size to the size of the output buffer before the operation.
 */
BZIP3_API int bz3_decompress(const uint8_t * in, uint8_t * out, size_t in_size, size_t * out_size);
pete4abw commented 1 year ago

The following APIs let you compress buffers of size greater than 2GB:

Not sure this helps. The high level APIs call bz3_en/decode_block which are called directly from lrzip-next. I'm just concerned with the possible dt mismatch. Is this an issue or not?

pete4abw commented 1 year ago

No action