Open PoneyClairDeLune opened 7 months ago
Current working draft: https://github.com/PoneyClairDeLune/brandy-breeze-js
.magic:16 = 'BZ' signature/magic number
.version:8 = 'r' for Brandy Breeze
.hundred_k_blocksize:8 = '1'..'9', 'a'..'f' block-size 64 KiB-1 MiB (uncompressed)
.compressed_magic:32 = 'BZrd' for Brandy Breeze chunk data packet
.crc:32 = checksum for this block (BE)
.size:24 = size of this block (VLV 21-bit, BE)
.origPtr:24 = start pointer into BWT for before untransform (VLV 21-bit, BE)
.data = compressed data
.eos_magic:32 = 'BZre' for Brandy Breeze end of stream
.crc:32 = checksum for whole stream
(Unfinished)
Discovered a more space-efficient way of reversing BWT without ever building the supposed impossibly-humongous tables. Others might have already discovered this beforehand, but meh, I still spent some hours on this :P
map[e] = i
Now onwards to a better BWT implementation...
.brz
operates at a maximum BWT block size of 1 MiB, so with the two index arrays (uint32), the space used to store the index maps would take about 8 MiB. An additional MiB is used to store the LZSS sliding window, with yet another additional MiB used to store the quick LZSS fast pointer lookup table (16bit + max 16 options).
The modified KORG encoding could prove quite useful here.
Discovered a more space-efficient way of reversing BWT without ever building the supposed impossibly-humongous tables. Others might have already discovered this beforehand, but meh, I still spent some hours on this :P
1. Create an array with all indexes from the input buffer array. 2. Sort the index array with values from the input. 3. Build a map/array with the sorted index array: `map[e] = i` 4. Iterate through the map with the value gotten as the key of the next iteration, writing corresponding bytes from the input buffer to the output buffer.
Now onwards to a better BWT implementation...
https://github.com/senkevinli/burrows-wheeler-transform Yup, someone already discovered.
Brandy Breeze: 🍾 General-purpose compression algorithm.
Built on top of RLE, BWT, MTF(?) and LZ77/LZSS. Aims at offering a balance between Brotli's speed and XZ's efficiency.
Because bzip3 (
.bz3
) is already used as a name by others, the extension would probably be.brz
. The stream format would be backwards compatible with bzip2, but either cannot compress or decompress the other.General design:
Run-length encode, 3-byte trigger. (?)Move-to-front transform. (?)