mynotwo / A-Fast-Transformer-based-General-Purpose-LosslessCompressor

This repository contains the source code and dataset link for the WWW 2022 accepted paper "TRACE: A Fast Transformer-based General-Purpose Lossless Compressor".

Does it need to execute the transformer model to decompress files? #1

Open suntong30 opened 1 year ago

suntong30 commented 1 year ago

Hello, I am interested in this work, but I am not familiar with DNN-based compressors. I know that we need the transformer model to compress the data.

Standard compressors, such as zlib and 7zip, use the deflate and inflate algorithms to compress and decompress. So, in DNN-based compressors, do we also need the model to decompress the compressed data?

mynotwo commented 1 year ago

Yes, you'll need the model to predict exactly the same probabilities when decompressing.
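To make that concrete, here is a toy sketch of why the decoder needs the identical model. This is not TRACE's actual coder: a hypothetical adaptive count model stands in for the transformer, and instead of arithmetic coding, the compressor simply replaces each symbol with its rank under the model's current prediction. The decompressor can invert those ranks only because it reproduces exactly the same predictions from the same context:

```python
from collections import Counter

def ranking(counts, alphabet):
    # The toy "model": rank symbols by descending observed count,
    # ties broken alphabetically. Deterministic given the same history.
    return sorted(alphabet, key=lambda s: (-counts[s], s))

def compress(data, alphabet):
    counts = Counter({s: 0 for s in alphabet})
    ranks = []
    for sym in data:
        order = ranking(counts, alphabet)
        ranks.append(order.index(sym))  # low rank = model thought it likely
        counts[sym] += 1                # adaptive model update
    return ranks

def decompress(ranks, alphabet):
    counts = Counter({s: 0 for s in alphabet})
    out = []
    for r in ranks:
        order = ranking(counts, alphabet)  # must match the encoder exactly
        sym = order[r]
        out.append(sym)
        counts[sym] += 1                   # same update as the encoder
    return "".join(out)
```

In a real DNN-based compressor the model's probabilities feed an arithmetic coder rather than a rank transform, but the symmetry requirement is the same: any divergence between the encoder's and decoder's predictions breaks decompression.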

suntong30 commented 1 year ago

Hello, I have a follow-up question: is the model trained only on the input data? For example, if the input file is "dickens", the model reads "dickens" and trains on it to minimize the loss function. It doesn't need to be pre-trained. Am I right?

Best regards!

mynotwo commented 1 year ago

Yes! But if you want a higher compression ratio, you can pre-train on dickens for several epochs and then do the compression. The compression ratios reported in my paper do not include pre-training.
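As a rough illustration of why pre-training on the file can improve the ratio, here is a hedged sketch with a toy Laplace-smoothed count model standing in for the transformer (`cost_bits` and `pretrain_epochs` are made up for this example). Pre-training warms the model up before coding starts, so early symbols are no longer charged near-uniform probabilities:

```python
import math
from collections import Counter

def cost_bits(data, alphabet, pretrain_epochs=0):
    # Toy adaptive model: Laplace-smoothed symbol counts (start at 1 each).
    counts = Counter({s: 1 for s in alphabet})
    # Optional pre-training: warm the model up on the file itself.
    for _ in range(pretrain_epochs):
        counts.update(data)
    bits = 0.0
    for sym in data:
        total = sum(counts.values())
        p = counts[sym] / total
        bits += -math.log2(p)  # ideal arithmetic-coding cost of this symbol
        counts[sym] += 1       # keep adapting while "compressing"
    return bits
```

The `-log2(p)` sum is the ideal arithmetic-coding cost in bits; on repetitive data, the pre-trained run spends fewer bits because the model already matches the file's statistics from the first symbol, while the cold-start run pays extra while it learns.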

BitCalSaul commented 1 month ago

@mynotwo Hi, thanks for your work. I'm wondering how you handle unpredictable data. Even if the model is quite big, there will still be some data that the model does not predict well. So, to keep this compressor lossless, how do you manage that part of the data? Thanks!