soxofaan / dahuffman

Python Module for Huffman Encoding and Decoding
MIT License

Is it possible to define a codec from a table of prefixes? #11

Open RigacciOrg opened 3 years ago

RigacciOrg commented 3 years ago

Is it possible to define a codec starting from a pre-made table of symbols, prefixes and values? I don't have frequencies, etc.; I just have a table like the one attached below.

I inspected the pickle objects provided for the predefined frequency tables (json, xml, etc.). I think I can manage to create the code_table part, but I cannot figure out how to build type, concat and metadata. It would be nice if I could just declare a dictionary or something similar in the code, instead of integrating a pickle object into the library.

[Attached image: ECG default Huffman table]

P.S. the table above is the default Huffman table used to compress electrocardiography data using the SCP-ECG standard.

soxofaan commented 3 years ago

Yes, it's possible if you use PrefixCodec, which is the parent class of HuffmanCodec. The latter only takes care of converting the frequency table to a prefix code table; the former handles the actual prefix code encoding and decoding.

For example, given this code table (based on your screenshot):

symbol  bits  value  (binary)
1       1     0      (0)
2       3     4      (100)
3       3     5      (101)
4       4     12     (1100)
5       4     13     (1101)
6       5     28     (11100)

you can build a codec like this:

from dahuffman.huffmancodec import PrefixCodec

# Code table: symbol -> (number of bits, code value)
table = {
    1: (1, 0),
    2: (3, 4),
    3: (3, 5),
    4: (4, 12),
    5: (4, 13),
    6: (5, 28),
}

# Symbol 6 is used as the "end of file" marker
codec = PrefixCodec(table, eof=6)

encoded = codec.encode([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
print(codec.decode(encoded))
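
Since a hand-written table is easy to get wrong, it can help to check that it is a valid prefix code before building a codec from it. The following is a minimal standalone sketch that does not use dahuffman at all; the helper names code_strings and is_prefix_free are made up for illustration. It expands each (bits, value) pair into its bit string and verifies that no code is a prefix of another.

```python
def code_strings(table):
    """Map each symbol to its code written as a string of '0'/'1' characters."""
    codes = {}
    for symbol, (bits, value) in table.items():
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        codes[symbol] = format(value, f"0{bits}b")
    return codes


def is_prefix_free(table):
    """Return True if no code is a prefix of (or equal to) another code."""
    codes = sorted(code_strings(table).values())
    # In sorted order, a code that prefixes any other code is
    # immediately followed by a code that starts with it.
    return all(not nxt.startswith(cur) for cur, nxt in zip(codes, codes[1:]))


table = {1: (1, 0), 2: (3, 4), 3: (3, 5), 4: (4, 12), 5: (4, 13), 6: (5, 28)}
print(code_strings(table))
# {1: '0', 2: '100', 3: '101', 4: '1100', 5: '1101', 6: '11100'}
print(is_prefix_free(table))
# True
```

The bit strings match the binary column of the table above, so the (bits, value) pairs were transcribed consistently.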

One caveat: the current implementation requires an "end of file" (eof) symbol in the table, which marks the end of the bit stream when it does not align with a byte boundary. In this example I used symbol 6 as eof.
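
To see why an end marker matters with this particular table, here is a rough standalone sketch of bit packing with zero padding. It is not dahuffman's actual implementation, and encode_bits and decode_bits are hypothetical helpers written only for this illustration. Because symbol 1 has the one-bit code 0, any zero padding at the end of the last byte would decode as extra 1 symbols unless something marks where the real data stops.

```python
def encode_bits(symbols, table, eof=None):
    """Concatenate the codes, optionally append the eof code, zero-pad to whole bytes."""
    bits = "".join(format(table[s][1], f"0{table[s][0]}b") for s in symbols)
    if eof is not None:
        bits += format(table[eof][1], f"0{table[eof][0]}b")
    bits += "0" * (-len(bits) % 8)  # pad up to the next byte boundary
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def decode_bits(data, table, eof=None):
    """Read bits until a code matches; stop at the eof symbol if one is given."""
    lookup = {format(v, f"0{b}b"): s for s, (b, v) in table.items()}
    bits = "".join(format(byte, "08b") for byte in data)
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            symbol = lookup[current]
            if eof is not None and symbol == eof:
                break
            out.append(symbol)
            current = ""
    return out


table = {1: (1, 0), 2: (3, 4), 3: (3, 5), 4: (4, 12), 5: (4, 13), 6: (5, 28)}
message = [2, 3, 4]

# Without an eof symbol the six padding bits decode as six spurious 1 symbols.
print(decode_bits(encode_bits(message, table), table))
# [2, 3, 4, 1, 1, 1, 1, 1, 1]

# With symbol 6 as eof the decoder stops before the padding.
print(decode_bits(encode_bits(message, table, eof=6), table, eof=6))
# [2, 3, 4]
```

The trade-off is that the symbol chosen as eof can no longer appear in the data itself, so it should be one that the input never uses.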