tomerfiliba-org / reedsolomon

⏳🛡 Pythonic universal errors-and-erasures Reed-Solomon codec to protect your data from errors and bitrot. Includes a future-proof zero-dependencies pure-python implementation 🔮 and an optional speed-optimized Cython/C extension 🚀
http://pypi.python.org/pypi/reedsolo

Code does not work with an input of class 'bytes' when c_exp > 8 #46

Closed another-pjohnson closed 1 year ago

another-pjohnson commented 2 years ago

Example code:

    import reedsolo
    rsc = reedsolo.RSCodec(12, c_exp=12) # same as nsize=4095
    str_msg = "This is a message"
    bytes_msg = b"This is a binary message"
    result = rsc.encode(str_msg) # this works
    result_b = rsc.encode(bytes_msg) # this fails

Error message:

    Traceback (most recent call last):
      File "rs_tensor.py", line 120, in <module>
        result_b = rsc.encode(bytes_msg) # this fails
      File "reedsolo.py", line 893, in encode
        enc.extend(rs_encode_msg(chunk, self.nsym, fcr=self.fcr, generator=self.generator, gen=self.gen[nsym]))
      File "reedsolo.py", line 526, in rs_encode_msg
        lcoef = gf_log[coef] # precaching
    IndexError: array index out of range

The problem seems to stem from the fall-through at the end of _bytearray, which is reached whenever obj is neither a str nor an int:

            # Else obj is a list of int, it's ok
            return array("i", obj)

In reality, obj is of type bytes here, not a list of ints.
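A minimal sketch of why that fall-through matters, using only the standard library. When `array()` is given a bytes initializer it treats the data as raw machine values (the same as `array.frombytes`), so several bytes are packed into each int instead of one int per byte. The `field_size` name is an assumption for illustration; the exact bounds of `gf_log` inside reedsolo are not shown in this thread.

```python
from array import array

bytes_msg = b"This is a binary message"  # 24 bytes, as in the report

# Passing bytes directly: array() reads them as raw machine values,
# packing itemsize bytes into each element.
packed = array("i", bytes_msg)

# Converting to a list of ints first gives one symbol per byte.
per_byte = array("i", list(bytes_msg))

# Assumption for illustration: with c_exp=12, valid symbols are 0..4095.
field_size = 2 ** 12

print(len(packed), len(per_byte))   # fewer, wider elements vs. one per byte
print(max(packed) >= field_size)    # packed values fall outside the field
print(max(per_byte) < field_size)   # per-byte values stay inside it
```

Symbols that large would then be used as indices into a lookup table sized for the field, which is consistent with the IndexError in the traceback.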

Proposed fix:

        def _bytearray(obj = 0, encoding = "latin-1"):
            '''Fake bytearray replacement, supporting int values above 255'''
            # always use Latin-1 and not UTF8 because Latin-1 maps the first 256 characters to their bytevalue equivalents. UTF8 may mangle your data (particularly at value 128)
            if isinstance(obj, str):  # obj is a string, convert to list of ints
                obj = obj.encode(encoding)
                if isinstance(obj, str):  # Py2 str: convert to list of ascii ints
                    obj = [ord(c) for c in obj]
                elif isinstance(obj, bytes):  # Py3 bytes: characters are bytes, need to convert to int for array.array('i', obj)
                    obj = [int(b) for b in obj]
                else:
                    raise ValueError("Type of object not recognized!")
            elif isinstance(obj, int):  # compatibility with list preallocation bytearray(int)
                obj = [0] * obj
            elif isinstance(obj, bytes):  # bytes input: convert to list of ints, one per byte
                obj = [int(b) for b in obj]
            # Else obj is a list of int, it's ok
            return array("i", obj)
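The conversion rules in the proposed fix can be checked in isolation with a small Python 3-only sketch. `to_symbol_array` is a hypothetical name used here for illustration and is not part of reedsolo; accepting `bytearray` as well as `bytes` is an added assumption, and the Python 2 branch is left out.

```python
from array import array

def to_symbol_array(obj=0, encoding="latin-1"):
    """Hypothetical Python 3-only helper mirroring the proposed fix."""
    if isinstance(obj, str):
        # Latin-1 maps code points 0..255 to the same byte values.
        obj = list(obj.encode(encoding))
    elif isinstance(obj, int):
        # Preallocation, like bytearray(n).
        obj = [0] * obj
    elif isinstance(obj, (bytes, bytearray)):
        # One int per byte, instead of letting array() pack raw bytes.
        obj = list(obj)
    # Otherwise obj is assumed to be an iterable of ints already.
    return array("i", obj)

print(to_symbol_array("AB") == to_symbol_array(b"AB"))  # str and bytes agree
print(to_symbol_array([4000, 1]).tolist())              # values above 255 are kept
```

With this shape, str and bytes inputs produce the same symbols, and lists of ints above 255 still pass through unchanged for c_exp > 8.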

lrq3000 commented 1 year ago

Thank you so much for your contributions, that's 2 edge case bugs you've found and fixed! Both fixes are now merged. Have a great day!