quiet / libcorrect

C library for Convolutional codes and Reed-Solomon
BSD 3-Clause "New" or "Revised" License
378 stars 95 forks source link

Bug in correct_reed_solomon_decode() #20

Open Hiffi opened 6 years ago

Hiffi commented 6 years ago

I have a small problem with this function. If there was no error while the transmission, the last parameter is still empty. To fix this problem, I have to do something like this: err = correct_reed_solomon_decode(rs->encoder, rs->encoded, rs->block_length, rs->recvmsg); if (rs->recvmsg[0] == 0) memcpy(rs->recvmsg, rs->encoded, rs->block_length);

I think this behavior is quite confusing, especially while the parameter is modified, if there was an error!

brian-armstrong commented 6 years ago

Hi @Hiffi

Can you provide a larger code sample/more context? I'm fairly certain the tests test this exact case, and it seems to suggest otherwise https://github.com/quiet/libcorrect/blob/master/tests/rs_tester.c#L99

brian-armstrong commented 6 years ago

See also https://github.com/quiet/libcorrect/blob/master/src/reed-solomon/decode.c#L336

Hiffi commented 6 years ago

That's why Im really confused. I couldn't locate, why this is happening. Don't know if this is just a C++ problem?!

I use this as a helper:

#ifndef _correct_cpp_h
#define _correct_cpp_h

extern "C" {
    #include <correct.h>
}

struct rs_struct{
    size_t const_block_length;
    size_t const_message_length;
    size_t block_length;
    size_t message_length;
    size_t min_distance;
    char *msg;
    uint8_t *encoded;
    correct_reed_solomon *encoder;
    int *indices;
    uint8_t *corrupted_encoded;
    uint8_t *erasure_locations;
    unsigned char *recvmsg;

    rs_struct(size_t block_length, size_t min_distance) {
        this->const_block_length = this->block_length = block_length;
        this->const_message_length = this->min_distance = min_distance;
        this->message_length = block_length - min_distance;
        this->msg = (char*) calloc(message_length, sizeof(char));
        this->encoded = (uint8_t*) malloc(block_length * sizeof(uint8_t));
        this->encoder = correct_reed_solomon_create(correct_rs_primitive_polynomial_ccsds, 1, 1, min_distance);
        this->indices = (int*) malloc(block_length * sizeof(int));
        this->corrupted_encoded = (uint8_t*) malloc(block_length * sizeof(uint8_t));
        this->erasure_locations = (uint8_t*) malloc(min_distance * sizeof(uint8_t));
        this->recvmsg = (unsigned char*) malloc(sizeof(unsigned char) * block_length);      
    }

    ~rs_struct() {}

    void reset() {
        this->block_length = this->const_block_length;
        this->min_distance = this->const_message_length;
        this->message_length = block_length - min_distance;
        memset(this->msg, 0, this->message_length);
        memset(this->encoded, 0, this->block_length);
        memset(this->indices, 0, this->block_length);
        memset(this->corrupted_encoded, 0, this->block_length);
        memset(this->erasure_locations, 0, this->min_distance);
        memset(this->recvmsg, 0, this->block_length);
    }
};

#endif

And then use it like that:

...
int main(int argc, char *argv[]) {
    rs_struct *rs = new rs_struct(255, 32);
    ...

    while(1) {
        if (mySwitch.available()) {
            rs->encoded = (uint8_t*)mySwitch.getReceivedValue();
            cout<<"Empfangen: "<<rs->encoded<<"\n";

            int err = correct_reed_solomon_decode(rs->encoder, rs->encoded, rs->block_length, rs->recvmsg);
            if (rs->recvmsg[0] == 0)
                memcpy(rs->recvmsg, rs->encoded, rs->block_length);
            cout<<"Decodiert: "<<rs->recvmsg<<"(Message Length: "<<err<<" | String Length: "<<strlen((char*)rs->recvmsg)<<"\n";

            ...

            rs->reset();
            mySwitch.resetAvailable();
        }
    }
    exit(0);
}

Overall I have to say, that it works really nice for me. Thanks for that 👍

brian-armstrong commented 6 years ago

Thanks, I hope it works well for you.

The one thing I can think of here is that it won't copy the received message when there are too many errors. Is err -1 when you're seeing an empty recvMsg? I didn't think it'd be useful to copy back a corrupted message, so it doesn't do that, but maybe it should.

Hiffi commented 6 years ago

Code:

memcpy(rs->encoded, (uint8_t*)mySwitch.getReceivedValue(), rs->block_length);
cout<<"Received: "<<rs->encoded<<"\n";

int err = correct_reed_solomon_decode(rs->encoder, rs->encoded, rs->block_length, rs->recvmsg);
cout<<"Decoded: "<<rs->recvmsg<<"(Message Length: "<<err<<" | String Length: "<<strlen((char*)rs->recvmsg)<<")\n";

Output:

Received: test
Decoded: (Message Length: 223 | String Length: 0)

Now comes the crazy part...

Code:

if ( recvfrom_inet_dgram_socket(sfd,rs->encoded,rs->block_length, src_host,sizeof(src_host),src_service,sizeof(src_service),0,LIBSOCKET_NUMERIC) < 0 ){ 
    perror(0);
    exit(1);
}
cout<<"Connection from "<<src_host<<" port "<<src_service<<": "<<rs->encoded<<"\n";

int err = correct_reed_solomon_decode(rs->encoder, rs->encoded, rs->block_length, rs->recvmsg);
cout<<"Decoded: "<<rs->recvmsg<<"(Message Length: "<<err<<" | String Length: "<<strlen((char*)rs->recvmsg)<<")\n";

Output:

Connection from 127.0.0.1 port 55723: 1234
Decoded: 1234(Message Length: 223 | String Length: 4)

The only diffrence is, that getReceivedValue() delivers a char as return. The recvfrom_inet_dgram_socket() wants a void as a parameter.

brian-armstrong commented 6 years ago

Unfortunately I still don't have quite enough code here to see what's going on. Can you come up with a short example that demonstrates this behavior and then provide me with the whole thing? I've reviewed the code in the decoder and can't find a likely explanation for how it could return a positive value but not write to the received message pointer.

Hiffi commented 6 years ago

Okay, it took me some hours, but I think I found the bug.

Code:

#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <unistd.h>
#include <string.h>
#include "correct_cpp.h"

using namespace std;

int main() {
    rs_struct *rs = new rs_struct(255, 32);
    char buf[500] = { 72, 97, 108, 108, 111, 32, 87, 101, 108, 116, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 217, 1, 214,
                        5, 237, 210, 42, 9, 139, 10, 177, 149, 18, 112, 11, 151, 27, 202, 75, 66, 116, 2, 121,
                        145, 199, 123, 108, 53, 222, 90, 92, 121};

    char buf2[500] = { 72, 97, 108, 108, 111, 32, 87, 101, 108, 116, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    memcpy(rs->encoded, buf2, rs->block_length);

    int err = correct_reed_solomon_decode(rs->encoder, rs->encoded, rs->block_length, rs->recvmsg);
        cout<<"Decoded: "<<rs->recvmsg<<"(Message Length: "<<err<<" | String Length: "<<strlen((char*)rs->recvmsg)<<"\n";

    memcpy(rs->encoded, buf, rs->block_length);

    err = correct_reed_solomon_decode(rs->encoder, rs->encoded, rs->block_length, rs->recvmsg);
        cout<<"Decoded: "<<rs->recvmsg<<"(Message Length: "<<err<<" | String Length: "<<strlen((char*)rs->recvmsg)<<"\n";

    delete rs;

    return 0;
}

Output:

Decoded: (Message Length: 223 | String Length: 0
Decoded: Hallo Welt
(Message Length: 223 | String Length: 11

If the message is cut off at the end, the byte for the redundancy are missing (aka zeros). I would assume, that the decoder return -1, because it can not decode anything. But unfortunately it returns the message length.

brian-armstrong commented 6 years ago

This is a pretty fascinating bug report, but ultimately I've decided this is actually the correct behavior.

The first oddity to notice here is that a message that's all 0s has a parity section that's also all 0s (this may not be true for other primitive polynomials/FCR/root gaps, but is true for CCSDS/1/1). That means that a block with all 0s is actually a valid block, and it decodes to a message containing all 0s.

The second issue is that your message is less than 16 characters long. For a block with 32 roots, up to 16 of the 255 bytes can be corrupted entirely and the message can still be recovered. By removing the last 32 bytes of the message and replacing them with 0, it would seem that decode should indeed error out, since 32 is more than the allowable 16.

Instead, decode succeeds but gives you back a payload of all 0s. That's because the block you gave to decode is actually less than 16 bytes away from the valid, all-0s block, and with the bytes repaired, it seems to be a successful decode from Reed-Solomon's point of view.

Unfortunately, with enough bytes replaced, this can happen. I have tried to make a note of this in the comments in correct.h:

In most cases, if the block is too corrupted, this function will return -1 and not perform decoding. It is possible but unlikely that the payload written to msg will contain errors when this function returns a positive value.

It seems you hit one of these unfortunate but possible cases. If you want to reduce the likelihood that this can happen, you may want to add a CRC32 checksum to your message. Reed-Solomon can reject some message failures but not as well a good checksum can.