xiph / flac

Free Lossless Audio Codec
https://xiph.org/flac/
GNU Free Documentation License v1.3
1.58k stars 278 forks source link

UTF-8 decoder appears to be too permissive #706

Closed bemoody closed 1 week ago

bemoody commented 1 month ago

The logic in FLAC__bitreader_read_utf8_uint32 and FLAC__bitreader_read_utf8_uint64 (src/libFLAC/bitreader.c) looks wrong:

    if(!(x & 0x80)) { /* 0xxxxxxx */
        v = x;
        i = 0;
    }
    else if(x & 0xC0 && !(x & 0x20)) { /* 110xxxxx */
        v = x & 0x1F;
        i = 1;
    }
    else if(x & 0xE0 && !(x & 0x10)) { /* 1110xxxx */
        v = x & 0x0F;
        i = 2;
    }
    else if(x & 0xF0 && !(x & 0x08)) { /* 11110xxx */
        v = x & 0x07;
        i = 3;
    }
    else if(x & 0xF8 && !(x & 0x04)) { /* 111110xx */
        v = x & 0x03;
        i = 4;
    }
    else if(x & 0xFC && !(x & 0x02)) { /* 1111110x */
        v = x & 0x01;
        i = 5;
    }
    else if(x & 0xFE && !(x & 0x01)) { /* 11111110 */
        v = 0;
        i = 6;
    }
    else {
        *val = FLAC__U64L(0xffffffffffffffff);
        return true;
    }

This code will accept, for example, 0x96 as a leading byte for a two-byte sequence (for valid UTF-8, it should be 0xD6 instead.)

Is this behavior intentional?

(I noticed this while trying to diagnose a different problem, in that some versions of libFLAC fail to seek to certain sample numbers. I think the UTF-8 decoding issue is not the real cause of my problem, but I think this does increase the likelihood of libFLAC ultimately treating an invalid frame header as valid.)

ktmf01 commented 1 month ago

I don't think this was intentional. A short check for pattern 0b100xxxxxx should fix this.

ktmf01 commented 1 month ago

~Perhaps this would be better? (I haven't checked it yet)~

Nevermind, with this 0x96 would fail at the first else if but get through at the third one of course.


    if(!(x & 0x80)) { /* 0xxxxxxx */
        v = x;
        i = 0;
    }
    else if(x & 0x40 && !(x & 0x20)) { /* x10xxxxx -> 110xxxxx */
        v = x & 0x1F;
        i = 1;
    }
    else if(x & 0x20 && !(x & 0x10)) { /* xx10xxxx -> 1110xxxx */
        v = x & 0x0F;
        i = 2;
    }
    else if(x & 0x10 && !(x & 0x08)) { /* xxx10xxx -> 11110xxx */
        v = x & 0x07;
        i = 3;
    }
    else if(x & 0x08 && !(x & 0x04)) { /* xxxx10xx -> 111110xx */
        v = x & 0x03;
        i = 4;
    }
    else if(x & 0x04 && !(x & 0x02)) { /* xxxxx10x -> 1111110x */
        v = x & 0x01;
        i = 5;
    }
    else if(x & 0x02 && !(x & 0x01)) { /* xxxxxx10 -> 11111110 */
        v = 0;
        i = 6;
    }
    else {
        *val = FLAC__U64L(0xffffffffffffffff);
        return true;
    }
ktmf01 commented 3 weeks ago

@bemoody could you take a look at #712 for me?