szatmary / libcaption

Free open-source CEA608 / CEA708 closed-caption encoder/decoder
MIT License

SRT files with ♪ in them cause "ERR: Unrecognized block" #50

Open nabeards opened 5 years ago

nabeards commented 5 years ago

Is it possible to support SRT captions that contain the ♪ (music note) character? When I run flv+srt with these captions, I get the error "ERR: Unrecognized block". It looks like the code around here would need to be updated to fix this: https://github.com/szatmary/libcaption/blob/e8b6261090eb3f2012427cc6b151c923f82453db/src/vtt.c#L228

If you can give me some direction on exactly where it's failing, I may be able to put together a pull request.

Thanks, nabeards

nabeards commented 5 years ago

I think I found a solution by modifying the utf8_line_length function in utf8.c as follows:

size_t utf8_line_length(const utf8_char_t* data)
{
    size_t len = 0;
    size_t utf8_len = 0;
    size_t i = 0;

    for (len = 0; 0 != data[len]; ++len) {
        if ('\r' == data[len]) {
            if ('\n' == data[len + 1]) {
                return len + 2; // "\r\n": windows
            } else {
                return len + 1; // lone '\r': classic mac os
            }
        } else if ('\n' == data[len]) {
            if ('\r' == data[len + 1]) {
                return len + 2; // "\n\r": riscos
            } else {
                return len + 1; // lone '\n': unix
            }
        }
    }

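    // Only reached when no line terminator was found before the NUL.
    // Counts code points: every byte that is not a UTF-8 continuation
    // byte (10xxxxxx) starts a new character.
    for (i = 0; i < len; i++) {
        utf8_len += (data[i] & 0xc0) != 0x80;
    }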

    return utf8_len;
}
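
For context on what the second loop computes, here is a minimal standalone sketch of the same counting trick. It is not part of libcaption; `count_code_points` and `demo` are hypothetical names used only for illustration. ♪ (U+266A) is encoded as the three bytes E2 99 AA in UTF-8, so the byte length and the character count of a line containing it differ:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcaption: counts the UTF-8 code
   points in the first `bytes` bytes of `s` by skipping continuation
   bytes, which always have the bit pattern 10xxxxxx. */
size_t count_code_points(const char* s, size_t bytes)
{
    size_t n = 0;
    for (size_t i = 0; i < bytes; i++) {
        n += ((unsigned char)s[i] & 0xC0) != 0x80;
    }
    return n;
}

/* Usage example: a caption line with a music note at each end. */
void demo(void)
{
    const char* line = "\xE2\x99\xAA la la \xE2\x99\xAA";
    assert(strlen(line) == 13);                         /* length in bytes */
    assert(count_code_points(line, strlen(line)) == 9); /* length in characters */
}
```

One thing I'm unsure about: with this change the function returns a byte count when it finds a line terminator but a character count when it doesn't, so callers that treat the result as a byte offset may behave differently in the two cases.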

Thoughts? nabeards