nothings / stb

stb single-file public domain libraries for C/C++
https://twitter.com/nothings
Other
26.31k stars 7.69k forks source link

Error 30 on stb_vorbis decode #1482

Closed ma261065 closed 1 year ago

ma261065 commented 1 year ago

Describe the bug When decoding this file https://archive.org/download/gd1991-08-13.sbd.miller.109241.flac16/gd91-08-13d1t02.ogg the decoder errors out with error 30. Other vorbis decoders/players can play this file successfully.

To Reproduce Steps to reproduce the behavior:

  1. Read the first 8192 bytes of the file into a buffer
  2. Call stb_vorbis_open_pushdata with the buffer
  3. Get error 30

Expected behavior Not to get error 30, and be able to continue decoding the file, as other players can.

nothings commented 1 year ago

Very early vorbis files use a different format, which stb_vorbis does not now and will never support.

nothings commented 1 year ago

https://github.com/nothings/stb/blob/master/stb_vorbis.c#L16

ma261065 commented 1 year ago

https://github.com/nothings/stb/blob/master/stb_vorbis.c#L16

Thanks. Is that confirmed to be the issue with this file?

nothings commented 1 year ago

Ha, whoops, no, sorry, I misread the error code. Error 30 normally indicates that the file is corrupt. I've checked the file and it does indeed not match the specification, as the second page of data does not appear where it should, which stb_vorbis reports as corruption. If other readers parse it, then probably it uses some Ogg extension that I'm not aware of.

Without knowing what that extra data is, I can't parse it.

ma261065 commented 1 year ago

I appreciate you looking at this Sean. If this extra information is not relevant to decoding the file, is there a way we could just skip it? In other words, do we even need to parse it, or can we just ignore it?

nothings commented 1 year ago

I need to know how to parse it to skip it. It appears where the second page should be, so I don't know how to locate the second page.

ma261065 commented 1 year ago

I see from RFC3533 that "Ogg also allows but does not require secondary header packets after the bos page for logical bitstreams and these must also precede any data packets in any logical bitstream. These subsequent header packets are framed into an integral number of pages, which will not contain any data packets.

So, a physical bitstream begins with the bos pages of all logical bitstreams containing one initial header packet per page, followed by the subsidiary header packets of all streams, followed by pages containing data packets."

Could the mystery data be this optional subsidiary header packet?

nothings commented 1 year ago

No. Vorbis does have secondary header pages (which is part of the Vorbis encapsulation spec), but as your RFC says, those packets are themselves still grouped into Ogg pages, and what's missing here is the Ogg page header for the second page (which contains the secondary Vorbis header packets).

ma261065 commented 1 year ago

Right, I think I get it. Any chance you could post a quick hex dump here of what's expected, and where it seems to deviate from the standard. I'd like to do a bit more digging.

nothings commented 1 year ago

The first four bytes of any page should be "OggS". The first page should begin at byte 0, the second page should begin at byte 58. The file contains "OggS" at byte 0, but not at byte 58. Instead the file starting at byte 58 contains lots of little clumps of non-zero values and then lots of 0 values in a sort-of-but-not-quite pattern.

The best way to figure out what this is might be to step a debugger through it being loaded by another vorbis decoder.

ma261065 commented 1 year ago

I have looked at the file in a hex editor, and see exactly what you are talking about.

Of note- the exact same issue is reported here for Google's Exo Player: https://github.com/google/ExoPlayer/issues/7230#issuecomment-634048746 They call it "non-contiguous pages", and have worked around it by searching for the next OggS string in the input.

The encoder used seems to be Lavc 52.79.1 - https://github.com/libav/libav. I have no idea how common this encoder is (or how many other apps use this library), but unfortunately it seems that it has this quirk. I note that it does have 382 forks though, which could indicate that was quite popular at some point.

All the magic that results in the overly-long header seems to happen in here: https://github.com/libav/libav/blob/c4642788e83b0858bca449f9b6e71ddb015dfa5d/libavformat/oggenc.c

nothings commented 1 year ago

Ok then, yeah, I'm not going to fix this to work with a broken codec. We've gone down that rabbit hole for stb_image, but I'd prefer not to for vorbis. Use a different vorbis library if you need to support reading every vorbis file. Since we already don't support the very old files that I mentioned in my first replies, stb_vorbis doesn't claim to read every vorbis file anyway.