Maybe I mean granule. I don't really know.
MP3 has the concept of a bit reservoir, i.e. a frame can donate its unused tail bits (left over because the encoder's rate control failed to use them all) to the next frame. If the first frame becomes unavailable, the decoder can't decode the second frame (if the bit reservoir is used), but if that second frame is still passed to the decoder, the bit reservoir gets filled and the decoder can decode the third frame. In the worst case, the preceding frames need 511 payload bytes to completely fill the reservoir.
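To illustrate, here is a simplified hedged sketch of the bookkeeping (not minimp3's actual code; all names are mine, and out must be sized for reservoir plus payload). main_data_begin is the 9-bit side-info field that says how many bytes to reach back into previous frames' payload, hence the 511-byte maximum:

#include <string.h>
#include <stdint.h>

#define MAX_RESERVOIR 511   /* main_data_begin is a 9-bit field */

typedef struct {
    uint8_t buf[MAX_RESERVOIR];
    int     len;            /* valid bytes carried over from previous frames */
} reservoir_t;

/* Returns 0 when main_data_begin reaches back into bytes we never saw
 * (e.g. the previous frame was dropped): this frame's granules are then
 * undecodable, but feeding the payload still refills the reservoir for
 * the frames that follow. */
static int assemble_main_data(reservoir_t *r,
                              const uint8_t *payload, int payload_len,
                              int main_data_begin,
                              uint8_t *out, int *out_len)
{
    int ok = main_data_begin <= r->len;
    if (ok) {
        memcpy(out, r->buf + r->len - main_data_begin, main_data_begin);
        memcpy(out + main_data_begin, payload, payload_len);
        *out_len = main_data_begin + payload_len;
    }
    /* refill: keep the last MAX_RESERVOIR bytes of (old reservoir + payload) */
    if (payload_len >= MAX_RESERVOIR) {
        memcpy(r->buf, payload + payload_len - MAX_RESERVOIR, MAX_RESERVOIR);
        r->len = MAX_RESERVOIR;
    } else {
        int keep = r->len;
        if (keep > MAX_RESERVOIR - payload_len)
            keep = MAX_RESERVOIR - payload_len;
        memmove(r->buf, r->buf + r->len - keep, keep);
        memcpy(r->buf + keep, payload, payload_len);
        r->len = keep + payload_len;
    }
    return ok;
}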
That leaves more questions. This is believed to be frame data (I've stripped 4 bytes): b'\x00\x00\x00\x00\x00i\x06\x00\x00\x00\x00\x00\r \xc0\x00\x00\x00\x00\x01\xa4\x1c\x00\x00\x00\x00\x004\x83\x80\x00\x00LAME3.92UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUULAME3.92UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU' Is it? It doesn't look like audio to me. Well, it may be a heavily compressed zero-padded block of zeros, which it probably is. Can you explain this simple example to me?
So, let me sum up everything I've observed. A Layer 3-like frame returns 1152 samples per channel when decoded. That is 1152 time samples which come from 2304 frequency samples processed by the IMDCT. But let us start from the beginning: I have to perform Huffman decoding. It seems that the Huffman tables are not provided per frame but are part of the MP3 standard. From http://blog.bjrn.se/2008/10/lets-build-mp3-decoder.html : "The largest code table defined in the standard has samples no larger than 15." That's unclear. I think it means: "The MP3 standard allows a Huffman code word to be decoded into samples with a value no larger than 15." When I first saw it, I thought it meant "no more than 15 samples". Also: "Somewhat confusingly, the standard calls these possibilities “tables”. We will call them table pairs instead." I'm more confused by these "table pairs". So Huffman decoding goes as follows: get a code word and decode it into 2 or 4 samples; for each of them, if the sample is 15, read linbits bits as an unsigned value and add them to the sample; then, if the next bit is set, flip the sign (see the sketch after this comment). But the formats (read: structs) of the big, quad and zero regions are not covered. I need to know which table is used to encode each region, but I have no idea where that is stored. Same for the reordering.
Heading to the requantization. "The MP3 encoder uses a non-linear quantizer." When I first saw that, I thought about mu-law. No, it's more like an elementwise (Hadamard, Schur) vector product.
About stereo: I'm only interested in the mono and stereo structs, i.e. how a stereo frame differs from a mono frame.
Back to the IMDCT. So, if I had the decompressed, amplified and mixed MDCT samples from the current frame in one buffer, and either the same from the previous frame or zeros in another buffer, I'd divide both buffers into 32 parts, then create two new buffers, and for each part I'd zero the new buffers, write only the current part into them at its original place, do the IMDCT and apply a window. "The value chunkBlockType is an integer, either 0, 1, 2 or 3. This decides which window is used." Where is chunkBlockType stored? About the overlap.
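On the linbits/sign step summed up above, a minimal hedged sketch (the MSB-first bit reader and all names are mine, not from the standard or any particular decoder):

#include <stdint.h>

typedef struct { const uint8_t *buf; int pos; } bitreader_t;  /* pos in bits */

static int get_bits(bitreader_t *br, int n)   /* MSB-first, as in MP3 streams */
{
    int v = 0;
    while (n--) {
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1);
        br->pos++;
    }
    return v;
}

/* A big-values pair element decoded from the Huffman table maxes out at 15;
 * 15 is an escape extended by `linbits` extra bits, and every nonzero value
 * is followed by one sign bit (1 = negative). */
static int finish_big_value(bitreader_t *br, int value, int linbits)
{
    if (value == 15 && linbits > 0)
        value += get_bits(br, linbits);
    if (value != 0 && get_bits(br, 1))
        value = -value;
    return value;
}

The quad (count1) region differs: its values are only 0 or 1, so there are no linbits there, just the sign bits.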
So, to do the IMDCT I'll take the current 1152 samples and the previous 1152 samples (or 1152 zeros), take the IMDCT, then mix with the IMDCT of the previous frame through the window; so I'll mix with the previous frame twice, both in the frequency and the time domain. My point is I want to decode everything manually, preserving all the results of the intermediate steps, preferably in struct-array format. I want to know all properties of the frame, not only those in the header. I've already written some code; I'm able to play MP3 files with Python.
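For what it's worth, the overlap in a typical Layer III decoder is purely time-domain and per subband: each granule's 36-point windowed IMDCT contributes 18 output samples, mixed with the saved tail of the previous granule. A minimal sketch (names are mine; `in` is assumed to already hold the windowed IMDCT output of one subband):

static void overlap_add(const float in[36], float overlap[18], float out[18])
{
    for (int i = 0; i < 18; i++) {
        out[i]     = in[i] + overlap[i];  /* first half + saved tail of previous granule */
        overlap[i] = in[18 + i];          /* save second half for the next granule */
    }
}

The 18 output samples of each of the 32 subbands then go to the synthesis filterbank; an overlap buffer full of zeros is exactly what a decoder starts from on the first granule.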
This is the standard first frame LAME creates to hide metadata. This frame is almost empty; LAME creates the Info/Xing block in the unused space (not used for the bit reservoir either) so that other applications can find it. Here is where LAME creates it: https://github.com/gypified/libmp3lame/blob/master/libmp3lame/VbrTag.c#L900 (VBRTag1/VBRTag0 are the Info/Xing metadata markers).
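As a hedged illustration of spotting such a tag (is_xing_or_info is a hypothetical helper, not libmp3lame or minimp3 code; offsets assume an MPEG-1 frame without CRC):

#include <string.h>

/* The tag sits right after the header and side info, i.e. at byte
 * 4 + 17 for MPEG-1 mono or 4 + 32 for MPEG-1 stereo. */
static int is_xing_or_info(const unsigned char *frame, int mono)
{
    const unsigned char *tag = frame + 4 + (mono ? 17 : 32);
    return !memcmp(tag, "Xing", 4) || !memcmp(tag, "Info", 4);
}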
About Huffman decoding you may want to check: https://github.com/lieff/minimp3/blob/master/minimp3.h#L773
And about frame side data: https://github.com/lieff/minimp3/blob/master/minimp3.h#L470
Here is where the Huffman table selectors are read: https://github.com/lieff/minimp3/blob/master/minimp3.h#L576
And the block type: https://github.com/lieff/minimp3/blob/master/minimp3.h#L538
PS: If you're interested in a Python implementation, take a look at https://bitbucket.org/portalfire/pymp3
I was more interested in https://github.com/lieff/minimp3/blob/e9df0760e94044caded36a55d70ab4152134adc5/minimp3.h#L766 than in https://github.com/lieff/minimp3/blob/master/minimp3.h#L773. What is ireg, and where does it originate? Well, at L757 it's set to 0. But what does that mean? That MP3 always starts from table 0? Your code seems too obfuscated to me. You name the variables inconsistently, making it impossible to type a variable name and find all occurrences. Also, when you create a struct you could pass a pointer to that struct, but you pass the individual fields instead, which results in a complete mess. I'm not going to re-implement all that stuff. Maybe I'll edit your code to make it more readable. Currently I have one more question. Imagine I don't have an MP3 file, but I wrote a utility that generates data that looks like a frame; synthesizing a frame, I mean. Can I decode that frame alone, or do I need the previous frame? When I use the short mode, do I need previous granules to decode the next granule, or can I decode a granule alone? L3_read_side_info seems useful to me. How do I call it on the frame?
typedef struct
{
    int frame_bytes, channels, hz, layer, bitrate_kbps; /* size and format of the last decoded frame */
} mp3dec_frame_info_t;

typedef struct
{   /* persistent decoder state, carried between frames */
    float mdct_overlap[2][9*32], qmf_state[15*2*32];    /* per-channel IMDCT overlap and synthesis (QMF) filterbank state */
    int reserv, free_format_bytes;                      /* bit-reservoir fill and detected free-format frame size */
    unsigned char header[4], reserv_buf[511];           /* last frame header and bit-reservoir bytes (max 511) */
} mp3dec_t;

typedef struct
{   /* bitstream reader */
    const uint8_t *buf;
    int pos, limit;                                     /* current position and end, in bits */
} bs_t;

typedef struct
{   /* Layer 1/2 scalefactor and bit-allocation info */
    float scf[3*64];
    uint8_t total_bands, stereo_bands, bitalloc[64], scfcod[64];
} L12_scale_info;

typedef struct
{   /* Layer 1/2 subband allocation table entry */
    uint8_t tab_offset, code_tab_width, band_count;
} L12_subband_alloc_t;

typedef struct
{   /* Layer 3 per-granule side info */
    const uint8_t *sfbtab;                              /* scalefactor band table */
    uint16_t part_23_length, big_values, scalefac_compress;
    uint8_t global_gain, block_type, mixed_block_flag, n_long_sfb, n_short_sfb;
    uint8_t table_select[3], region_count[3], subblock_gain[3]; /* Huffman tables and region split for big values */
    uint8_t preflag, scalefac_scale, count1_table, scfsi;
} L3_gr_info_t;

typedef struct
{   /* per-frame scratch */
    bs_t bs;
    uint8_t maindata[MAX_BITRESERVOIR_BYTES + MAX_L3_FRAME_PAYLOAD_BYTES]; /* reservoir + current payload */
    L3_gr_info_t gr_info[4];                            /* up to 2 granules x 2 channels */
    float grbuf[2][576], scf[40], syn[18 + 15][2*32];
    uint8_t ist_pos[2][39];                             /* intensity-stereo positions */
} mp3dec_scratch_t;
Can you put comments?
ireg is the region counter; up to 3 regions can each have their own Huffman table, and decoding starts from the gr_info->table_select[0] table (which is not necessarily table zero).
Even if a frame (which has 1 or 2 granules) does not use the bit reservoir, the first frame alone will not be in good shape, because the filterbank must be fed first; the 2nd frame onward will be in good enough shape.
Side info is encoded right after the header and CRC: https://github.com/lieff/minimp3/blob/master/minimp3.h#L1699
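For reference, the layout that implies for a Layer III frame (byte sizes as given in the MPEG specs):

/* header     4 bytes
 * CRC        0 or 2 bytes (present when the header's protection bit is 0)
 * side info  MPEG-1: 17 bytes mono, 32 bytes stereo
 *            MPEG-2 LSF: 9 bytes mono, 17 bytes stereo
 * main data  the rest; located via the side info's main_data_begin
 *            back-pointer, so it may actually begin in earlier frames' payload
 */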
You are right, this code is not optimized for reading but for small size (both code and binary) and performance, and I'm not going to change that.
Note that the canonical mpg123 has an even more optimized Huffman decoder: https://github.com/georgi/mpg123/blob/master/src/libmpg123/layer3.c#L706 For me it's even less understandable, despite having some comments.
If you are looking for the most understandable and commented code, I think it's the reference code, "MPEG-2 Audio Simulation Software Distribution 10".
#include <iostream>

typedef unsigned char uchar;

struct mp3_header {
    uchar sync1 : 8;
    uchar sync2 : 3;
    uchar mpegVer : 2;
    uchar mpegLayer : 2;
    bool notProtected : 1;
    uchar _bitrate : 4;
    uchar _samplerate : 2;
    bool padded : 1;
    bool privateBit : 1;
    uchar _stereo : 2;
    uchar _jStereo : 2;
    bool proprietary : 1;
    bool original : 1;
    uchar emphasis : 2;
    unsigned short getBitrate();
    unsigned short getSamplerate();
    bool isStereo();
};

int main() {
    std::cout << sizeof(mp3_header) << std::endl; //4
    return 0;
}
Why can't the code be like that? Just cast a pointer to a frame to mp3_header* and get all the fancy stuff. Also, readability can be mixed with speed. Also, I don't see a point in making headers smaller. But if you think your code is small, I'm gonna break your illusions: your code could* be some kilobytes smaller if you used packed structs like that. Your code can be optimized even further. In fact, I'm going to fork you.
I'm not against using bitfields, but it must be tested with at least the following compilers: gcc 5-8 (arm, aarch64, x86, x64, ppc, ppc64), clang 6-7 (arm, aarch64, x86, x64), armcc (arm, aarch64), msvc 2013-2017 (arm, aarch64, x86, x64).
Code size must be roughly the same or less, and profiling must not show a slowdown.
Usually the compiler emits the same shift and AND operations, but some years ago I experienced some compilers doing silly things with bitfields, and I have not used them since. There is also the big-endian vs little-endian problem on some architectures/compilers: you don't know in what order bytes are mapped to memory. In decoders this is usually handled in one place, get_bits(), and then you don't have to think about endianness until output generation.
There are no problems with endianness if you use char. Also, chars can be used for endianness detection.
That's why minimp3 reads input by chars in get_bits. It's not optimal; optimized decoders usually use a big machine-word load plus a byte-swap instruction where the architecture provides one. But there is a problem when you map a structure mp3_header with sizeof(mp3_header) > sizeof(char) onto the input stream directly, so you must handle it properly.
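A hedged sketch of both styles (helper names are mine; the builtin is gcc/clang only, other compilers need their own swap intrinsic):

#include <stdint.h>
#include <string.h>

/* Byte-at-a-time big-endian load: endian- and alignment-safe everywhere. */
static uint32_t load32be_bytes(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Word load + byte swap: memcpy keeps the unaligned load defined, and the
 * compiler typically folds it into one load plus a bswap where available. */
static uint32_t load32be_word(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);   /* big-endian hosts need no swap */
#endif
    return v;
}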
Did you mean: "But there's a problem when you map a structure mp3_header with sizeof(mp3_header) > sizeof(char) onto the input stream directly, so you must properly handle that problem"?
#include <iostream>

typedef unsigned char uchar;

struct mp3_header {
    uchar sync1 : 8;
    uchar sync2 : 3;
    uchar mpegVer : 2;
    uchar mpegLayer : 2;
    bool notProtected : 1;
    uchar _bitrate : 4;
    uchar _samplerate : 2;
    bool padded : 1;
    bool privateBit : 1;
    uchar _stereo : 2;
    uchar _jStereo : 2;
    bool proprietary : 1;
    bool original : 1;
    uchar emphasis : 2;
};

int main() {
    const unsigned char stream[] = { 1, 0, 0, 0 };
    const mp3_header *hdr = (const mp3_header *)stream;
    std::cout << (int)hdr->sync1 << std::endl;
    return 0;
}
This construct can have the problems I've experienced. Modern compilers should be fine, but the list above should be checked. Also, clang and gcc have become more and more strict about UB, and to map a struct at byte alignment you must use nonstandard extensions like __attribute__ ((__aligned__ (1))) or memcpy.
Bit fields are implementation-defined: if your compiler wants, it may choose a completely different bit packing layout than what you expect. Please don't make minimp3 rely on such implementation-defined behaviour; it's bound to fail on at least some obscure platforms where the current code works just fine.
__attribute__ ((__aligned__ (1))) is also dangerous, especially when passing around pointers to data extracted from such a struct, because the extracted pointer no longer carries any alignment information, and the compiler is free to assume the data is naturally aligned.
Yeah, I completely forgot that it's also implementation-defined.
C99 6.7.2.1-11: "An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified."
RIP bit fields for minimp3 use-case then.
The mess in the code leads to bugs. https://github.com/lieff/minimp3/blob/e9df0760e94044caded36a55d70ab4152134adc5/minimp3.h#L1626 should be:
if (i + k + hdr_padding(mp3) + nextfb + HDR_SIZE > mp3_bytes) break;
if (!hdr_compare(mp3, mp3 + k + nextfb)) continue;
EDIT: Oops. I meant:
if (i + fb + hdr_padding(mp3) + nextfb + HDR_SIZE > mp3_bytes) break;
which means:
if (i + k + nextfb + HDR_SIZE > mp3_bytes) break;
if (!hdr_compare(mp3, mp3 + k + nextfb)) continue;
RIP bit fields for minimp3 use-case then.
AFAIK, most compilers only align bitfields when there are non-bitfield members. So, if I specify uchar:8 instead of uchar, the whole struct is one big bitfield, and no alignment should be needed. Presto!
About the alignment: alignment only means the structure takes more space. It doesn't change the order; packing does.
Of course, if you want to be generic and cross-platform compatible, then RIP. But that's you, not me.
There is already a stricter check, i + 2*k < mp3_bytes, and the check inside the loop, i + k + nextfb + HDR_SIZE > mp3_bytes, is for corner cases and very low k values. So there is no performance gain in this break, and it's incorrect, because the next match can be valid.
100% RIP, and it's not because of alignment but because of the implementation-defined bit placement, even within a uchar.
Next match? mp3_bytes seems to be the file size minus the offset, in bytes. So if i + k + hdr_padding(mp3) + nextfb + HDR_SIZE > mp3_bytes, it's EOF. There's nothing after EOF. No next match.
If you understand "implementation-defined bit location" so well, could you please explain what deep meaning this phrase has?
It means that the bits can be packed in any way the compiler sees fit. Just because you tested, for example, three different compilers on x86 (or whatever) that behaved the same doesn't mean that on another platform they will behave identically. In particular, this part of the quote from the standard is important:
allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined
On some non-x86 platforms it can be more optimal to pack the fields in reverse order (top to bottom rather than bottom to top), and the compiler is free to do so. "I tested several compilers" is not proof that no compiler does something different.
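A tiny probe makes this concrete; what it prints is itself implementation-defined, which is exactly the point:

#include <stdio.h>

struct probe { unsigned a : 4, b : 4; };

int main(void)
{
    union { struct probe p; unsigned char raw[sizeof(struct probe)]; } u;
    u.p.a = 0x1;
    u.p.b = 0x2;
    /* On a typical little-endian target: 0x21 if bit-fields fill
     * low-order bits first, 0x12 if high-order first; big-endian
     * byte order adds yet another variable. */
    printf("0x%02x\n", u.raw[0]);
    return 0;
}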
Yes, a next match is possible: at each k position there are padding possibilities 0, 1 and 4, so the next match can have different padding.
An implementation may allocate any addressable storage unit large enough to hold a bit-field.
So any storage unit of size equal to or greater than one bitfield can hold that bitfield. That doesn't mean anything here, because we have one big bitfield and it comes first, and C doesn't align the first member.
If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit.
That means adjacent bitfields are one big bitfield, and they keep the same order.
If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.
If? Well. I said that in GCC and Clang an element is only put into the next unit if it does not specify how many bits it takes, and thus does not belong to a bitfield. But "implementation-defined". So, if a bitfield already contains 7 bits, we can't know whether the next 2 bits will be placed immediately or after 1 bit. But we can! assert(sizeof(typename) == x) is our friend.
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
Within. RIP. But not for me, because I'm not going to be cross-platform. Or, maybe not. A unit is a byte. Inside a byte, bits can be shifted left or right; there's no reason why one direction would be faster than the other. EDIT: maybe there is, for signed numbers. Of course, the first field is only shifted, and the last, if it's not at the edge, is only AND-ed. At the edge, the LSBs are AND-ed and shifted, and the MSBs are shifted only. Done the opposite way, the number would be multiplied by a power of 2. It does not matter how the number is actually stored. EDIT: the result is defined, but the direction of the shift is not; I don't know why. So if "the order of allocation of bit-fields within a unit is implementation-defined", it's not because of speed.
How about:
struct foo {
#ifdef BIG_ENDIAN_BITFIELDS
    unsigned x : 5;
    unsigned y : 3;
#else
    unsigned y : 3;
    unsigned x : 5;
#endif
};
I was wrong about the padding. But:
if (i + k + nextfb + HDR_SIZE > mp3_bytes) break;   // that is EOF, a match is impossible
if (!hdr_compare(mp3, mp3 + k + nextfb)) continue;  // and here, a match is possible
That is a memory error: it could read past the allocated memory.
Also note that there is no point optimizing this loop when the iteration count is low. But imagine this loop runs 1000-2000 iterations (up to MAX_FREE_FORMAT_FRAME_SIZE); that case is a good optimization candidate. The most obvious thing I can see is something like:
if (!hdr_compare(mp3, mp3 + k + nextfb) + (i + k + nextfb + HDR_SIZE > mp3_bytes))
    continue;
i.e. one branch instead of two (and do something about the buffer overread), which can be a measurable improvement in a busy loop, at least on x86/x64. But this function doesn't even show up in the profile of complex mp3s. For now the main problem resides in L3_imdct36/L3_dct3_9 optimization.
PS: There is no overread in the initial code. C's short-circuit rule guarantees that continue executes when the i + k + nextfb + HDR_SIZE > mp3_bytes condition is true, and !hdr_compare(mp3, mp3 + k + nextfb) is then never evaluated.
So, if the loop reaches EOF, it simply repeats until it reaches MAX_FREE_FORMAT_FRAME_SIZE. The "i + 2*k < mp3_bytes - HDR_SIZE" part guarantees that it stops after the continue. But that makes another problem: if there's a break, the next statement is not executed either, so we have branching in both cases. || and && cause branching too, don't you know? | and & don't. Even if you have "one" condition, it generates multiple jumps: jump if the first is true, jump if the second is true; that's assembly. Accumulating the result would have taken another register and, in the best case, one more instruction, and all conditions would be evaluated; didn't you think about that? The only difference is that after a break, https://github.com/lieff/minimp3/blob/e9df0760e94044caded36a55d70ab4152134adc5/minimp3.h#L1620 is not executed; it skips directly to https://github.com/lieff/minimp3/blob/e9df0760e94044caded36a55d70ab4152134adc5/minimp3.h#L1633 There's no memory error, but using break could save 3 branches after EOF. The branch count inside the loop is the same. The only thing is that fusing the conditions would make the code shorter with (almost) no effect on performance, because my version is faster only when the loop reaches EOF; otherwise it's the same speed.
The rule: if only one of many conditions is evaluated, there's branching. Otherwise there's not.
:0
    // do something
    cmp a b
    je 0
    cmp a c
    jne 1
    // do something
:1
Your code:
:0
    cmp a c
    jne 1
    // do something
    cmp a b
    je 0
    cmp a c
    jne 0
    // do something
:1
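For the record, the difference those listings dance around is C's short-circuit rule. A standalone toy (all names are mine), showing when the second operand runs:

#include <stdio.h>

static int cheap(int x)     { return x > 10; }
static int expensive(int x) { puts("expensive ran"); return x & 1; }

int main(void)
{
    int x = 42;
    /* ||: short-circuit, expensive() is skipped when cheap() is true;
     * typically compiled as two conditional branches. */
    if (cheap(x) || expensive(x))
        puts("taken (|| version)");
    /* |: both operands are always evaluated and combined, usually with
     * a single test at the end, but expensive() always runs. */
    if (cheap(x) | expensive(x))
        puts("taken (| version)");
    return 0;
}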
Yeah, something like that, and many more optimizations can be done. For example, we can try to vectorize the header search at position k and skip up to 4-8 bytes if a header is guaranteed not to be there. This is potentially the most worthwhile optimization for this function. For example, here is an optimized scan for '.' and '?' from an HTTP server:
static void parse_uri(const char *uri, char **ext, char **qstr, int len)
{
*ext = 0;
*qstr = 0;
for(int c = 0; c < len;)
{
int l;
// check for '.'
uint64_t val = *(uint64_t*)(uri + c);
uint64_t val_sav = val;
if (unlikely( (((val ^ 0x002e002e002e002eull) & 0x00ff00ff00ff00ffull) - 0x0001000100010001ull) & 0xff00ff00ff00ff00ull ))
{
goto check;
}
val >>= 8;
if (unlikely( (((val ^ 0x002e002e002e002eull) & 0x00ff00ff00ff00ffull) - 0x0001000100010001ull) & 0xff00ff00ff00ff00ull ))
{
check:
for (int i = 0; i < sizeof(uint64_t); i++)
{
if ((uri + c)[i] == '.')
*ext = (char*)uri + c + i + 1; else
if ((uri + c)[i] == '?')
{
len = c + i;
goto fin;
}
}
goto cont; // no need check current qword for '?' - already checked
}
// check for '?'
val = val_sav;
if (unlikely( (((val ^ 0x003f003f003f003full) & 0x00ff00ff00ff00ffull) - 0x0001000100010001ull) & 0xff00ff00ff00ff00ull ))
{
if ((uri + c)[0] == '?')
len = c; else
if ((uri + c)[2] == '?')
len = c + 2; else
if ((uri + c)[4] == '?')
len = c + 4; else
if ((uri + c)[6] == '?')
len = c + 6;
val >>= 8;
l = len;
if (unlikely( (((val ^ 0x003f003f003f003full) & 0x00ff00ff00ff00ffull) - 0x0001000100010001ull) & 0xff00ff00ff00ff00ull ))
{
goto check2;
} else
goto fin;
}
val >>= 8;
if (unlikely( (((val ^ 0x003f003f003f003full) & 0x00ff00ff00ff00ffull) - 0x0001000100010001ull) & 0xff00ff00ff00ff00ull ))
{
check2:
if ((uri + c)[1] == '?')
l = c + 1; else
if ((uri + c)[3] == '?')
l = c + 3; else
if ((uri + c)[5] == '?')
l = c + 5; else
if ((uri + c)[7] == '?')
l = c + 7;
if (l < len)
len = l;
fin:
*qstr = (char *)uri + len;
break;
}
cont:
c += sizeof(uint64_t);
}
}
Link, please - the code for "unlikely" is missing.
There's no link; this is my unpublished code. These macros, for gcc/clang:
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
Such macros are a widely known trick to help the compiler with branch decisions without a Profile Guided Optimization build.
!((val ^ 0x002e002e002e002eull) & 0x00ff00ff00ff00ffull) - isn't that faster? __builtin_expect - cool. But then, why not unlikely(~((val ^ 0x002e002e002e002eull) & 0x00ff00ff00ff00ffull))?
Because most strings contain no '.' or '?', so we scan and skip the 8-byte chunks that don't contain them; that's the main branch. For the 8-byte chunks that do contain them, we take the unlikely() branch and take a closer look byte by byte.
If (val ^ 0x002e002e002e002eull) & 0x00ff00ff00ff00ffull is zero, then there are four dots. Subtracting 0x0001000100010001ull and masking with 0xff00ff00ff00ff00ull gives a nonzero result if there's at least one dot. unlikely means it's expected to be false.
Yes, the "at least one dot" condition is expected to be zero, __builtin_expect("at least one dot", 0), i.e. "at least one dot" is not expected. Nothing wrong here.
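A worked, self-contained version of that test (hedged: lanes_have_dot is my name; testing both val and val >> 8 together covers all 8 bytes regardless of host byte order):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Nonzero iff the low byte of one of the four 16-bit lanes is '.':
 * the mask keeps one byte per lane, and subtracting 1 borrows into the
 * lane's high byte exactly when that byte was 0x00, i.e. byte == '.'. */
static uint64_t lanes_have_dot(uint64_t val)
{
    return (((val ^ 0x002e002e002e002eull) & 0x00ff00ff00ff00ffull)
            - 0x0001000100010001ull) & 0xff00ff00ff00ff00ull;
}

int main(void)
{
    uint64_t v;
    memcpy(&v, "abc.efgh", 8);                 /* defined unaligned load */
    printf("%d\n", (lanes_have_dot(v) || lanes_have_dot(v >> 8)) ? 1 : 0); /* 1 */
    memcpy(&v, "abcdefgh", 8);
    printf("%d\n", (lanes_have_dot(v) || lanes_have_dot(v >> 8)) ? 1 : 0); /* 0 */
    return 0;
}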
Your code:
typedef struct
{
    const uint8_t *buf;
    int pos, limit;
} bs_t;
My code:
typedef struct {
    mp3header *buf;
    size_t sz;
    ushort len;
    // added
    ushort len_hint;
    bool synced;
} mp3frame;
These structs were similar in the beginning, but then I added more fields. Funny that I implemented almost the same thing without even looking at bs_t.
typedef struct {
    mp3header *buf;
    slong sz;
    sshort len;
    sshort len_hint;
    bool synced;
} mp3frame;
That is better. My code is written in such a way that a size could be negative.
if (HDR_IS_CRC(hdr))
{
    get_bits(bs_frame, 16);
}
So, it does not check the CRC? That does nothing, because bs_frame is unchanged.
No, it ignores the CRC. There are several reasons for it:
Your mp3d_find_frame is 1366 bytes, my OO mp3frame_getNext is 1508.
Right. It ignores the CRC, and that is obvious, but it does nothing with bs_frame. Am I correct? I mean, if I did that, I would have modified bs_frame.
Nope, it updates the bs_frame->pos field.
"if ((bs->pos += n) > bs->limit)" Never did anything like that. *dp++ is my favorite but I'm always looking in the reference before doing stuff like this. What does it mean?
This is equivalent to:
bs->pos += n;
if (bs->pos > bs->limit)
I do not update len alone; the Pythonic way is to increment the pointer and decrement the length.
That can be useful to detect bitfield order. Credits:
https://stackoverflow.com/questions/4239993/determining-endianness-at-compile-time
#define I_AM_LITTLE (((union { uchar x : 7; uchar c : 1; }){1}).c)
EDIT: I think
#define I_AM_LITTLE (((union { uchar x; struct { uchar y : 7; uchar c : 1; } c; }){1}).c.c)
is more generic.
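A quick check of the first macro (needs C99 compound literals; note that uchar bit-fields are themselves an implementation-defined extension, which fits the topic):

#include <stdio.h>

typedef unsigned char uchar;
#define I_AM_LITTLE (((union { uchar x : 7; uchar c : 1; }){1}).c)

int main(void)
{
    /* prints 1 when bit-fields are allocated low-order-first
     * (e.g. gcc/clang on x86), 0 when high-order-first */
    printf("%d\n", I_AM_LITTLE);
    return 0;
}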
Of course.
If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.
So I have to be careful that no bitfield straddles a unit boundary: 8, 3+2+2+1, 4+2+1+1, 2+2+1+1+2 - you get the idea.
It's polymorphism.
typedef struct {
    float pcm[MAX_FREE_FORMAT_FRAME_SIZE];
    sshort len;
} pcm;
typedef struct {
    float pcm[MAX_FREE_FORMAT_FRAME_SIZE >> 1][2];
    sshort len;
} stereo;
Yeah, I know that trick and use the same idea to detect the floating-point structure: https://github.com/lieff/lvg/blob/master/swf/avm1.c#L100 But that project is not as embedded as minimp3, which targets micro-controllers like the ESP32.
When a frame following the first frame is decoded, it uses the previous frame. How is the first frame decoded, and what is used instead of the previous frame?