When I read the tags that have unicode text in them, in most cases there's some garbage before string termination character.
Example:
Byte sequence I get:
ff fe 32 00 30 00 31 00 35 00 31 0c f7 7f 00 00
"Corrected" byte order before I decode it from UTF-16:
fe ff 00 32 00 30 00 31 00 35 0c 31 7f f7 00 00
What I get:
2015ఱ翷
So, "fe ff" is a BOM, "00 32 00 30 00 31 00 35" is "2015". Then there are four garbage bytes "0c 31 7f f7" followed by string termination. Those garbage bytes are different every time I run the app, so looks like RAM garbage rather than something existing in the mp3 file. Also, those files' metadata displays correctly in other apps.
Correct length, if we count BOM and termination bytes should be 8 + 2 for BOM + 2 for terminations, equals 12. So all those values above are off.
Here's a bit of code I'm doing:
// I'm iterating through all the tags.
// At this point of code I'm sure I'm looking at Text frame, so I just cast it below.
juce::String value; // I'm using this type externally, don't worry too much about it
ID3v2_TextFrame* textFrame = (ID3v2_TextFrame*)frame;
const char encoding = textFrame->data->encoding;
/*
I also try frame->header->size and textFrame->size.
All three of them are different values, but the actual readable section of the text is shorter that them.
*/
const int textLengthBytes = ID3v2_strlent (textFrame->data->text);
if (encoding == ID3v2_ENCODING_ISO)
{
// ISO encoding, keeping it as is
value = juce::String (textFrame->data->text, textLengthBytes);
}
else if (encoding == ID3v2_ENCODING_UNICODE)
{
// UNICODE - this is where the trouble is
DBG ("frame->header->size: " << frame->header->size);
DBG ("textFrame->size: " << textFrame->size);
// I'm printing those bytes here to see what's under the hood
DBG ("====");
for (int i = 0; i < textLengthBytes; ++i)
DBG (static_cast<unsigned char> (textFrame->data->text[i]));
DBG ("====");
// This but makes sure bytes are in correct order and interprets them as UTF16
// I've used other means of converting, same result, so I don't think the issue is here
const auto* textAsUnicode = reinterpret_cast<juce::CharPointer_UTF16::CharType*> (textFrame->data->text);
juce::CharPointer_UTF16 textAsUnicodeCharPointer (textAsUnicode);
value = juce::String (textAsUnicodeCharPointer);
}
Also worth mentioning that I build the project using VS v143, but I've compiled your library using LLVM (clang) since it has Variable length arrays that aren't supported by VS. Either there's some incompatibility between binaries, or I'm missing something else there.
If you think my compiler setup is to blame, I can volunteer to fix #48 and replace those VLA's by malloc's, so I can build everything via VS toolset.
Update: apparently VLA's weren't the problem. Fixed both issues in my recent PR. Tested via reading a bunch of mp3 files with unicode tags. Haven't tested anything that does writing/removing though.
When I read the tags that have unicode text in them, in most cases there's some garbage before string termination character.
Example:
So,
"fe ff"
is a BOM,"00 32 00 30 00 31 00 35"
is "2015". Then there are four garbage bytes"0c 31 7f f7"
followed by string termination. Those garbage bytes are different every time I run the app, so looks like RAM garbage rather than something existing in the mp3 file. Also, those files' metadata displays correctly in other apps.Here's what different "size" values say.
Correct length, if we count BOM and termination bytes should be 8 + 2 for BOM + 2 for terminations, equals 12. So all those values above are off.
Here's a bit of code I'm doing:
Also worth mentioning that I build the project using VS v143, but I've compiled your library using LLVM (clang) since it has Variable length arrays that aren't supported by VS. Either there's some incompatibility between binaries, or I'm missing something else there.
If you think my compiler setup is to blame, I can volunteer to fix #48 and replace those VLA's by malloc's, so I can build everything via VS toolset.