tallbl0nde / TriPlayer

A feature-rich background audio player for Nintendo Switch (requires Atmosphere)
MIT License
176 stars 24 forks source link

Added TagLib and switched MP3.cpp to use it #16

Closed einsteinx2 closed 3 years ago

einsteinx2 commented 3 years ago

Created a custom Makefile at Application/libs/taglib.mk to allow building of untouched upstream TagLib as a Switch/libnx compatible static library as their CMake build config was not cooperating.

TagLib supports basically every audio file format we could possibly want to support, and also provides easy access to a bunch of additional metadata that TriPlayer currently isn't using, but may be interested in using in the future (bitrate, custom tags, etc).

I've tested this extensively on my Switch, so it should be ready to merge, but please confirm I didn't make any stupid C++ mistakes as I don't do much coding in C++ and may have missed something obvious.

tallbl0nde commented 3 years ago

Thanks for your work! I'll have a proper look over it and test it myself later when I'm not busy. :)

Quick question, did you test if TagLib returns the correct length of a VBR (variable bitrate) MP3 without any length metadata? I know that's specific but in that scenario you have to scan over the entire file to calculate the total length (which I was doing with my original code). I did a quick Google search and couldn't find anything on it.

einsteinx2 commented 3 years ago

All of my test files were VBR and they displayed the duration correctly, but I just double checked and I see in Application/libs/taglib/taglib/mpeg/mpegproperties.cpp in the lengthInSeconds() method that it does in fact only read it from the header metadata and not by scanning the whole file as you were doing.

I've never personally come across any MP3's in my collection with this issue, but it sounds like a valid concern.

The header data is read in the MPEG::Properties::read method in mpegproperties.cpp in TagLib. Here's the implementation:

if(d->xingHeader && firstHeader.samplesPerFrame() > 0 && firstHeader.sampleRate() > 0) {

    // Read the length and the bitrate from the VBR header.

    const double timePerFrame = firstHeader.samplesPerFrame() * 1000.0 / firstHeader.sampleRate();
    const double length = timePerFrame * d->xingHeader->totalFrames();

    d->length  = static_cast<int>(length + 0.5);
    d->bitrate = static_cast<int>(d->xingHeader->totalSize() * 8.0 / length + 0.5);
  }
  else if(firstHeader.bitrate() > 0) {

    // Since there was no valid VBR header found, we hope that we're in a constant
    // bitrate file.

    // TODO: Make this more robust with audio property detection for VBR without a
    // Xing header.

    d->bitrate = firstHeader.bitrate();

    // Look for the last MPEG audio frame to calculate the stream length.

    const long lastFrameOffset = file->lastFrameOffset();
    if(lastFrameOffset < 0) {
      debug("MPEG::Properties::read() -- Could not find an MPEG frame in the stream.");
      return;
    }

    const Header lastHeader(file, lastFrameOffset, false);
    const long streamLength = lastFrameOffset - firstFrameOffset + lastHeader.frameLength();
    if(streamLength > 0)
      d->length = static_cast<int>(streamLength * 8.0 / d->bitrate + 0.5);
  }

Based on their comments, it looks like it will miscalculate the duration if the VBR header is missing as it will just take the bitrate of the first MPEG frame and use that to calculate the duration. That's unfortunate because then it means there's no way to know whether it is actually a CBR file or it's a VBR file missing the Xing header, so we can't fall back to your calculation method if it fails as we won't know it failed...

If this is something you're concerned about happening in practice, I can go ahead and put back your duration calculation function and use that in place of theirs. Otherwise, we can stick with their implementation as it should be significantly faster since it doesn't need to scan the whole file. Let me know what you prefer.

Oh also I noticed that you insert a space before your pointers whereas I have a habit of placing them directly after the type without a space, so I changed the code to match your coding style. If you want me to put your duration calculation function back in, I'll include that in that change, otherwise I'll include it with the FLAC changes and will use that style going forward.

tallbl0nde commented 3 years ago

I'd say we should rely on TagLib completely and if users mention inaccurate lengths we may have to look at using my code again. It would be great if TagLib simply said "this length is a guess" so it could then fallback to my code. I guess if it is an issue we'd just need to check for the Xing header manually and if it's not found use my code.

Regarding the pointers, I don't mind if you omit the space since that's what you're used to, the only code style thing I ask is that opening braces aren't on new lines (which I don't think you've done anyway).

einsteinx2 commented 3 years ago

Sounds good. I found a small bug (I had forgotten to set m.ID = -1 when successfully reading ID3v1 tags), so I pushed that along with the spacing around the pointers. I don't mind adding them, though I may forget sometimes. And yeah no worries on the braces, I always do end of line anyway.

Btw I've almost got the FLAC support done. All the library scanning is done including reading tags and art using TagLib, but getting crashes trying to play the files, so need to figure that out. Either it's a bug in the original code from vrilly or I messed something up when I was moving their code into my branch, not sure which yet haha. Also, I refactored the TagLib stuff a bit so that it's more reusable across different file types (various file types share tag formats, so I moved some logic into shared methods).

I'll hold off on a PR until I get playback working, though maybe it makes more sense to just push to this branch if you haven't merged it yet, since all the TagLib stuff is a bit different now due to the refactoring. Anyway we'll see once I get it working.

tallbl0nde commented 3 years ago

I might hold off until you've got FLAC support up and running then, there's no rush to merge this anyway.

You might find this page useful: https://switchbrew.org/wiki/Error_codes It lists all the codes the switch will report when it crashes.

You can also run aarch64-linux-gnu-addr2line -e <path to elf> -f -p -C -a <address> and fill in the path to the .elf and address the program counter reports the crash has occurred at to narrow down what's causing the crash.

vrilly commented 3 years ago

I think i messed up the decoding part; didn't test it like i said myself xD

einsteinx2 commented 3 years ago

@tallbl0nde Thanks that info will be super helpful! Sometimes I was getting full OS crashes including the full register dump and stack trace info, but given that it's all just raw addresses, I wasn't sure how to actually use the info. Unfortunately, more often than not, it just completely locks up the whole OS and I have to force shutdown. Presumably because the audio decoding code is inside the sysmodule rather than the app?

Is there any way to easily test the code in the Application context instead to prevent crashing the whole OS? I suppose I could throw together a separate test app to test the decoding logic, but I'd also need to implement the other code related to actual playback to properly test it, and not sure if it's worth the effort (though it could be useful for testing other codecs in the future). Actually, maybe what I'll do is create a small test app on PC to try out the decoding logic as it should be the same on PC and Switch (both are little-endian and should work the same way as far as pulling out the left and right channels from the frames in the correct byte order I think).

What's making the whole thing more difficult is that since it's crashing the whole OS, it isn't writing logs to the sysmodule log file or it just writes a corrupted last line presumably due to the force shutdown and the module freezing before it can write any logs, so it's making it a lot harder to debug. Plus each test takes forever since I have to load up the new sysmodule, boot up the system, try and play, crash/hang, force shutdown, try something else, repeat haha.

@vrilly no worries, I think the code is really close to working, just probably something wrong with the way it's pulling out the bytes for decoding, however I'm surprised it's causing crashes rather than just playing garbage data. That makes me think maybe it's overflowing the audio buffer it's writing to or something like that, which I'm not sure I can properly test on PC since whatever audio playback code I would use (in this case, Apple's CoreAudio) may not request data in the same way... Anyway going to try out the FLAC++ example code on my Mac now and confirm that works, then see if I can use that example to see what we're doing wrong.

einsteinx2 commented 3 years ago

I might hold off until you've got FLAC support up and running then, there's no rush to merge this anyway.

Yeah I agree. This original MP3-only PR just swaps out the tag reading logic, so it doesn't really add anything new feature-wise, just sets us up to more easily support other formats without writing custom metadata parsing code. Going to try and fix the FLAC playback right now, and if I can get that working, I'll clean up the code and push to this branch to update this PR.

einsteinx2 commented 3 years ago

@tallbl0nde I made some small changes to the decoding logic based on the FLAC++ example code (they show how to bit shift the data to correctly place it into a little-endian byte buffer) and now with that change as well as another small change I made to the decode function's loop, I'm not getting crashes anymore which is good, and it's calling the FLAC decoding logic, but no audio plays. I looked at the log, and I see it's failing to create a new voice for some reason. Presumably "voice" in this context means the Switch audio output? So possibly it's decoding correctly now, but just not playing the audio because it can't initialize it?

Here's a snippet of the log (note that there are many more "write_callback called" log statements after this for about 15 seconds then it automatically switches to the next song for some reason):

    1 [00:01:51] [E] SourceFactory makeSource returning FLACSource 
    2 [00:01:51] [E] [FLAC] Opening file: /music/Amerigo Gazaway - Bizarre Tribe - A Quest to The Pharcyde/Gummy Soul - Bizarre Tribe- A Quest to The Pharcyde - 07 4 Better Or 4 Worse.flac
    3 [00:01:51] [E] [FLAC] File opened successfully
    4 [00:01:51] [E] [AUDIO] Failed to init a new voice!
    5 [00:01:51] [E] [TEST] FLACSource::decode called
    6 [00:01:51] [E] [TEST] FLACSource::write_callback called
    7 [00:01:51] [E] [TEST] FLACSource::write_callback called
    8 [00:01:51] [E] [TEST] FLACSource::write_callback called
    9 [00:01:51] [E] [TEST] FLACSource::write_callback called
   10 [00:01:51] [E] [TEST] FLACSource::write_callback called
   11 [00:01:51] [E] [TEST] FLACSource::write_callback called
   12 [00:01:51] [E] [TEST] FLACSource::write_callback called
   .........
   3131 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3132 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3133 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3134 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3135 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3136 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3137 [00:02:07] [E] SourceFactory makeSource returning FLACSource
   3138 [00:02:07] [E] [FLAC] Opening file: /music/Souls of Mischief - 93 Still (Gummy Soul Remix)/Souls of Mischief - 93 Still (Gummy Soul Remix) - 07 93 Still.flac
   3139 [00:02:07] [E] [FLAC] File opened successfully
   3140 [00:02:07] [E] [AUDIO] Failed to init a new voice!
   3141 [00:02:07] [E] [TEST] FLACSource::decode called
   3142 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3143 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3144 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3145 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3146 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3147 [00:02:07] [E] [TEST] FLACSource::write_callback called
   3148 [00:02:08] [E] [TEST] FLACSource::write_callback called
   3149 [00:02:08] [E] [TEST] FLACSource::write_callback called
   3150 [00:02:08] [E] [TEST] FLACSource::write_callback called
   3151 [00:02:08] [E] [TEST] FLACSource::write_callback called
   ....
   continues the same as above, switching songs every 15 seconds or so...

Any idea why it would be failing to create the voice? I figured that logic would be the same as when MP3 decoding, so not sure why that's happening (actually I should double check with an mp3 and see if it also logs that error to confirm, but presumably it doesn't since MP3s play fine).

EDIT: I just tested with an MP3 and it doesn't log the "[AUDIO] Failed to init a new voice!" error... So it seems to only be doing it when playing FLAC files, which is strange. I need to trace that code back and see why it would fail since at that point it hasn't even started decoding yet, so I would think it wouldn't matter which decoder is used, unless there's something special implemented in the MP3 source class that is missing in the FLAC source class?

EDIT2: Looks like I found it:

   4 [00:04:27] [E] [TEST] Audio::newSong rate: 0 channels 0
   5 [00:04:27] [E] [AUDIO] Failed to init a new voice!

I think the code is only getting that data from the MP3 library. I think if I can fix that so that it gets the correct values for FLAC files, it should work now. I think I fixed the decoding logic. Attempting to fix it now.

einsteinx2 commented 3 years ago

Fixed the new voice initialization error! Looks like potentially some kind of bug in libFLAC++ or a misunderstanding of how this is supposed to work, but basically if I implement the metadata callback, it fires with the correct channel/sample rate/etc metadata, but right after, when calling the libFLAC++ decoder class's get_sample_rate() and get_channels() functions, they return 0... So I ended up just storing the values in the metadata callback since it's synchronous and always fires before we return the object, and then read them from the decoder.

   1 [00:00:31] [E] SourceFactory makeSource returning FLACSource 
   2 [00:00:31] [E] [FLAC] Opening file: /music/Souls of Mischief - 93 Still (Gummy Soul Remix)/Souls of Mischief - 93 Still (Gummy Soul Remix) - 02 Live And Let Live.flac
   3 [00:00:31] [E] [TEST] metadata_callback called
   4 [00:00:31] [E] [TEST] metadata_callback sample_rate: 44100 channels: 2 total_samples: 13752522
   5 [00:00:31] [E] [TEST] sampleRate: 44100 channels: 2 totalSamples: 13752522
   6 [00:00:31] [E] [FLAC] File opened successfully
   7 [00:00:31] [E] [TEST] Audio::newSong rate: 44100 channels 2
   8 [00:00:31] [E] [TEST] FLACSource::decode called
   9 [00:00:31] [E] [TEST] FLACSource::write_callback called
  10 [00:00:31] [E] [TEST] FLACSource::write_callback called
   .....

So now it's correctly starting audio output, correctly reading the file information, and starting decoding...but not playing anything haha. So very close now I think, probably my updated decoding logic is still not correct, but I think that's the final piece of the puzzle to get this working.

EDIT: Hmm also the decoder class's blockSize() method is returning 0... Not sure why this wrapper methods are all failing to work correctly... All that libFLAC++ class does is wrap the C functions so we don't have to do it manually. I think I'm missing something obvious here about how those are supposed to actually work, I think they don't get set until actual audio data starts getting decoded, which makes them basically useless for our purposes since we need them to start decoding... It looks like we can get min and max block size from the metadata callback, but not sure if those values will be useable for this... I quickly tried hard coding the block size from what I saw in the decode callback logs, but that didn't work. Maybe I was using a size in samples vs bytes or vice versa or something. I need to read up some more on the FLAC spec and example code to better understand this stuff. We may want to just skip using libFLAC++ altogether and just wrap the C functions directly so we have more control. All their C++ class mostly does is just directly call those C functions anyway, but it hides some of the details that could potentially make debugging easier. Anyway, I'm going to take a break to work on some stuff for my visa renewal, but I'll keep looking into this later, probably tomorrow. Definitely very close now...

EDIT 2: So I confirmed the same behavior in their test code. Reading only the metadata doesn't make that information available, but I did see that both min and max block size are available in the metadata callback, and at least for my test file, they are equal. For the purpose we need that value (to ensure we don't overflow the buffer), I think we can just use the max_block_size value then. I took care of some of the visa stuff, so I'm gonna work on this more now. I can't stop til I get this working when I know it's so close haha! Now that I can confirm the behavior is the same in the test PC app, I'm going to modify that so that it works more like the source class in TriPlayer (decoding bit by bit instead of just all at once) and will try to get everything working there since it's much faster to iterate, then I'll make the changes to the app and hopefully will have it working by tonight. My next update here should be working code :)

einsteinx2 commented 3 years ago

And we have playback!! 🥳

Right now it only supports 16bit stereo FLACs, seeking (or maybe the tell() function) is a bit wonky (though it does work in the sense that I can seek through the song and it changes the position, it just isn't accurate, probably doing some math wrong when converting between bytes and samples as libFLAC works a little different than the mp3 library), and it crashes when the song ends, but it works! haha

Right now going to try and fix the crash and the weird seeking, then I'll add support for mono files and bit depths other than 16 (24bit is common enough to warrant supporting it, and might as well make the solution generic and I suppose some older music could be in mono).

But yeah, totally works! I added a bunch of comments as I was working to help remember things, so it should be easy for others to work on this code later if needed.

I'll let you know when I get it working completely, then I'll clean it up and push the changes for testing. I'll probably push what I have once I get the standard 16bit stereo files working completely as those are by far the most common so you can start testing sooner, then I'll work on the other bit depths and mono. Then on to the other formats!

einsteinx2 commented 3 years ago

You can also run aarch64-linux-gnu-addr2line -e <path to elf> -f -p -C -a <address> and fill in the path to the .elf and address the program counter reports the crash has occurred at to narrow down what's causing the crash.

Note to self (and anyone else stumbling across this): You must subtract the backtrace address from the backtrace start address or it won't work. For example if the start address is 0x0000000065E00000 and the BT[02] address is 0x0000000065E0B0F0, the address passed to addr2line must be 0x000000000000B0F0 or simply B0F0 as it will automatically 0-pad the addresses. Then it works great, super helpful!

tallbl0nde commented 3 years ago

Sounds great! I'm looking forward to seeing your work :)

You must subtract the backtrace address from the backtrace start address

Sorry I completely forgot to mention that, I'm used to automatically doing it!

einsteinx2 commented 3 years ago

@tallbl0nde I'm still working on the crash now, but if you want to try it out, I pushed what I have to another branch here: https://github.com/einsteinx2/TriPlayer/tree/feature/add-flac

Once I get the issues I mentioned sorted out and clean things up a bit, I'll merge everything back into this branch and push to this PR.

EDIT: So it turns out the crash only happens if you seek within the song before it ends. Without seeking it works fine. So clearly I'm doing something wrong with seeking...

EDIT 2: Ok I'm pretty sure now that the tell() implementation I wrote is incorrect. TriPlayer expects a number of samples rather than a byte position, and libFLAC only returns the byte position. I was then using the "bits per sample" to calculate the number of samples based on the byte position, but I realize now that the byte position is referring to the compressed position, while I was calculating the uncompressed position, which will always report a position further in the song than it really is. Still not sure why that would cause a crash on deallocation of the stream, but it's definitely causing the time display inaccuracy. The song still finishes when it should since it sets the done_ variable when end of file is reached, but then crashes deallocating the decoder... Doesn't make sense to me why that would happen and I can't reproduce it in a PC test app when I add seeking. For now I'm working on correcting the tell() implementation now that I know what I'm doing wrong, so we'll see if the crash still happens after that. It's still weird to be though that it is libFLAC that's crashing, not TriPlayer, so I don't see why a poor tell() implementation would cause that. I wish I could see exactly which line it's crashing on...if it still happens after fixing tell() I may need to temporarily add the libFLAC source code directly into the project just to get proper output from addr2line, unless there's some way to do it without doing that (maybe just recompiling libFLAC differently than portlibs does it? Maybe they're compiling with optimizations that cause the line number to not show, not sure).

EDIT 3: Fixed the tell() implementation, so seeking is working perfectly now, but it's still crashing :( At a bit of a loss as to what it could be at this point. Pushing my changes now if you want to try and debug it. I'll keep trying for a bit then gonna call it a night.

EDIT 4: Yeah gotta give up for today. I just can't see any reason that it would be crashing only after seeking and work fine otherwise. I thought maybe it was some kind of thread safety thing, but the dealloc happens in the same place whether you seek or not, so I if it were that, I would assume it would just always crash. Tomorrow I'll probably try and toss the libFLAC and libFLAC++ source code directly into the project to do a quick test so I can at least get an exact line number where it's crashing. The function it's crashing on is fairly long so it could be any number of things, and without knowing where it's crashing it's impossible to debug, especially since it doesn't seem to be in our code... I'm assuming it's trying to free a null pointer or something, but who knows. There are various calls to free in that function. I'll report back when I figure it out. Hopefully with the exact line number, it'll be a simple fix.

einsteinx2 commented 3 years ago

Ok so here's the progress I've made so far. Based on the addr2line output, the libFLAC function FLAC__stream_decoder_finish is what was crashing. It's a long function, so without a line number it was impossible to figure out why. It's in this file if you want to see the implementation (basically the whole libFLAC++ Decoder class just wraps these C functions): https://github.com/xiph/flac/blob/master/src/libFLAC/stream_decoder.c#L628

I had wanted to just drop the libFLAC source code into the project, but there's a ton of files and it has some custom Makefiles, so that would be a pain. Instead I did a bit of a brute force approach. I don't know how to see printf output, so what I did was split the function up into a bunch of smaller functions and chain them together so that I could see which bit of code was actually crashing.

I had narrowed it down to this bit of code, where it deallocs the output channel:

unsigned i;
for(i = 0; i < FLAC__MAX_CHANNELS; i++) {
    /* WATCHOUT:
     * FLAC__lpc_restore_signal_asm_ia32_mmx() requires that the
     * output arrays have a buffer of up to 3 zeroes in front
     * (at negative indices) for alignment purposes; we use 4
     * to keep the data well-aligned.
     */
    if(0 != decoder->private_->output[i]) {
        free(decoder->private_->output[i]-4);
        decoder->private_->output[i] = 0;
    }
    if(0 != decoder->private_->residual_unaligned[i]) {
        free(decoder->private_->residual_unaligned[i]);
        decoder->private_->residual_unaligned[i] = decoder->private_->residual[i] = 0;
    }
}

Since I know there are only 2 channels, I broke that out into two functions instead of a loop, the first function deallocates channel 0 and the second deallocates channel 1, like this:

FLAC_API void FLAC__stream_decoder_finish_dealloc_channel1(FLAC__StreamDecoder *decoder) {
    if(0 != decoder->private_->output[1]) {
        free(decoder->private_->output[1]-4);
        decoder->private_->output[1] = 0;
    }
    if(0 != decoder->private_->residual_unaligned[1]) {
        free(decoder->private_->residual_unaligned[1]);
        decoder->private_->residual_unaligned[1] = decoder->private_->residual[1] = 0;
    }
}

FLAC_API void FLAC__stream_decoder_finish_dealloc_channel0(FLAC__StreamDecoder *decoder) {
    if(0 != decoder->private_->output[0]) {
        free(decoder->private_->output[0]-4);
        decoder->private_->output[0] = 0;
    }
    if(0 != decoder->private_->residual_unaligned[0]) {
        free(decoder->private_->residual_unaligned[0]);
        decoder->private_->residual_unaligned[0] = decoder->private_->residual[0] = 0;
    }
}

It turns out it's crashing on FLAC__stream_decoder_finish_dealloc_channel0. So it must be either free(decoder->private_->output[0]-4); or free(decoder->private_->residual_unaligned[0]);

The interesting thing is that once I started splitting it up into functions, which would slightly change the timing, it actually became harder to reproduce the crash. With the pre-compiled libFLAC from portlibs, it crashes 100% of the time if you seek within the track then let it finish. With the unmodified library that I custom built first before separating it into smaller functions, it also crashed 100% of the time. Once I split it up, it would only crash like 50% of the time... That makes me think there is some kind of race condition going on, as the minor timing difference of the added function calls seems to prevent it from happening sometimes.

So I still don't know why it's happening, but at least it's clear exactly where it's happening... Clearly something is causing either decoder->private_->output or decoder->private_->residual_unaligned (or possibly even decoder->private_ or decoder itself) to get corrupted or maybe freed but not set to 0, so it tries to free it again or something like that. I'm going to split that function one more time to see exactly which one of those two it is, then I'm going to dig deeper to see why this could be happening.

The strangest part to me is that it only happens after seeking. I'm not sure why seeking would change the memory in such a way to cause the crash, it just doesn't make any sense, but here we are...

Looking into it more now, but I wanted to brain dump everything I found while it was still fresh.

tallbl0nde commented 3 years ago

That's interesting... maybe there's actually some issue with my code then if it does turn out to be a race condition. I'd love to help try and fix this but I'm still tied up with my assessments unfortunately :/

Have you run through what is changed when seeking? Also maybe try bumping up the heap size (one of the first lines in main.cpp) on the off chance it's running out of memory. I doubt it, but I once had the sysmodule crash when spawning a thread due to not enough memory.

einsteinx2 commented 3 years ago

That's interesting... maybe there's actually some issue with my code then if it does turn out to be a race condition. I'd love to help try and fix this but I'm still tied up with my assessments unfortunately :/

No worries, I know you're busy right now. But yeah given that I can't reproduce the crash in a test app on PC, and it seems to be some kind of memory corruption or race condition, it's very likely there's something going on in the TriPlayer code causing it.

In fact, one single time out of a bunch of tests, I got a crash on a different line. It crashed writing to an std::atomic specifically in the destructor of the mutex, and it happened also when the song finished just like this. So I bet it's the same memory corruption issue, just manifested differently. Here's the decoded backtrace from that crash:

0x00000000000040ac: std::unique_lock<std::shared_mutex>::~unique_lock() at /opt/devkitpro/devkitA64/aarch64-none-elf/include/c++/10.2.0/bits/unique_lock.h:100
0x00000000000007b4: main at /Users/bbaron/homebrew/switch/TriPlayer/Sysmodule/source//main.cpp:91

Which looking at main.cpp:91, it's this call: Audio::getInstance()->exit(); which is just setting Audio::exit to true: `this->exit = true;whereexitis defined as:std::atomic exit;`.

All the other times have been during the deallocation of the FLAC decoder though, but always only after seeking.

I split up that last function as I mentioned, and it was crashing on free(decoder->private_->output[1]-4);, so something is stomping on that memory (and apparently sometimes other memory) for some reason.

Have you run through what is changed when seeking?

The seeking code inside the FLAC library is pretty long and complicated since it has to decode frames to find the exact sample that you pass to the seek function, so it's not as simple as just like an fseek or something. However, in the TriPlayer code, nothing much is happening...and if we assume the bug is in TriPlayer somewhere and not libFLAC, then I don't think it would be anything the libFLAC seek function is doing specifically causing the issue...

The seek implementation in FLACSource is literally just a call to libFLAC, so I'm not sure how that's causing the problem:

void FLACSource::seek(size_t pos) {
    this->decoder->seek_absolute(pos);
}

Also maybe try bumping up the heap size (one of the first lines in main.cpp) on the off chance it's running out of memory. I doubt it, but I once had the sysmodule crash when spawning a thread due to not enough memory.

Hmm that's an interesting idea, I'm going to try that right now...

EDIT: Looks like that's not it unfortunately. I upped the heap size to 3MB and it still happened first try. Never happens with MP3 files though, which is super weird, though that doesn't necessarily mean it's an issue in libFLAC, it could just be some condition that somehow isn't triggered by the MP3 decoding library...

einsteinx2 commented 3 years ago

Also on the topic of a potential threading issue, the libFLAC documentation (https://xiph.org/flac/api/index.html) says "libFLAC does not use global variables and should be thread-safe" and we always allocate and deallocate the Source instances on the same playback thread in MainService, so it isn't clear how threading could even be an issue anyway if it's only used on the same thread. Also I see that all of the calls to seek and decode data/process the decoded audio buffers also happen from that same thread... I'm really at a loss at this point. There doesn't seem to be anything wrong in the TriPlayer seeking code either, it looks really simple actually as it just reads the seekTo value and calls seek() in the Source basically. I really have no idea where to go from here...

Clearly there's something going on, but who knows what it could be... I'm almost at the point where I want to just try a different FLAC decoding library, though libFLAC is the reference implementation and should be solid...

I'm heading out for a short trip tomorrow morning until Sunday, so I won't be available for a few days. I need to get back to working on my app for a bit haha, but sometime next week I can start looking into this again. Worst case if I can't make any further progress, we can wait until you finish with your classes and do a deeper dive into the issue together.

einsteinx2 commented 3 years ago

Oh also, turns out I'm a total idiot lol. The reason that addr2line wasn't printing a line number was because libFLAC wasn't compiled with the -g flag 🤦

All that time splitting the function up to find the line and all I needed was a single compiler flag...and on top of that, I wasn't even right about the line that crashed!

Now that I can get the line number, it turns out it was this line after the call to free:

if(0 != decoder->private_->output[0]) {
    free(decoder->private_->output[0]-4);
    decoder->private_->output[0] = 0;   <--- crashes here 
}

Not sure why the call to free would succeed but zeroing out the memory would crash, but yeah...

I realize it's using a slightly different pointer address due to some alignment shenanigans that they do, but I didn't change any of their actual logic, so presumably that code is correct (and of course works fine as long as you don't seek).

vrilly commented 3 years ago

I really don't see how that even can crash..., maybe there's already memory corruption going on before entering that part of code in libFLAC, can we use sanitizers on the switch?

einsteinx2 commented 3 years ago

@vrilly yeah I agree, I think there’s memory corruption going on somewhere that just happens to only crash once trying to access that particular bit of memory (and even one time just destructing a mutex inside TriPlayer’s code).

I’m not the most experienced dealing with C++ issues and without being able to use tools like valgrind, I’m kind of stuck on how to determine the root cause.

Interestingly, it’s impossible to reproduce this when playing MP3 files, but no idea why the FLAC decoder would trigger the corruption vs the MP3 decoder (especially since it only happens after seeking, which extra strange), though I have to assume it’s something on the TriPlayer side even if MP3 decoding isn’t triggering it.

tallbl0nde commented 3 years ago

I might have a bit of a play around with your code tonight then and see if I can at least find what block of code is causing the corruption. I'm just going to comment out as much as possible and uncomment it until it starts crashing.

I also tried some other sanitizer where you pass a flag to the linker (I think it was something like --sanitize-address) and I was met with some errors so yep, I don't think there's any tools... this is gonna be fun 🙃

einsteinx2 commented 3 years ago

Haha yep sounds about right. If you don’t get a chance to do it, I’ll try the same this week. Luckily it’s in the sysmodule which has significantly less code and than the application, so hopefully it’ll be relatively easy to narrow it down...

One other approach I’m interested in investigating is Twili (https://github.com/misson20000/twili). I haven’t used it yet, but supposedly it provides full GDB remote debugging including breakpoints and stepping through code and examining memory. Also it supposedly allows you to remotely launch new builds of not only applications but also sysmodules which would be amazing for reducing the change, build, run, test loop time. In fact, I’m really interested in trying it out even if we end up resolving this issue already. Apparently they have a small library called libTwili (https://github.com/misson20000/twili-libnx) that you can link to that allows you to use all that functionality with libnx based projects like TriPlayer (I guess it was originally written for use with a different Switch homebrew framework or something).

Based on an unresolved issue in the repo, it apparently doesn’t work on 10.2.0 firmware which I’m using now, so I’ll need to setup another SD card and new emuMMC on something like 8.0.0 which is listed as explicitly supported, but it would be amazing to have full remote binary loading and debugging support!

@tallbl0nde do you know if TriPlayer works on firmware 8.0.0? Does TriPlayer just require the latest Atmosphere or is the Horizon OS version also important? The latest Atmosphere 0.15.0 boots older firmwares (even 5.0.2), without any issues, but Twili seems to have some issues with the latest firmware versions (maybe something changed in 10.x.x idk). Having this kind of full remote debugging would be a godsend as we continue adding new features and fixing bugs. Debugging blind is such a pain in the ass haha.

tallbl0nde commented 3 years ago

Well this is interesting... I just played a FLAC and seeked to a few seconds before the end multiple times in a row and it didn't crash (using the latest commit on feature/add-flac branch). I've attached the exact FLAC I tested with to this comment, I just found it on the internet somewhere (I personally don't have any music stored in FLAC... yet :P). Maybe I just got lucky, as it definitely shouldn't crash in a mutex destructor like you said!

The only change I made to the source was changing (in Service.cpp): this->source = SourceFactory::makeSource(path); to this: this->source = SourceFactory::makeSource("/music/sample1.flac");

I'm on firmware 10.2.0 with Atmosphere 0.14.4.

do you know if TriPlayer works on firmware 8.0.0?

It should, I only remember using one libnx function where it said it required 7.0.0+.

sample1.zip

einsteinx2 commented 3 years ago

Thanks for taking the time from your studies to look into this! This is a great data point! I’ll be back home later today and will test out that sample. I wonder if somehow either that sample file somehow doesn’t trigger the issue. Or somehow hard coding the the name and thus having the same song always load somehow prevented it (though that seems unlikely).

I’ll try replicating your results exactly as you did the test, then will try out the file as part of the music collection (aka selected via the interface rather than hard coded) to see if there’s any difference.

Also just to clarify the mutex crash, that was the one and only time that happened to me, the other 100 or so times I reproduced it, it always crashed in the same libFLAC function, so it may have just been a fluke, hard to say.

I can reproduce the issue with any of the FLAC songs I happened to have tested so far, but I have a 20 second album intro song that has reproduced it about 98% of the times I’ve tested with it that I can easily share as it’s very small.

I found actually the easiest way to test is to add the album to the library, start playing a song, seek within it to any position, then tap another song in the album. Instant crash. Actually I can share the whole album when I get home, it’s a freely released remix album I downloaded from Bandcamp.

Those files are encoded by Bandcamp themselves as I downloaded the album directly in FLAC, so I’m also going to try encoding some of my own files and seeing if they reproduce the issue too. Maybe we can narrow it down to certain files and maybe figure out what is different about them. For what it’s worth, the files crashing TriPlayer for me work fine in every other application I’ve played them in, so there’s nothing inherently wrong with them.

einsteinx2 commented 3 years ago

@tallbl0nde Just did a quick test with your sample file. Rather than hard coding the path, I just put it in the /music folder and let it get scanned into the library then chose it from the Songs list. On the first try, I played it and seeked around a bit then tapped on another song and it didn't crash, but then I tapped the sample song again, seeked, tapped another song and got the crash. So it does still seem to crash on that file. Maybe something with the hard coding vs selecting from the UI somehow changed things?

Anyway here's a short 20 second intro track to a free album from Bandcamp that I've been using to repro as well. Note that like your sample song, it will sometimes not crash, but generally will always do it by the second try.

Give this one a shot and see if you can get the crash. Actually try running the stock TriPlayer build, and put both of the sample tracks into your music folder and let them get loaded into your library, then try running them from there. If you still can't repro, then that's really strange...

sample2.flac.zip

I have to do some work to try and fix an old iOS app of mine that is broken now with iOS 14, so I'm going to work on that for the next couple days, then I'll get back to trying to debug this later this week. I'm sure we can get this working! Maybe we'll even end up discovering an underlying memory corruption bug that went unnoticed due to the mp3 lib not triggering it and end up improving the overall stability of TriPlayer.

Also, if you still can't reproduce this with either of the sample files, maybe try updating your Atmosphere install to the latest 0.15.0 and see if you can repro it then. I suppose it's possible that something changed between Atmosphere versions, though it does seem unlikely. Later I can also try setting up a different SD card with Atmosphere 0.14.4 and see if I can no longer reproduce it.

einsteinx2 commented 3 years ago

Also, here's a build of libFLAC with debug symbols turned on. We wouldn't want to use it for the final release, as the debug symbols increase the size of libFLAC.a from around 500KB to 3.5MB so will increase the binary sizes (in fact we should disable debug symbols on the main applications as well for the releases to reduce binary size and ram usage--could be especially useful for the sysmodule and overlay). But for testing, this is helpful to allow you to confirm the location of the crash in the backtrace.

libFLAC with debug symbols.zip

tallbl0nde commented 3 years ago

I ended up getting the crash in the mutex destructor (line 91 of main.cpp apparently) trying my file again (still hardcoded) today. I couldn't get it to crash anymore though.

Your file crashes more consistently for me, so at least I'm able to reproduce it now! Interestingly both crashes occur in free_r, which from a quick google could mean the malloc'd blocks' "headings" are being corrupted, seeing how it seems to work fine until it's freed (i.e. when it needs to read the headers).

I'm done with uni now, however I'm feeling a bit burnt out from it all so it may be a while before I dive into this properly. We'll see!

tallbl0nde commented 3 years ago

I have no clue what was going on, I spent a few days investigating the crash and got nowhere. So instead, I thought I'd look at how sys-tune was handling FLACs. sys-tune uses a single header library for decoding, which not only doesn't crash, but also makes the actual decoding code a lot simpler compared to libFLAC++.

I've pushed this code onto my flac branch, and incorporated @vrilly's changes on the sysmodule side of things (mainly just adding the Source factory). As I don't have a proper music library to test with (in FLAC format), would you be able to give my code a quick try @einsteinx2? Your sample file as well as a few of my converted MP3s worked flawlessly.

Provided this works properly we can finally merge everything into master!

Also on a side note: it turned out the sysmodule was crashing when delete[] buf was called in Service.cpp, not the mutex destructor. I forgot to also mention that the PC must have 0x8 subtracted from it as it points to the next instruction.

tallbl0nde commented 3 years ago

I'm going to go ahead and merge this seeing as it's only incorporates TagLib. I thought it also contained some changes relating to FLAC support, my bad!