muse-sequencer / muse

MusE is a digital audio workstation with support for both Audio and MIDI
https://muse-sequencer.github.io/

Frequent crashes #1105

Closed drautzburg closed 1 year ago

drautzburg commented 1 year ago

Guys, I really need your advice here.

I had a handful of recording sessions over the last two weeks, and only one of them went without any crashes. Yesterday, over the course of three hours, I had two crashes (a good day).

I believe there is more than one cause, and many of them may come from my particular setup (compiz, firewire interface ...) or from an external library rather than from MusE itself. I managed to get coredumps by installing systemd-coredump and I attach stack traces here. Almost all crashes occur ...

Let me know what other info I should provide. Also, is it okay if I aggregate several crashes in this issue as I did today?

My system:

Linux beaureve 5.10.0-19-rt-amd64 #1 SMP PREEMPT_RT Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux

Config File </home/martin/.config/MusE/MusE/MusE-seq.cfg>
MusE 4.1, (master | 4.1.0-82-g29d89c3d-dirty | 2022-11-10 16:15:25 -0500) (compiled from source)

||/ Name            Version       Architecture Description
+++-===============-=============-============-======================================================
ii  jackd           5+nmu1        all          JACK Audio Connection Kit (default server package)
un  jackd-firewire  <none>        <none>       (no description available)
rc  jackd1          1:0.125.0-2   amd64        JACK Audio Connection Kit (server and example clients)
un  jackd1-firewire <none>        <none>       (no description available)
ii  jackd2          1.9.17~dfsg-1 amd64        JACK Audio Connection Kit (server and example clients)
ii  jackd2-firewire 1.9.17~dfsg-1 amd64        JACK Audio Connection Kit (FFADO backend)
spamatica commented 1 year ago

Oh, sorry to hear that.

Do you know if it has gotten worse or if it was always like this for you?

The crashes seem, on the surface, to be different, but maybe it's possible to provoke one or more of them? I'm gonna try to follow the tracebacks into the code, but I fear the information is too sparse when MusE is built in release mode.

The switching-project crash: what if you just repeatedly open the same project from the recent menu, is it easy to make it crash again? Assuming we can find a way to provoke one of the crashes - without too much work - it would be interesting to play with some parameters:

It would not be out of the question that smaller buffers affect stability, as they essentially mean MusE works harder and does more concurrent operations. But it would be good to know if it happens specifically when using jack, and jack with firewire for that matter. Skimming the backtraces it does not look related, but still we might learn something.

Since you seem to have compiled MusE yourself, maybe you can try to rebuild MusE with debug info. If you are using the compile_muse.sh script, simply exchange -DCMAKE_BUILD_TYPE=release with -DCMAKE_BUILD_TYPE=debug.
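For reference, a minimal sketch of such a debug rebuild, assuming a plain out-of-tree CMake build of a MusE checkout (the build directory name is arbitrary, and compile_muse.sh may use different paths or extra options):

```sh
# Configure and build MusE with debug info instead of release optimizations.
mkdir -p build-debug && cd build-debug
cmake -DCMAKE_BUILD_TYPE=debug ..
make -j"$(nproc)"
```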

I'm not familiar with systemd-coredump, maybe it will help with handling the coredumps. Another way is to run "ulimit -c unlimited" on the commandline before starting MusE.

Then a core file will be created in the current directory (usually the directory of the currently opened project). The core file can be further examined with gdb. The explanation gets a bit long-winded; if you feel like trying, I can elaborate :)
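A rough sketch of that core-file workflow, assuming the core is written to the working directory (with systemd-coredump installed, the dump may instead be captured by the journal and retrieved via coredumpctl):

```sh
# Allow unlimited-size core files in this shell, then start MusE from it.
ulimit -c unlimited
muse4

# After a crash, look for a file named "core" (or core.<pid>) in the current
# directory. On systemd-coredump systems, list and open captured dumps with:
#   coredumpctl list
#   coredumpctl gdb
```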

drautzburg commented 1 year ago

Oh, sorry to hear that. Do you know if it has gotten worse or if it was always like this for you?

No, it used to be rock solid. But I have a new computer and a new Debian version now. Still, my impression is that things went south with MusE 4.

The switching-project crash: what if you just repeatedly open the same project from the recent menu, is it easy to make it crash again?

This one is easy to provoke. I had multiple crashes, but chose to only send you one trace. The recording crash is much harder to provoke.

Assuming we can find a way to provoke one of the crashes - without too much work - it would be interesting to play with some parameters ...

Will do that.

-DCMAKE_BUILD_TYPE=debug

This is a prerequisite for debugging with gdb, right? Or does it enable more verbose messages too?

I'm not familiar with systemd-coredump, maybe it will help with handling the coredumps.

I believe I had no coredumps at all before I installed it.

Another way is to run "ulimit -c unlimited" on the commandline before starting MusE.

Will do that.

The core file can be further examined with gdb. The explanation gets a bit long-winded; if you feel like trying, I can elaborate :)

Just give me the general idea. What info would you need from gdb? I believe I will be able to launch gdb with a core file and tell it where the sources are.

terminator356 commented 1 year ago

Thanks for the files. I am examining all three.

For the switching projects crash, yes I am aware of some crashes. As you can see, our heartbeat timer comes along and attempts to call something on a midi strip, and the strip or its track probably does not exist because the project has been cleared:

```
#0  0x00007f4fce8fb500  __dynamic_cast (libstdc++.so.6 + 0xa4500)
#1  0x000055a5f707ae8f  _ZN7MusEGui9MidiStrip14updateControlsEv (muse4 + 0x39ce8f)
#2  0x000055a5f707cd4b  _ZN7MusEGui9MidiStrip9heartBeatEv (muse4 + 0x39ed4b)
```

In addition, I have seen crashes when using the newer 'lambda' connections (which replace the signals and slots mechanism). I learned that using them is much more tedious than one would expect. If the 'lambda' connection is from some outside source, one must keep track of the connection object itself and manually remove the connection when the object is about to be destroyed. (We are warned about that in the docs.) Otherwise the connection will still exist even though the receiving object does not, causing a crash when the connection tries to communicate. I managed to fix some of them by properly 'retro-fitting' some of the ones we use, but there are still others that need attention.

drautzburg commented 1 year ago

Thanks for the files. I am examining all three.

For the switching projects crash, yes I am aware of some crashes. As you can see, our heartbeat timer comes along and attempts to call something on a midi strip

BTW I also got a crash when I tried to open the project that previously caused a crash when switching projects. I managed to open it by passing the .med file on the commandline. But that could well be a coincidence.

Also, I sometimes get an empty project name under recent projects. I don't open this one, but I frequently find myself editing the projects file because the most recent project is not on top, so MusE opens the second-latest project, which makes me switch projects, which causes a crash. And I also delete an occasional empty line. I filed a bug report years ago concerning the order of projects under recent projects, but it still feels off. However, this is really a minor issue.

I managed to fix some of them by properly 'retro-fitting' some of the ones we use, but there are still others that need attention.

You seem to know what you're talking about.

I have compiled a debug version of MusE

ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=8ffa09c7ec9e6e757fce1636831aab3f3698eeaf, for GNU/Linux 3.2.0, with debug_info, not stripped

Would it make sense to provoke the recording crash by using small jackd buffers, feed the core file to gdb and send you the stack trace? Or what else can I do to help?

spamatica commented 1 year ago

Got sidetracked by influenza (: recovering now....

Also, I sometimes get an empty project name under recent projects. I don't open this one, but I frequently find myself editing the projects file because the most recent project is not on top, so MusE opens the second-latest project, which makes me switch projects, which causes a crash. And I also delete an occasional empty line. I filed a bug report years ago concerning the order of projects under recent projects, but it still feels off. However, this is really a minor issue.

I looked for the old report but could not find it. Not sure I understand what is happening. Do make a new report if you will.

Would it make sense to provoke the recording crash by using small jackd buffers, feed the core file to gdb and send you the stack trace? Or what else can I do to help?

Yes. Getting a backtrace with gdb could give the clue we need. Though I think with the type of problems Tim suggests (lambda/anonymous functions and concurrency in general) it may still not give enough information. But it would be good if you could try! I assume this means you are pretty sure small buffers do increase the crashes. What about running rtaudio with small buffers, does that also cause crashes?

What I was after with gdb was that you load the core in gdb, e.g. "gdb core", and at the prompt issue 'bt -full' to get the complete backtrace.
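Roughly, such a gdb session might look like the sketch below, assuming a debug build of the muse4 binary and a core file sitting in the project directory (file names will vary):

```sh
# Load the binary together with the core so gdb can resolve symbols.
gdb ./muse4 core

# Then, at the (gdb) prompt:
#   bt full                    # full backtrace of the crashing thread, with locals
#   thread apply all bt full   # backtraces of all threads, handy for race conditions
```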

drautzburg commented 1 year ago

Will try to send you a bt-full.

Let's not worry about the projects file for now.

I know nothing about rtaudio, can you give me some clues? Do I need to install anything (found nothing named rtaudio in the repos)? Does it work with firewire or should I use an analog output?

drautzburg commented 1 year ago

Some quick side notes:

spamatica commented 1 year ago

I know nothing about rtaudio, can you give me some clues? Do I need to install anything (found nothing named rtaudio in the repos)? Does it work with firewire or should I use an analog output?

Probably you will not get audio through firewire. For the testing it should not matter. I suggested it mostly to remove jack from the equation.

To give some background on RtAudio (https://www.music.mcgill.ca/~gary/rtaudio/): it is a library which allows easy connection between an application and different types of audio output. We use it mostly to connect to PulseAudio, as many distributions have this out of the box. There should be a Debian package for it, librtaudio something. If MusE is built with RtAudio, starting 'muse4 -t' will force RtAudio with output to PulseAudio.
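As a rough sketch (the exact Debian package name is not given above, so the search below is generic; only the 'muse4 -t' switch is taken from the comment itself):

```sh
# Look for an RtAudio package in the repositories.
apt-cache search rtaudio

# Start MusE forcing the RtAudio backend with PulseAudio output.
muse4 -t
```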

I see now your comment about 32 frames latency not making a huge difference, at least not for crashes, so maybe changing more variables will not make it easier; might as well stick with jack for now.

spamatica commented 1 year ago

CPU was at 20%. Is there a way I can put the remaining 80% to use?

Found this link for just putting load on the system. The stress tool looks promising. https://superuser.com/questions/443406/how-can-i-produce-high-cpu-load-on-a-linux-server
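For example, with the stress tool from that link (the worker count and duration below are placeholders; pick values that leave some headroom for JACK and MusE):

```sh
# Spin up CPU-bound workers for five minutes to simulate background load.
stress --cpu 6 --timeout 300s
```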

drautzburg commented 1 year ago

Will try the stress tool.

But actually I was wondering if I can use the remaining 80% for MusE and jack, such that I can run very low latency without xruns.

spamatica commented 1 year ago

But actually I was wondering if I can use the remaining 80% for MusE and jack, such that I can run very low latency without xruns.

Ah, right! :) MusE could be better at utilizing multiple cores. Are you using soft synths? In that case, using a softsynth with good multi-core support would allow you to utilize more CPU.

Actually, this could be relevant to the problem at hand too. What do your projects generally contain, audio, midi, softsynths, effects? Any example of what you generally use?

Here I thought I would do some long-term recording tests to see if I could catch the recording issue live, but I discovered another issue that held me up. But it is resolved now, so I will set up a new test environment and start up some recording tests again.

drautzburg commented 1 year ago

I am using all of the above.

I did not manage to reproduce the recording crash myself, but I haven't tried very hard yet. Thanks for trying yourself. One of the problems is that it only crashes when the musicians are in the room watching and I am doing a real recording. I hesitate to use the debug version then, because it appears much more prone to xruns.

Today I changed some realtime priority settings for the interrupt handlers, jackd and MusE, but the improvement was marginal. I'm still getting xruns with -p128, even though CPU stays below 20%.
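As a sketch of the kind of tweaks meant here, assuming the FFADO/firewire JACK backend and illustrative priority values (not the exact settings used in this setup):

```sh
# Start jackd in realtime mode at an explicit scheduling priority, with the small
# period size (-p 128) discussed here.
jackd -R -P 80 -d firewire -r 48000 -p 128 -n 3

# Interrupt-handler priorities are usually adjusted separately, e.g. with the
# rtirq script (Debian package rtirq-init).
```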

Fun fact: I asked chatGPT about latency and xruns and the answer shows some knowledge, but no real solution:

Q: How can I reduce latency when running jackd on linux without causing xruns

A: To reduce latency when running jackd on Linux, you can try the following:

Increase the size of the audio buffer. This can be done by using the -p flag when starting jackd and setting it to a higher value, such as 2048 or 4096. This will increase the amount of time that jackd has to process the audio, which can reduce the likelihood of xruns (dropouts in the audio caused by the system not being able to keep up with the real-time processing requirements).

Use a real-time kernel. By default, most Linux distributions use a kernel that is optimized for general-purpose computing, rather than real-time audio processing. Switching to a real-time kernel can help reduce latency and improve the overall performance of jackd.

Turn off any unnecessary system services and processes. Having other programs and services running on your system can compete with jackd for system resources and cause xruns. Try closing any unnecessary programs and services to free up resources for jackd.

Use a dedicated sound card. If you're using the built-in sound card on your computer, it may not have the processing power or features needed to handle low-latency audio processing. Consider using a dedicated external sound card that is specifically designed for low-latency audio processing. This can help improve the performance of jackd and reduce the likelihood of xruns.

Use a multi-core processor. Audio processing can be resource-intensive, so using a multi-core processor can help improve the performance of jackd and reduce latency. Try enabling all of the available cores on your processor to see if it helps reduce xruns.

It's also important to note that reducing latency can sometimes come at the expense of other system resources, such as CPU usage. If you're experiencing xruns despite trying the above steps, it may be because your system simply doesn't have enough resources to handle the real-time audio processing requirements of jackd. In this case, you may need to upgrade your hardware to improve the performance of jackd.

drautzburg commented 1 year ago

This is what the current project looks like (screenshot attached). Eight more audio tracks at the bottom are not shown.

terminator356 commented 1 year ago

My apologies for being mostly silent here. I am trying to finish off some other work before I can test or investigate the crashes. Nice screenshot. Sorry about the problems while recording, it must be annoying...

spamatica commented 1 year ago

Today I changed some realtime priority settings for the interrupt handlers, jackd and MusE, but the improvement was marginal. I'm still getting xruns with -p128, even though CPU stays below 20%.

There are A LOT of parameters that affect performance. What hardware are you running on? I have found that setting the CPU frequency governor to performance for all cores helps a lot: cpufreq-set -c 0 -g performance, and so on for all cores.
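For example, a small loop that applies the governor to every core, assuming cpufreq-set is available (on Debian it comes with the cpufrequtils package):

```sh
# Set the "performance" frequency governor on all CPU cores.
for c in $(seq 0 $(( $(nproc) - 1 ))); do
    sudo cpufreq-set -c "$c" -g performance
done
```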

With that said I usually don't try to use such low latency when I'm recording. MusE should compensate for latency between playback and recording if it is enabled in the settings (should be by default).

-- Moreover, I think I have managed to reproduce the recording bug. There is a case where the live view of recorded audio parts deletes the audio file pointer, probably because there is a race condition in the reference counting. I haven't figured out exactly how it can happen, but I have observed it twice now.

drautzburg commented 1 year ago

Glad to hear you could reproduce it. If there's anything I can do, let me know.

I guess I will create a feature request to improve SMP performance, if you don't mind, to keep this issue focused on the crash problem.

revolverve commented 1 year ago

@drautzburg I'm curious too. What hardware are you running on?

drautzburg commented 1 year ago
onefang commented 1 year ago

I have also seen crashes every time I use "Open Recent" when a project is already loaded. If I "Close" the project first, no crashing.

terminator356 commented 1 year ago

Hiya @drautzburg sorry for the very late reply. Hope you are still with us.

A huuuuge batch of crash fixes and other stuff has been done over the last several weeks. Please have a look at the ChangeLog to see if anything rings a bell.

All three of your reported crashes should be fixed, mostly, hopefully. In particular, I observed the audio recording crashes some time ago and had been monitoring them. Very intermittent and difficult to track and fix. I feel your pain. I noted that this one fix was perhaps more important than all the rest! Very bad to crash in the middle of recording a stellar performance.

Please let me know if any trouble.

I have not examined the wave editing features for a long time. But I am aware that there are problems there. Opening the wave in an external editor for example. Freezing etc. I'll try to take a look...

drautzburg commented 1 year ago

That sounds great. Next recording session should be in a week or two. I'll let you know how it went.


drautzburg commented 1 year ago

The changelog is indeed quite impressive. This is the way.

luzpaz commented 1 year ago

Is this issue still relevant with latest stable 4.2 ?

drautzburg commented 1 year ago

Is this issue still relevant with latest stable 4.2 ?

I didn't see any crashes with 4.1 from 2023-06-11, compiled from source. I didn't stress it very much though. Haven't tried 4.2 yet.


spamatica commented 1 year ago

We are probably not free from crashes, but as this one seems to be fixed I will close it. If it pops up again it can be reopened :)

drautzburg commented 6 months ago

Had an extended recording session over the weekend with 4.2.1. Worked like a charm, no crashes, no issues at all.

terminator356 commented 6 months ago

Thank you for the report. Let's hope it lasts. As usual, more fixes are in the works, in another branch.