wwmm / easyeffects

Limiter, compressor, convolver, equalizer and auto volume and many other plugins for PipeWire applications
GNU General Public License v3.0
6.64k stars 270 forks source link

crash error #1172

Closed ghost closed 3 years ago

ghost commented 3 years ago

need to have the easyeffects windows open because some programs are crashing it when its running as background service, i statused the pipwire's systemctl services after this happens but they dont crash

for example, first time i powered on the pc today and with easyeffects opened as service (without window) i opened audacity for the first time today, and it crashed easyeffects, after i opened audacity for second time it doesnt happen happeend me too yesterday when i was playing and voice chatting with some friends, dont know the reason why this happens sometimes but it happens

EndeavourOS 5.14.7-arch1-1 community/easyeffects 6.1.2-1 extra/pipewire 1:0.3.37-1 extra/pipewire-media-session 1:0.3.37-1 extra/pipewire-pulse 1:0.3.37-1

wwmm commented 3 years ago

Kill the current instance easyeffects -q and restart it in debug and service mode G_MESSAGES_DEBUG=easyeffects easyeffects --gapplication-service. After that start Audacity.

Audacity is an application that does some questionable things when initializing. It used to make us crash in the past. Maybe something new is happening now that PipeWire is switching sampling rate on the fly.

vchernin commented 3 years ago

I'd recommend trying Tenacity (Audacity fork) which among many other things, uses newer PortAudio which should provide better PipeWire support.

There are nightly builds, along with a nightly Flatpak build (warning: Tenacity is not officially released yet, it's still in an alpha state, but it's still interesting for testing).

ghost commented 3 years ago

ok i reproduced it easily clicking on listen microphone in gui and opening audacity,

tion icon has been hidden. (easyeffects:5409): easyeffects-DEBUG: 16:54:38.517: pipe_manager: Stream/Output/Audio 355 ALSA plug-in [audacity] was added (easyeffects:5409): easyeffects-DEBUG: 16:54:38.518: pipe_manager: ALSA plug-in [audacity] port 350 is connected to easyeffects_sink port 142 (easyeffects:5409): easyeffects-DEBUG: 16:54:38.518: pipe_manager: ALSA plug-in [audacity] port 362 is connected to easyeffects_sink port 94 (easyeffects:5409): easyeffects-DEBUG: 16:54:38.527: pipe_manager: new metadata property: 355, target.node, Spa:Id, 110 (easyeffects:5409): easyeffects-DEBUG: 16:54:38.528: pipe_manager: Stream/Input/Audio 342 Control de volumen de PulseAudio was added (easyeffects:5409): easyeffects-DEBUG: 16:54:38.530: pipe_manager: ALSA plug-in [audacity] port 350 is connected to Control de volumen de PulseAudio port 353 (easyeffects:5409): easyeffects-DEBUG: 16:54:38.530: effects_base_ui: cannot lookup application icon alsa plug-in [audacity] in /usr/local/share/pixmaps

(easyeffects:5409): easyeffects-WARNING **: 16:54:38.530: effects_base_ui: alsa plug-in [audacity] icon name not installed in the Arc-X-D icon theme in use. The application icon has been hidden. Violación de segmento (`core' generado) [rot@root ~]$

basically a core segmentation fault, dont know why

Digitalone1 commented 3 years ago

I see you have Pavucontrol open. What happens if you close it?

wwmm commented 3 years ago

I was able to reproduce it. It is not related to pavucontrol. It is Audacity not being nice as usual. I talked about it some time ago with PipeWire's developer and he explained what happens. When Audacity is initializing it probes every possible format that the sound card could ever support. And this is done in a way where they create and destroy an insane amount of objects really fast. A really bad behavior... What is happening now is that at some point while we create and destroy Audacity entries in our window we are crashing. Probably because this is happening so fast we are trying to access an object that has already been destroyed. I am looking at gdb output but so far I had no luck. No idea about which object is causing the crash.

wwmm commented 3 years ago

I did some changes that seem to have fixed this issue. But we really need to do more tests because I don't really understand why the fix is working in the first place. @Digitalone1 the problem is passing structures by reference in the argument of the Holders create method https://github.com/wwmm/easyeffects/blob/00d1a95f26b000df6618b224af4f5c69058a0cb0/src/info_holders.cpp#L27. Maybe this was the reason why I was passing by value in the past. I remember that at some point I had problems with Audacity but I do not remember how I fixed them.

It doesn't make sense to me. This part of the code should be running in the main thread. All of these calls should be serialized. I do not understand why we are having problems with structures lifetimes in this situation. My guess is that there is something really subtle in how these holders are handled by gtkmm and gtk under the hoods. It is so weird...

wwmm commented 3 years ago

And now the crash is happening again. Passing by value seems to have changed the probability of them happening but it is not the real fix yet...

Digitalone1 commented 3 years ago

I don't have the code by hand now, but if I'm not wrong, the node info holder is made by a reference of a previous copy, so that shouldn't be the issue.

I think we have a copy of a node info already deleted from the node map and maybe we're doing something with Pipewire using the node id.

wwmm commented 3 years ago

I did more changes that seem to have made us more resilient to the Audacity bullshit. I think that scheduling item->info_updated.emit(); in the main thread with signal_idle is not a good idea. When the list entries are removed as fast as Audacity is doing the item object may have already been destroyed by the time the gtk loop is idle and the scheduled calls are executed. That seems to be the real problem. But is still strange that not passing those structures by reference had influence over the probability of the crash happening. At least for now it is better to keep the copy.

Digitalone1 commented 3 years ago

Maybe the issue is not in scheduling itself inside the callback, but that the copy is made when the object is already dereferenced?

Keep the signal idle and make the copy at the start of the function, as soon as possible. I also think that adding a node already deleted (even for short time) it's not really an issue as long as we have a copy and don't make anything with Pipewire.

wwmm commented 3 years ago

Maybe the issue is not in scheduling itself inside the callback, but that the copy is made when the object is already dereferenced?

The whole situation isn't very clear in my mind yet but if the problem is really in the item object making a copy won't help. This object is a shared pointer to a gtk list object. If the list entry connected to this pointer pointer is destroyed it makes no difference how many copies of the shared pointer we have. The list entry is gone.

wwmm commented 3 years ago

It is one of those subtle situations where the pointer itself is valid. The thing is not null. I tested it. The problem is that gtk is destroying the list object this shared pointer points to. And this is handled by gtk internally. We can not copy it.

wwmm commented 3 years ago

In any case I will do some tests putting more things inside signal_idle to see how it goes. Although I am not sure if we really have an advantage in doing that in this particular place of the code.

wwmm commented 3 years ago

It seems that if everything that is inside EffectsBaseUi::on_app_changed is put inside the signal idle call the crash does not happen. Anything less than that and it crashes. The thing is I do not see a reason why we should be wrapping everything by a signal idle call in this part f the code. It is probably better not using it there.

In any case it is clear that using the shared pointers bound to gtk list objects can be very tricky...

wwmm commented 3 years ago

I also think that adding a node already deleted (even for short time) it's not really an issue as long as we have a copy and don't make anything with Pipewire.

The nodes seem to be fine. The crash was happening inside application_info_update when it was invoked by the signal holder->info_updated. For some reason when the item->info_updated.emit(); call is delayed by putting it inside signal_idle the object returned by auto holder = std::dynamic_pointer_cast<NodeInfoHolder>(list_item->get_item()); may not be bound to a valid gtk list item anymore.

Digitalone1 commented 3 years ago

The thing is I do not see a reason why we should be wrapping everything by a signal idle call in this part f the code. It is probably better not using it there.

I did it because the calls on on_app_changed are usually 20/30 times in a very short time. We don't need all those call that are heavy on CPU and I thought they may cause race condition. Anyway the scheduled_update flag inside NodeInfoHolder should reduce them.

So with the last changes the issue is resolved?

wwmm commented 3 years ago

did it because the calls on on_app_changed are usually 20/30 times in a very short time. We don't need all those call that are heavy on CPU and I thought they may cause race condition.

What is reducing the number of calls is the work you did with scheduled_update. signal_idle will just schedule the same number of calls for the next time gtk loop is idle. It is possible some performance is gained when gtk batch these operations. But the number of calls is the same. Gtk would just group them in a way that may or may not have better performance. I never tested it. I read about it in a documentation somewhere.

About the races they usually happen when multiple threads try to use the same object. Last time I checked the thread id in these operations it had the same value of the main thread. So races in the usual sense of the word do not seem possible. What I do not know is if the slot bound to emit() is called right after the emit function is called or is asynchronously invoked at a later time. In any case signal_idle won't be of much help in this scenario. Usually it is useful when we want to force things to run in the main thread, what should already be happening, or when we can schedule things for when gtk loop is idle. But letting operations for later is what seems to be causing the crash in this case.

What I think we should do is a small redesign in the info_updated signal. With the exception of the recently introduced scheduled_update what the slot application_info_update really needs is the NodeInfo object. A copy of it could be passed as the signal argument. This way we would not even have to think about what happens to the list entry object associated to holder->. The problem is what to do with scheduled_update. Maybe what we should really do to reduce the number of calls is improving the node_info callback in pipe_manager in a way that it only emits changed signals when the fields we use have changed. This way there would be no need to handle this on the ui side.

wwmm commented 3 years ago

So with the last changes the issue is resolved?

It seems to be. But I would like to do more tests.

Digitalone1 commented 3 years ago

What I think we should do is a small redesign in the info_updated signal. With the exception of the recently introduced scheduled_update what the slot application_info_update really needs is the NodeInfo object. A copy of it could be passed as the signal argument. This way we would not even have to think about what happens to the list entry object associated to holder->. The problem is what to do with scheduled_update. Maybe what we should really do to reduce the number of calls is improving the node_info callback in pipe_manager in a way that it only emits changed signals when the fields we use have changed. This way there would be no need to handle this on the ui side.

Maybe relying on the scheduled_update to reduce the emit signal on stream_changed rather than info_update.

I was also thinking to not delete the node info from the node map. Just use a flag to know if the node is alive. This should solve most of the issue because the copy of the object is still there in the node map, no need to emit copy of the whole node.

Digitalone1 commented 3 years ago

@wwmm we should also think to use some mutex or semaphores.

wwmm commented 3 years ago

we should also think to use some mutex or semaphores.

It depends on where. It is quite easy to have thread locks if they are put in the wrong place. And unless we actually have multiple threads trying to use the same object in a situation where one(or both) of them writes to this object while the other is trying to read it isn't really necessary to use them.

I was also thinking to not delete the node info from the node map. Just use a flag to know if the node is alive.

We can delete it from the map. So far we did not have any crash while trying to read the node list. Be it on the current implementation using maps or before with vectors. We are crashing when using gtk objects. Another thing to consider is memory usage if we do not remove useless nodes from the map.

no need to emit copy of the whole node.

I think it is safer and simpler to use copies in these cases. The performance gained by not doing them is debatable in our cases. But as the node map is used in different threads the danger of having data races is real. What would definitely force us to use mutex to synchronize threads. It may sound fine but have in mind that synchronizing threads has performance costs. And sometimes is a pain to get right without the risk of having the application locked. The performance gained by not doing the copy can be cancelled by the loss involved in synchronizing threads.

Digitalone1 commented 3 years ago

OK. Another thing I was thinking: is it really necessary adding nodes in on_registry_global? When a stream is added, a simultaneous call is made in on_node_info and inside the last one there are more parameters like the monitor stream to blocklist pavucontrol.

Isn't better to add the applications (not present in the node map) from within on_node_info?

wwmm commented 3 years ago

is it really necessary adding nodes in on_registry_global? Isn't better to add the applications (not present in the node map) from within on_node_info?

I have been thinking about this. It is something that we not only have to investigate but that we also should ask to Taymans. I am almost sure that I tried that approach in the beginning and while it worked for most apps there were cases where no "node changed" signal was emitted right after the audio client was created by the app. But unfortunately I can't remember with which apps that was happening. So we have to be careful.

wwmm commented 3 years ago

Since when I started to use PipeWire it went through lots of changes. It is possible that this signal emittion is guaranteed to happen now. But I think that without asking him about it we will never know for sure.

wwmm commented 3 years ago

I am almost sure that I tried that approach in the beginning and while it worked for most apps there were cases where no "node changed" signal was emitted right after the audio client was created by the app

Or maybe that was happening to devices nodes and not apps nodes. I really do not remember. But there were cases where that signal was not emitted after the node creation.

Digitalone1 commented 3 years ago

Anyway we have to pick nodes from the map at least for adding the apps, otherwise we make the same mistake I made when the app info ui is not updated when EE is started in service mode.

I'm starting to think that the issue is related to app_info_ui being updated when the NodeInfoHolder is deleted. If it's so, making copies is not saving us because it's something related to the app_info_ui widget (i.e. updating a label getting deleted).

Maybe we can try to delay the deletion, but if the app is starting and destroying repeatedly, the issue may come out elsewhere.

wwmm commented 3 years ago

If it's so, making copies is not saving us because it's something related to the app_info_ui widget (i.e. updating a label getting deleted).

When I did some tests disabling some parts of the code I saw that the crash happened while trying to use the node holder shared pointer. It did not get to the point where the labels where used. This line was enough to make it crash https://github.com/wwmm/easyeffects/blob/8885fd52eebee9b650dda628348d84bff7cb2216/src/effects_base_ui.cpp#L712. That is why reducing its usage will help.

wwmm commented 3 years ago

Remember that the holder-> shared pointer is coming from auto holder = std::dynamic_pointer_cast<NodeInfoHolder>(list_item->get_item());. The one managing the object wrapped by the shared pointer returned by list_item->get_item() is gtk. Under the hoods this object was probably destroyed. So have to use that thing as little as possible.

Digitalone1 commented 3 years ago

Ok, when I have time I'll pass it by copy and do not use holder. I delete also scheduled_update flag from it to.

Digitalone1 commented 3 years ago

The nodes seem to be fine. The crash was happening inside application_info_update when it was invoked by the signal holder->info_updated. For some reason when the item->info_updated.emit(); call is delayed by putting it inside signal_idle the object returned by auto holder = std::dynamic_pointer_cast<NodeInfoHolder>(list_item->get_item()); may not be bound to a valid gtk list item anymore.

I can easily reproduce a core dump on the latest release adding and removing from the blocklist.

I made several tests and tried to optimize the code, but the result is always the same. When the player_model is cleared, the holder is not bound to anything and accessing the app_info_ui widgets leads to a core dump.

Sometimes I get (easyeffects:20510): Gtk-CRITICAL **: 23:35:35.658: gtk_label_set_text: assertion 'GTK_IS_LABEL (self)' failed, surely on the application name label, the first widget we access in info_updated callback.

This was working when I made the MR, otherwise I never committed the changes. Something has been changed in the last days on GTK side.

wwmm commented 3 years ago

I did not test it recently. I also do not remember it crashing. I will see if gdb shows something useful.

Digitalone1 commented 3 years ago

Already done.

gdb ``` Thread 1 "easyeffects" received signal SIGSEGV, Segmentation fault. 0x00007ffff7162997 in Gtk::Label::set_text(Glib::ustring const&) () from /usr/lib/libgtkmm-4.0.so.0 (gdb) bt #0 0x00007ffff7162997 in Gtk::Label::set_text(Glib::ustring const&) () at /usr/lib/libgtkmm-4.0.so.0 #1 0x00005555556fd3c3 in EffectsBaseUi::setup_listview_players()::{lambda(std::shared_ptr const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}::operator()(NodeInfo) const () #2 0x000055555572c01b in void std::__invoke_impl const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&>(std::__invoke_other, EffectsBaseUi::setup_listview_players()::{lambda(std::shared_ptr const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&) () #3 0x00005555557256ec in std::__invoke_result const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&>::type std::__invoke const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&>(EffectsBaseUi::setup_listview_players()::{lambda(std::shared_ptr const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&) () #4 0x000055555571d455 in std::invoke_result const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&>::type std::invoke const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&>(EffectsBaseUi::setup_listview_players()::{lambda(std::shared_ptr const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}&, NodeInfo const&) () #5 0x0000555555714170 in decltype(auto) sigc::adaptor_functor const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}>::operator()(NodeInfo const&) const () #6 0x00005555557141ad in sigc::internal::slot_call const&)#2}::operator()(std::shared_ptr const&) const::{lambda(NodeInfo)#6}, void, NodeInfo>::call_it(sigc::internal::slot_rep*, NodeInfo const&) () #7 0x0000555555705bb4 in sigc::internal::signal_emit::emit(std::shared_ptr const&, NodeInfo const&) () #8 0x0000555555705c63 in sigc::signal_with_accumulator::emit(NodeInfo const&) const () #9 0x0000555555705dd3 in EffectsBaseUi::on_app_changed(NodeInfo) () #10 0x00005555558e4b0d in void std::__invoke_impl(std::__invoke_memfun_ref, void (StreamOutputEffectsUi::* const&)(NodeInfo), StreamOutputEffectsUi&, NodeInfo const&) () ```

I'm trying to find a workaround.

wwmm commented 3 years ago

I've found the problem. In the last patch I forgot to reenable the destruction of the connection_info object https://github.com/wwmm/easyeffects/blob/3ff46cdd5961e7d87796e01c44296a6e1b5c1948/src/effects_base_ui.cpp#L825. After removing the comment the crash was gone.

Digitalone1 commented 3 years ago

Anyway this should affect only when the application is in window mode. I think the issues with crashing in service mode remains.

wwmm commented 3 years ago

I think the issues with crashing in service mode remains.

What issue in service mode? So far I did not see one. At least on my computer(or the opensuse image).

Digitalone1 commented 3 years ago

What issue in service mode? So far I did not see one. At least on my computer(or the opensuse image).

With apps like Audacity spawning many streams in a short time.

wwmm commented 3 years ago

With apps like Audacity spawning many streams in a short time.

That is the issue I have been testing both on my computer and on the opensuse image. Since removing those signal_idle calls no issues opening Audacity.

wwmm commented 3 years ago

A random crash I was having when launching Tales of Arise is also gone.

AbsurdlySuspicious commented 3 years ago

I still get random crashes on beginning of playback (when I switch tracks in deadbeef, open a video in telegram, etc) with 6.1.3 from arch repos. Trace: https://gist.github.com/AbsurdlySuspicious/78eb218578628b770ea65cbdbc098a81 I understand it's nearly useless without debug symbols so I will build git version and upload it again on next crash

wwmm commented 3 years ago

14 0x00007f9e391dcac4 fftwf_plan_dft_c2r_1d (libfftw3f.so.3 + 0x122ac4)

15 0x00007f9e38ec666b _ZN9Convlevel9configureEijjjj (libzita-convolver.so.4 + 0x366b)

The crash you are seeing may have another source. Is the convolver or the crystalizer plugin being used? They use zita-convolver and this library is the one calling fftw. For reasons I do not understand sometimes fftw can crash when multiple plugins are using it. Even if each one of them has its own fftw instance through the corresponding zita-convolver instances. As it is really hard to reproduce this crash on my computer (I haven't seen it in months) I never could figure out the reason.

wwmm commented 3 years ago

I think that this issue has been fixed.