mikebrady / shairport-sync

AirPlay and AirPlay 2 audio player
Other
7.3k stars 574 forks source link

Issues with defuncts left by script hooks #968

Closed patrickjane closed 3 years ago

patrickjane commented 4 years ago

I am using the following config:

general =
{
  name = "AirPi";
  volume_range_db = 120;
};

alsa =
{
  output_device = "hw:0";
  mixer_control_name = "DSPVolume";
};

sessioncontrol =
{
  run_this_before_play_begins = "/home/shairport/playbackstart";
  run_this_after_play_ends =  "/home/shairport/playbackstop";
};

The two scripts which shall be run upon start/end of playback contain:

#!/usr/bin/env bash
echo "done start" >> /home/shairport/log.log
exit 0
#!/usr/bin/env bash
echo "done stop" >> /home/shairport/log.log
exit 0

When conneting to shairport & starting & stopping playback, I can see that the log is written:

shairport@homepi:~ $ tail -f log.log
done start
done stop

However, every time the scripts are run, it will leave a defunct process:

shairport@homepi:~ $ psg shairpo
shairpo+ 18043  6.0  0.4  91564  8424 ?        Ssl  12:10   0:13 /home/shairport/shairport/shairport-sync
shairpo+ 18147  0.0  0.0      0     0 ?        Z    12:10   0:00 [bash] <defunct>
shairpo+ 19895  0.0  0.0      0     0 ?        Z    12:13   0:00 [bash] <defunct>

Is this a bug or misconfiguration? I am on a raspberry pi 4 running rasbian buster. I have compiled & installed shairport-sync today from the latest git version.

mikebrady commented 4 years ago

I don’t know the reason, but it may be the exit 0 in each. Remove them and see what happens?

patrickjane commented 4 years ago

Unfortunately, this wont help. I didnt have those exit 0 in there at first, but when I saw the defuncts I've added them to see if it goes away. But it didn't go away :/

mikebrady commented 4 years ago

Hmm that’s odd. How about replacing the #!/usr/bin/env bash with #!/usr/bin/bash?

patrickjane commented 4 years ago

This would be less correct and less portable, and I am sure it won't change anything.

I don't know how shairport runs those scripts, but maybe just waitpid() for the child process is not done correctly?

I doubt the contents of the scripts themselves can have any impact on the defuncts.

mikebrady commented 4 years ago

I have been unable to reproduce the issue here. Are there any other Shairport Sync settings you have made, apart from those listed above?

By default, Shairport Sync does not wait for these programs to terminate before continuing.

patrickjane commented 4 years ago

Nope, its really just a plain new installation. I can do some more tests in the evening. Should I enable some sort of debug mode/logging or something?

What scripts do you use?

mikebrady commented 4 years ago

Thanks. I'll do the same. There is a log verbosity setting towards the end of the configuration file, or from the command line add -v or -vv. The scripts I was using just now were the ones you listed above. I was running Shairport Sync from the command line as the standard pi user. I'm also using a Pi 4 with Buster.

Could you list the version of Shairport Sync ($ shairport-sync -V) are you using, please?

patrickjane commented 4 years ago

Okay, so I've got some more info:

shairport@homepi:~ $ shairport-sync -V
3.3.5-OpenSSL-Avahi-ALSA-soxr-sysconfdir:/etc

BTW when using -v or -vv I cannot see any output, is that intended? Not sure if doing it right.

HOWEVER, I noticed that I don't get any defuncts if I run this as pi user. I do however get defuncts if I run shairport-sync as extra user (shairport in my case). This is a bit weird because in both cases the scripts are successfully executed without issues.

To make the testcase even simpler (and to avoid permission issues) the scripts now merely contain:

#!/usr/bin/env bash
echo "done start"     # or "done stop" for the stop script

So as pi user:

pi@homepi:~ $ shairport-sync -v
done start
done stop

# other terminal at same time

pi@homepi:~ $ ps aux | grep shairport
pi        6894 11.7  0.4 100272  8440 pts/0    Sl+  19:52   0:02 shairport-sync -v
pi        6949  0.0  0.0   7348   520 pts/1    S+   19:53   0:00 grep --color=auto shairport

Now as user shairport:

shairport@homepi:~ $ id
uid=999(shairport) gid=995(shairport) groups=995(shairport),20(dialout),29(audio),46(plugdev),997(gpio),998(i2c)
shairport@homepi:~ $ shairport-sync -vv
done start
done stop
done start

# other terminal at same time

pi@homepi:~ $ ps aux | grep shairp
root      6952  0.0  0.1   9948  3224 pts/1    S    19:53   0:00 sudo su - shairport
root      6957  0.0  0.1   9700  2756 pts/1    S    19:53   0:00 su - shairport
shairpo+  6962  0.2  0.1   8492  3796 pts/1    S    19:53   0:00 -bash
shairpo+  6974  9.9  0.4 102320  8464 pts/1    Sl+  19:53   0:04 shairport-sync -vv
shairpo+  6981  0.0  0.0      0     0 pts/1    Z+   19:53   0:00 [bash] <defunct>
shairpo+  6990  0.1  0.0      0     0 pts/1    Z+   19:54   0:00 [bash] <defunct>
shairpo+  6993  0.2  0.0      0     0 pts/1    Z+   19:54   0:00 [bash] <defunct>
pi        7000  0.0  0.0   7348   528 pts/0    S+   19:54   0:00 grep --color=auto shairp

And one more thing to note: Interestingly, I even get defuncts with user pi if I remove the shebang line from the scripts, that is, if the scripts only contain the echo command. Which means, normally, they cannot be executed because the interpreter cannot be determined.

So to me it looks like in some situations shairport-sync is having issues executing the commands, a more detailed output and/or debugging might help to track down the issue.

mikebrady commented 4 years ago

Apologies – you need to add a u to the -v or -vv, thus -vu or vvu. The u directs debug information to the user console rather than the log.

Regarding the difference in behaviour – is it possible that it might be related to permissions associated with membership of a particular group?

I notice that the pi user has:

pi@raspberryPi4B:~ $ id
uid=1000(pi) gid=1000(pi) groups=1000(pi),4(adm),20(dialout),24(cdrom),27(sudo),29(audio),44(video),46(plugdev),60(games),100(users),105(input),109(netdev),997(gpio),998(i2c),999(spi)
patrickjane commented 4 years ago

I really cannot see how this could have any impact when both users are perfectly able to execute the scripts (proven by the output to console).

So either it can be executed - then no permission issue, or it cannot be executed. I doubt there can be something in between.

Here is the output with -vvu, however I cannot see anything about script execution:


        11.632676187 "rtsp.c:2700" Connection 1: new connection from [fe80::1898:2506:5319:78f1]:55519 to self at [fe80::4a22:75e1:2878:69cc]:5000.

        41.170221039 "rtsp.c:2366" Connection 1: Received an RTSP Packet of type "ANNOUNCE":
         0.000095869 "rtsp.c:463"   Type: "Content-Length", content: "669"
         0.000040425 "rtsp.c:463"   Type: "Content-Type", content: "application/sdp"
         0.000028703 "rtsp.c:463"   Type: "CSeq", content: "3"
         0.000026648 "rtsp.c:463"   Type: "DACP-ID", content: "E7E934A1E384E8C3"
         0.000025944 "rtsp.c:463"   Type: "Active-Remote", content: "1094366294"
         0.000025370 "rtsp.c:463"   Type: "User-Agent", content: "AirPlay/415.3"
         0.027776041 "rtsp.c:1934" Play connection from user agent "AirPlay/415.3" on RTSP conversation thread 1.
         0.000062296 "rtsp.c:1943" AirPlay version 415 detected.
         0.000033184 "rtsp.c:2409" Connection 1: RTSP Response:
         0.000032111 "rtsp.c:463"   Type: "CSeq", content: "3"
         0.000030944 "rtsp.c:463"   Type: "Server", content: "AirTunes/105.1"
         0.004552851 "rtsp.c:2366" Connection 1: Received an RTSP Packet of type "SETUP":
         0.000040832 "rtsp.c:463"   Type: "Transport", content: "RTP/AVP/UDP;unicast;mode=record;timing_port=51138;control_port=52281"
         0.000031629 "rtsp.c:463"   Type: "CSeq", content: "4"
         0.000032352 "rtsp.c:463"   Type: "DACP-ID", content: "E7E934A1E384E8C3"
         0.000029851 "rtsp.c:463"   Type: "Active-Remote", content: "1094366294"
         0.000028407 "rtsp.c:463"   Type: "User-Agent", content: "AirPlay/415.3"
         0.000038907 "rtsp.c:920" Connection 1: SETUP -- Active-Remote string seen: "1094366294".
         0.000071073 "rtsp.c:935" Connection 1: SETUP -- DACP-ID string seen: "E7E934A1E384E8C3".
         0.000051684 "rtp.c:981" Connection 1: SETUP -- Connection from fe80::1898:2506:5319:78f1 to self at fe80::4a22:75e1:2878:69cc.
         0.000224607 "rtsp.c:1000" Connection 1: SETUP DACP-ID "E7E934A1E384E8C3" from fe80::1898:2506:5319:78f1 to fe80::4a22:75e1:2878:69cc with UDP ports Control: 6001, Timing: 6002 and Audio: 6003.
         0.000044703 "rtsp.c:2409" Connection 1: RTSP Response:
         0.000028926 "rtsp.c:463"   Type: "CSeq", content: "4"
         0.000029018 "rtsp.c:463"   Type: "Server", content: "AirTunes/105.1"
         0.000028870 "rtsp.c:463"   Type: "Transport", content: "RTP/AVP/UDP;unicast;interleaved=0-1;mode=record;control_port=6001;timing_port=6002;server_port=6003"
         0.000031092 "rtsp.c:463"   Type: "Session", content: "1"
         0.005028806 "rtsp.c:2366" Connection 1: Received an RTSP Packet of type "RECORD":
         0.000101276 "rtsp.c:463"   Type: "CSeq", content: "5"
         0.000078054 "rtsp.c:463"   Type: "DACP-ID", content: "E7E934A1E384E8C3"
         0.000075425 "rtsp.c:463"   Type: "Active-Remote", content: "1094366294"
         0.000073313 "rtsp.c:463"   Type: "User-Agent", content: "AirPlay/415.3"
         0.000078110 "rtsp.c:805" Connection 1: RECORD
         0.001949098 "audio_alsa.c:877" alsa: hardware mixer prepare
         0.005452391 "audio_alsa.c:919" *warning: The hardware mixer specified -- "DSPVolume" -- does not have a dB volume scale.
         0.000491122 "audio_alsa.c:937" alsa: hardware mixer "DSPVolume" selected, with dB volume from -60.000000 to -1.350000.
         0.000942984 "audio_alsa.c:530" alsa: output format chosen is "S32".
         0.000161127 "audio_alsa.c:570" alsa: output speed chosen is 44100.
         0.002629327 "audio_alsa.c:1494" alsa: update timestamps available
         0.000127017 "audio_alsa.c:229" alsa: precision delay timing is available.
         0.000080591 "audio_alsa.c:721" alsa: precision timing selected for "auto" mode
         0.000077517 "audio_alsa.c:738" PCM handle name = 'hw:0'
         0.000076221 "audio_alsa.c:752" alsa device parameters:
         0.000082536 "audio_alsa.c:755"   access type = MMAP_INTERLEAVED
         0.000119035 "audio_alsa.c:759"   format = 'S32_LE' (Signed 32 bit Little Endian)
         0.000135016 "audio_alsa.c:763"   subformat = 'STD' (Standard)
         0.000110517 "audio_alsa.c:766"   number of channels = 2
         0.000102017 "audio_alsa.c:769"   number of significant bits = 32
         0.000100868 "audio_alsa.c:777"   rate = 44100 frames per second (precisely).
         0.000122646 "audio_alsa.c:786"   precise (rational) rate = 0.000 frames per second (i.e. 0/1088784512).
         0.000133887 "audio_alsa.c:799"   period_time = 5804 us (>).
         0.000089499 "audio_alsa.c:809"   period_size = 256 frames (precisely).
         0.000083591 "audio_alsa.c:825"   buffer_time = 1486077 us (>).
         0.000077499 "audio_alsa.c:838"   buffer_size = 65536 frames (>).
         0.000079887 "audio_alsa.c:848"   periods_per_buffer = 256 (precisely).
         0.000095739 "audio_alsa.c:1813" alsa: prepare() -- opened output device
         0.000146128 "audio_alsa.c:1962" alsa: alsa_buffer_monitor_thread_code() -- closing the output device
         0.000238107 "rtsp.c:2409" Connection 1: RTSP Response:
         0.000094887 "rtsp.c:463"   Type: "CSeq", content: "5"
         0.000077665 "rtsp.c:463"   Type: "Server", content: "AirTunes/105.1"
         0.000067462 "audio_alsa.c:1751" alsa: do_close() -- closing alsa handle
         0.000009537 "rtsp.c:463"   Type: "Audio-Latency", content: "11025"
         0.000324273 "audio_alsa.c:1965" alsa: alsa_buffer_monitor_thread_code() -- alsa_backend_state => abm_disconnected
done start
         0.004796587 "rtsp.c:2366" Connection 1: Received an RTSP Packet of type "SET_PARAMETER":
         0.000114665 "rtsp.c:463"   Type: "Content-Length", content: "20"
         0.000076665 "rtsp.c:463"   Type: "Content-Type", content: "text/parameters"
         0.000075184 "rtsp.c:463"   Type: "CSeq", content: "6"
         0.000074555 "rtsp.c:463"   Type: "DACP-ID", content: "E7E934A1E384E8C3"
         0.000073017 "rtsp.c:463"   Type: "Active-Remote", content: "1094366294"
         0.000073980 "rtsp.c:463"   Type: "User-Agent", content: "AirPlay/415.3"
         0.000989799 "loudness.c:47" Volume: -40.5 dB - Loudness gain @10Hz: 10.2 dB
         0.000119257 "player.c:2958" player_volume_without_notification: volume mode is 2, airplay volume is -16.250000, software_attenuation: -4050.000000, hardware_attenuation: -135.000000, muting is disabled.
         0.000097054 "rtsp.c:2409" Connection 1: RTSP Response:
         0.000076869 "rtsp.c:463"   Type: "CSeq", content: "6"
         0.000074591 "rtsp.c:463"   Type: "Server", content: "AirTunes/105.1"
         0.005328727 "rtsp.c:2366" Connection 1: Received an RTSP Packet of type "SET_PARAMETER":
         0.000119887 "rtsp.c:463"   Type: "Content-Length", content: "20"
         0.000078573 "rtsp.c:463"   Type: "Content-Type", content: "text/parameters"
         0.000076388 "rtsp.c:463"   Type: "CSeq", content: "7"
         0.000031814 "rtsp.c:463"   Type: "DACP-ID", content: "E7E934A1E384E8C3"
         0.000083036 "rtsp.c:463"   Type: "Active-Remote", content: "1094366294"
         0.000043499 "player.c:1951" Set initial volume to -16.250000.
         0.000082221 "rtsp.c:463"   Type: "User-Agent", content: "AirPlay/415.3"
         0.000742895 "loudness.c:47" Volume: -40.5 dB - Loudness gain @10Hz: 10.2 dB
         0.000064184 "player.c:2958" player_volume_without_notification: volume mode is 2, airplay volume is -16.250000, software_attenuation: -4050.000000, hardware_attenuation: -135.000000, muting is disabled.
         0.000059129 "player.c:1954" Play begin
         0.002128613 "loudness.c:47" Volume: -40.5 dB - Loudness gain @10Hz: 10.2 dB
         0.000320328 "player.c:2958" player_volume_without_notification: volume mode is 2, airplay volume is -16.250000, software_attenuation: -4050.000000, hardware_attenuation: -135.000000, muting is disabled.
         0.000294236 "rtsp.c:2409" Connection 1: RTSP Response:
         0.000740765 "rtsp.c:463"   Type: "CSeq", content: "7"
         0.000084480 "rtsp.c:463"   Type: "Server", content: "AirTunes/105.1"
         0.030915916 "rtsp.c:2366" Connection 1: Received an RTSP Packet of type "FLUSH":
         0.000050999 "rtsp.c:463"   Type: "RTP-Info", content: "seq=24587;rtptime=559564246"
         0.000031777 "rtsp.c:463"   Type: "CSeq", content: "8"
         0.000029704 "rtsp.c:463"   Type: "DACP-ID", content: "E7E934A1E384E8C3"
         0.000029944 "rtsp.c:463"   Type: "Active-Remote", content: "1094366294"
         0.000029425 "rtsp.c:463"   Type: "User-Agent", content: "AirPlay/415.3"
         0.000048962 "rtsp.c:2409" Connection 1: RTSP Response:
         0.000030500 "rtsp.c:463"   Type: "CSeq", content: "8"
         0.000029203 "rtsp.c:463"   Type: "Server", content: "AirTunes/105.1"
         0.028620194 "player.c:264" Hammerton Decoder used on encrypted audio.
         0.002163316 "audio_alsa.c:721" alsa: precision timing selected for "auto" mode
         0.000038111 "audio_alsa.c:1777" alsa: play() -- opened output device
         0.000013352 "audio_alsa.c:1782" alsa: play() -- alsa_backend_state => abm_playing
        19.863082308 "rtp.c:142" Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7945.6,    15269.9,   104444.0.
        19.980214558 "rtp.c:142" Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7991.5,    19035.2,    93188.0.
        19.949298823 "rtp.c:142" Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7979.1,    18998.7,    93274.0.
mikebrady commented 4 years ago

Could you say what steps you took to create your shairport user please?

mikebrady commented 4 years ago

BTW, I think you are correct – waitpid() is not called for non-blocking scripts.

patrickjane commented 4 years ago

This is how I've created the user:

sudo useradd -rm shairport -G dialout,gpio,i2c,audio,plugdev

Groups might be a bit over the top, but well, you never know.

BTW: What do you mean with "non-blocking scripts"? I think normally you would run a child process, then install a handler for SIGCHLD, and when it fires, do the waitpid() to clean up.

mikebrady commented 4 years ago

Thanks for the suggestions. A "blocking" script is one where the player thread waits for the script to complete before continuing. It'll take a little while to come up with fix and I'll report back here.

patrickjane commented 4 years ago

Ah okay, great. Did not know there is other hooks in shairport-sync which would block until executed.

mikebrady commented 4 years ago

I've pushed out an update in the development branch with a SIGCHLD handler along the lines you suggest. It should be great if you could try it out and let us know if it fixes the problem. I still don't understand why it happens the way it does (and doesn't happen on other occasions).

mikebrady commented 4 years ago

[Update] Actually I am worried about a race condition between the SIGCHLD handler and a waitpid() call. If the waitpid() call is made and then the SIGCHLD handler is called and reaps the process before waitpid() exits, will waitpid() ever exit?

mikebrady commented 4 years ago

[Further update] It seems as if waitpid() does exit under these circumstances, (whew!), but returns an ECHILD error if the SIGCHLD handler has already reaped the process, which was the case in all the cases I checked. Haven't tried the *BSDs yet though...

patrickjane commented 4 years ago

Okay wait, so I am not sure what handler you implemented, and I don't know the shairport-sync code well. But in general, if you fork a child, you must call waitpid, otherwise it will become a zombie once it terminates. Now since waitpid is blocking, you probably don't want your main process to be sitting there waiting for a child (that is, unless you actually want it to wait until the child finishes). So if your main process should go on with its work, then you would need a handler for SIGCHLD, because this is the signal you will get when the child terminates. So then, you know when to call waitpid.

In both situations ( fork -> waitpid and fork -> SIGCHLD -> waitpid) you will need waitpid or you will get defuncts. So I cannot see how there can be a race condition, because you should install the handler before the fork happens.

The only other option is to set SIGCHLD to SIG_IGN, in this case no waitpid is required. However this is less portable.

Also note that if you are using multithreading (and I saw some pthread methods in the code), traiditional unix signals will be delivered to a random thread. This is pretty much of a pain and breaks traditional signal handling. So usually you get around this by creating a dedicated thread which installs a handler for the signals using pthread-functions (not the traditional signal() etc functions).

mikebrady commented 4 years ago

Thanks. I've actually implemented a system based on this:

http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html

using the standard method (not the SIG_IGN method).

Can you point at a discussion of the multithreading issue please?

patrickjane commented 4 years ago

Okay I see. I'm afraid I don't have any documentation at hand beyond what I would find when googling it myself. Normally you would start a simple/small thread which installs signal handlers using pthread_* functions, and then add your logic, in this case the waitpid for the corresponding terminated child.

mikebrady commented 4 years ago

Thanks. Looking at this pretty lucid explanation, it seems that the signal handler, when called, will run in the context of some arbitrary thread. In the case of the SIGCHLD signal handler for Shairport Sync, we are simple executing waitpid()s for any terminated processes, and this will have no direct effect on the threads. Hence, if I understand correctly, we don't care which thread's context the handler runs in.

patrickjane commented 4 years ago

Yeah you're right, maybe in this case (only waitpid) it might in fact be no issue. Will you do any more changes to the dev branch? Otherwise I can test it today after work if you want.

mikebrady commented 4 years ago

No more changes for this issue at present. Also nothing on the pipeline for the next week or so.

mikebrady commented 4 years ago

It would be great if you could test it alright!

patrickjane commented 4 years ago

I can confirm it now seems to work. I don't see any more defuncts when running as other user.

mikebrady commented 4 years ago

Super. Many thanks for your help!

github-actions[bot] commented 3 years ago

This issue has been inactive for 60 days so will be closed 7 days from now. To prevent this, please remove the "stale" label or post a comment.