mikebrady / shairport-sync

AirPlay and AirPlay 2 audio player

[Problem]: Not calling audio output handler start() and stop() in buffered mode when you pause and resume the music #1532

Closed andreadaoud closed 1 year ago

andreadaoud commented 2 years ago

What happened?

I'm adding a custom output device (struct audio_output). In my design, this output device sends audio data out through a D-Bus file descriptor and also receives hardware delay information over D-Bus. The intention is to cascade CamillaDSP to perform fairly complex audio processing while still keeping precise delay.

I'm trying to implement this with AirPlay 2 Buffered mode, playing from the Apple Music app on an iPhone. I need to know when the stream starts and stops, and, to avoid problems caused by hardware underruns, I would like to use the start() and stop() callback functions in audio_output to monitor the playing state (or, alternatively, use the D-Bus Active property to get the state). However, currently stop() is only called when you stop playing completely (exiting the Apple Music app) or pause for a long time; a short pause does not trigger stop(), and the subsequent resume does not trigger start(). Likewise, the Active D-Bus property does not change to false when you pause unless you pause for a long time.

This behaviour is not consistent with Realtime mode, which triggers stop() immediately when you pause the audio stream. But it is definitely possible to detect the pause state from the RTSP stream. So could you please explain the design decision behind omitting stop() in Buffered mode? Do you have any suggestions? Thank you very much.

Relevant log output

No response

Operating System?

Linux 5.15.32-rt39 #2 SMP PREEMPT_RT Wed Aug 31 17:36:32 CST 2022 aarch64 GNU/Linux

Version?

Latest development branch

How did you install Shairport Sync?

Built from source

Check previous issues

mikebrady commented 2 years ago

Thanks for the post. This behaviour is an outcome of what is happening on the link from the client (the player, e.g. the Music app) to the AirPlay 2 server (Shairport Sync). With a buffered stream, when you stop or pause the audio, the link will be kept open for about 6 (?) minutes before it is closed. Let me think a little about how to incorporate all this into the start/stop, active/inactive infrastructure.

mikebrady commented 2 years ago

Just looking at this again, the two active callbacks should work in this case -- the code is already in place. The callback hooks are documented in the configuration file:

//  run_this_before_entering_active_state = "/full/path/to/application and args"; // make sure the application has executable permission. If it's a script, include the shebang (#!/bin/...) on the first line
//  run_this_after_exiting_active_state = "/full/path/to/application and args"; // make sure the application has executable permission. If it's a script, include the shebang (#!/bin/...) on the first line

For example, I have this:

        run_this_before_entering_active_state = "/usr/bin/logger \"Active Start\""; // make sure the application has executable permission. If it's a script, include the shebang (#!/bin/...) on the first line
        run_this_after_exiting_active_state = "/usr/bin/logger \"Active End\""; // make sure the application has executable permission. If it's a script, include the shebang (#!/bin/...) on the first line

and I get this in the logs (I have verbosity of 1 set too):

Sep 05 19:57:50 RaspberryPi3B shairport-sync[28725]:       6604.146731595 "rtsp.c:4174" Connection 1: Current volume (-24.000000) requested
Sep 05 19:58:02 RaspberryPi3B shairport-sync[28725]:       6615.841303362 "rtsp.c:3154" Connection 1. AP2 Buffered Audio Stream.
Sep 05 19:58:02 RaspberryPi3B shairport-sync[28725]:       6615.846863101 "activity_monitor.c:63" abeg
Sep 05 19:58:02 RaspberryPi3B shairport-sync[28885]: Active Start
Sep 05 19:58:03 RaspberryPi3B shairport-sync[28725]:       6616.766847059 "player.c:2652" Connection 1: Playback Started -- AirPlay 2 Buffered.
Sep 05 19:58:11 RaspberryPi3B shairport-sync[28725]:       6624.697293619 "player.c:2489" Sync Error ms | Net Sync PPM | All Sync PPM | Min DAC Queue | Min Buffer Size | Output FPS (r) | Output FPS (c)
Sep 05 19:58:11 RaspberryPi3B shairport-sync[28725]:       6624.697451067 "player.c:2489"         -0.39            0.0            0.0            6112              124k              N/A              N/A
Sep 05 19:58:24 RaspberryPi3B shairport-sync[28725]:       6638.242516114 "activity_monitor.c:87" aend
Sep 05 19:58:24 RaspberryPi3B shairport-sync[28890]: Active End
Sep 05 20:03:14 RaspberryPi3B shairport-sync[28725]:       6928.241536732 "player.c:1650" Connection 1: Playback Stopped. Total playing time 00:05:11.

The length of time before the active state is considered to have ended is set with the active_state_timeout setting.

//  active_state_timeout = 10.0; // wait for this number of seconds after play ends before leaving the active state, unless another play session begins.

Also, there is a sample application written in C that listens to D-Bus properties from Shairport Sync -- add the flag --with-dbus-test-client to the ./configure... stuff and the application will be built along with Shairport Sync.

Here is a sample run:

$ ./shairport-sync-dbus-test-client 
Listening on the D-Bus system bus.
Starting test...
Using the RemoteControl interface, play for five seconds, pause for five seconds and then resume play...
Play...
Pause...
Play...
Using the RemoteControl interface, set AirPlay Volume (range -30 to 0) to -30, -20, -10, 0 and -15 for five seconds each...
Set AirPlay Volume (range -30 to 0) to -30
Set AirPlay Volume (range -30 to 0) to -20
Set AirPlay Volume (range -30 to 0) to -10
Set AirPlay Volume (range -30 to 0) to -0
Set AirPlay Volume (range -30 to 0) to -15
Using the AdvancedRemoteControl interface, set Volume to 20%, 100%, 40% and 60% for five seconds each...
Set Volume to 20%
Set Volume to 100%
Set Volume to 40%
Set Volume to 50%
Using the RemoteControl interface, increase volume for five seconds...
Using the RemoteControl interface, decrease volume...
Finished test. Listening for property changes...
 *** Properties Changed:
      ShairportSync.RemoteControl.Metadata -> @a{sv} {}
      ShairportSync.RemoteControl.Client -> 'fe80::8fb:ccd7:e369:3566'
 *** Properties Changed:
      ShairportSync.Active -> true
 *** Properties Changed:
      ShairportSync.RemoteControl.PlayerState -> 'Playing'
 *** Properties Changed:
      ShairportSync.RemoteControl.AirplayVolume -> -24.0
 *** Properties Changed:
      ShairportSync.Volume -> -21.269245147705078
 *** Properties Changed:
      ShairportSync.RemoteControl.AirplayVolume -> -21.269245147705078
 *** Properties Changed:
      ShairportSync.RemoteControl.Available -> true
 *** Properties Changed:
      ShairportSync.RemoteControl.Metadata -> {'mpris:artUrl': <'file:///tmp/shairport-sync/.cache/coverart/cover-2ac91c48298c8c37818f129c762e4442.jpg'>}
 *** Properties Changed:
      ShairportSync.RemoteControl.ProgressString -> '1877561545/1877585665/1890715692'
 *** Properties Changed:
      ShairportSync.RemoteControl.ProgressString -> '1877603966/1877610061/1890758113'
 *** Properties Changed:
      ShairportSync.RemoteControl.Metadata -> {'mpris:artUrl': <'file:///tmp/shairport-sync/.cache/coverart/cover-2ac91c48298c8c37818f129c762e4442.jpg'>, 'mpris:trackid': <objectpath '/org/gnome/ShairportSync/9ADFF61D88E8C17D'>, 'xesam:title': <'The Queen of the Rushes, The Whinny Hills of Leitrim, The Clumsy Lover'>, 'xesam:album': <'Cape To Clare'>, 'xesam:artist': <["Adam Shapiro, Fiddle; David Shapiro, Guitar and Whistle; Mark Farrelly, Uillean Pipes; Kevin Griffin, Banjo; Simon O'Reilly, Electric Guitar; Blarney O'Shamrock, Bodhrán"]>, 'xesam:genre': <['World']>, 'mpris:length': <int64 298280000>}
 *** Properties Changed:
      ShairportSync.Active -> false
 *** Properties Changed:
      ShairportSync.RemoteControl.PlayerState -> 'Stopped'
 *** Properties Changed:
      ShairportSync.RemoteControl.Available -> false

I've just been checking the development version and these things are all working correctly, including when a pause or stop occurs in a buffered audio stream.

Let us know what you think -- is this enough for you to be able to proceed?

mikebrady commented 2 years ago

(I should add that much of the remote control stuff doesn't work reliably with AirPlay 2.)

andreadaoud commented 2 years ago

As you've said, remote control is not very reliable. Instead, I found that this code is always executed when playback is paused or resumed. Also, every time playing is resumed, this code is called. The logic around these snippets is quite long and I'm not very familiar with the AirPlay protocol, but I think there might be some reliable way to detect pause/resume?

mikebrady commented 2 years ago

Yeah, it's a pretty complicated situation.

There are three protocols -- "classic" AirPlay, AirPlay 2 Realtime Streams and AirPlay 2 Buffered Streams. Then there are the different clients -- iOS, macOS (Apple Silicon and Intel), AppleTV, HomePod mini, Sonos, etc. all with different apps and different versions. From experience, they all can have significantly different behaviours.

For instance, the first piece of code you reference relates to Buffered Audio. The second piece of code relates to all audio, but may be called more than once when playing starts or is resumed.

Added to that is the desire to ensure that the processing of the metadata or associated scripts -- at events like pause, resume, etc. in Shairport Sync -- does not block the threads processing commands or audio unless requested.

The common things for all streams, of course, are:

  1. The stream of PCM audio.
  2. The timing of specific frames of audio relative to the sender's clock.
  3. The timing of the sender's clock relative to a local clock.

The first of these is readily available, but the others are not exposed. Perhaps a simple API could be devised to provide them to a backend.
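
For illustration only, here is a minimal sketch (not existing Shairport Sync code; the struct and field names are invented) of the kind of per-block information such an API might hand to a backend alongside the PCM:

    // Hypothetical sketch -- not an existing Shairport Sync structure.
    // One way to package the timing items listed above with a block of PCM.
    #include <stdint.h>

    typedef struct {
      uint32_t rtp_timestamp;          // sender's RTP timestamp of the first frame in the block
      uint64_t play_time_local_ns;     // local clock time at which that frame should be heard
      int64_t  sender_clock_offset_ns; // estimated offset of the sender's clock from the local clock
    } timed_pcm_block_info;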

andreadaoud commented 2 years ago

Thank you very much for your explanation! I think currently I just need a signal on PCM stream start and stop, preferably exposed through the audio_output API and the D-Bus Active property. Do you think this can be achieved easily? If so, could you please give me some guidance (or do me the favour of implementing it)?

mikebrady commented 2 years ago

Unfortunately, it seems to me that a reliable start/stop signal has to be deduced from the behaviour of the PCM stream.

Crudely, if the PCM stream is there, the stream is running; otherwise the stream isn't running. Your backend could do this by checking what comes in together with its timing information (as outlined above). If the stream was delivered a little ahead of time, say 100 ms or so, you might have time to pass on the stream, the timing and any supervisory signals to the next part of the processing chain.
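
As a rough illustration (the function names and the 250 ms threshold below are invented for this sketch, not anything in Shairport Sync), a backend could derive a start/stop signal from the presence of the PCM stream like this:

    // Sketch only: deducing a start/stop signal from whether PCM is arriving.
    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    static int64_t last_block_ns = 0;
    static bool stream_running = false;

    static int64_t now_ns(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    // Call from the backend whenever a block of PCM arrives.
    void on_pcm_block(void) {
      last_block_ns = now_ns();
      if (!stream_running) {
        stream_running = true;
        // signal "stream started" to the rest of the processing chain here
      }
    }

    // Call periodically, e.g. every 50 ms, from a watchdog thread.
    void check_stream(void) {
      // if no PCM has arrived for, say, 250 ms, treat the stream as paused/stopped
      if (stream_running && (now_ns() - last_block_ns) > 250000000LL) {
        stream_running = false;
        // signal "stream stopped" here
      }
    }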

From my perspective, it would be possible to provide that expanded timing. You can already get the stream a small fixed amount of time ahead of when it's meant to be played. Generating a stop/start signal inside Shairport Sync would be a fair bit of work, I'm afraid.

Separately, have you looked at the convolution code that is already available in Shairport Sync? It processes the audio and then "gives it back" to Shairport Sync for it to be output through an existing backend. In a similar way, I wonder if you could send audio out to CamillaDSP, get it back and then forward it on to Shairport Sync for output. Anyway, it's just a thought, and it is a bit different from your plan.

andreadaoud commented 2 years ago

I had a think about the architecture today, and indeed the frame time and relative clock drift information is needed. Since you said it should be possible to provide, could you please add it?

mikebrady commented 2 years ago

Sure -- I'll try to add it in the next week or so.

mikebrady commented 2 years ago

Hello again. So, I have made an initial attempt to do this by adding three parameters to the play() function call to a backend. Up to now, the play() function has passed a buffer containing a number of audio frames to be output. Now it also includes:

  1. A flag indicating whether the frames are timed or not,
  2. If timed, the RTP timestamp supplied by the client that corresponds to the first frame to be played,
  3. The local time at which the first frame should be played.

Now, I've put it all in the danger branch, and you'll have to switch over to the danger branch of NQPTP as well.

If you look in the audio_dummy backend, there's a little more information and a tiny piece of sample code.
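
As a non-authoritative sketch of how a backend might use these parameters -- the actual function signature and parameter names are in audio_dummy.c and audio.h, so take the ones below as placeholders -- the lead time could be computed like this:

    // Sketch of a backend play() using the three new parameters.
    // Parameter names, types and the clock used are illustrative only.
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint64_t local_time_ns(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC_RAW, &ts); // assumed clock for this sketch
      return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    // frames_are_timed: 0 for leading silence, 1 for audio with timing attached
    // rtp_timestamp:    the client's RTP timestamp of the first frame in buf
    // play_time_ns:     local time at which the first frame in buf should be played
    static int my_play(void *buf, int frames, int frames_are_timed,
                       uint32_t rtp_timestamp, uint64_t play_time_ns) {
      if (frames_are_timed) {
        int64_t lead_ns = (int64_t)play_time_ns - (int64_t)local_time_ns();
        printf("lead time for frame %" PRIu32 " is %" PRId64 " ns (%.6f s)\n",
               rtp_timestamp, lead_ns, lead_ns / 1e9);
        // pass buf, rtp_timestamp and play_time_ns on to the processing chain here
      } else {
        // untimed frames are leading silence -- forward or drop as appropriate
      }
      (void)buf; (void)frames;
      return 0;
    }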

andreadaoud commented 2 years ago

Thanks for your work! I have a little question regarding "the local time at which the first frame should be played". From my previous understanding of the code, the "when to play" timing was handled by shairport-sync itself, by reading the delay reported by the backend and inserting silence to control timing. Now it seems that I have to control when to play. Is there something I didn't understand correctly?

mikebrady commented 2 years ago

Thanks. Shairport Sync does indeed insert silence to control timing -- that's those untimed frames -- but it also sends audio frames to the backend a little ahead of time. This is set by the audio_backend_latency_offset_in_seconds setting, which defaults to a nominal 0.15 seconds.

If you run the audio_dummy sample, it will print out the exact lead time of each frame. As you can see, you could ignore the local time info, or, if you want very fine control over the exact time the frames should be played, you can use it. Here is some output from a Pi 3:

        94.276758714 "audio_dummy.c:61" leadtime for frame 3096227723 is 132728897 nanoseconds, i.e. 0.132729 seconds.
        94.277797255 "audio_dummy.c:61" leadtime for frame 3096228075 is 139672162 nanoseconds, i.e. 0.139672 seconds.
        94.278366630 "audio_dummy.c:61" leadtime for frame 3096228427 is 147084126 nanoseconds, i.e. 0.147084 seconds.
        94.297346318 "audio_dummy.c:61" leadtime for frame 3096228779 is 136086975 nanoseconds, i.e. 0.136087 seconds.
        94.297943974 "audio_dummy.c:61" leadtime for frame 3096229131 is 143470449 nanoseconds, i.e. 0.143470 seconds.
        94.317932151 "audio_dummy.c:61" leadtime for frame 3096229483 is 131464600 nanoseconds, i.e. 0.131465 seconds.

andreadaoud commented 2 years ago

Previously I just used the pipe backend and started playing as soon as there was enough data in the buffer. What I don't understand is, now that you've added the "local time at which the first frame should be played" information, how can I take advantage of it? Do I only use it to calculate the delay and report it back to shairport-sync (without modifying the current play behaviour), or do I have to use it to control playback behaviour (for example, inserting a delay until the specified time when playing)?

andreadaoud commented 2 years ago

As you can see, you could ignore the local time info, or, if you want very fine control

Assuming I use the ALSA backend of CamillaDSP, it will have full access to the ALSA hardware, just like the alsa backend in shairport-sync. If I want to achieve the same level of synchronisation as the alsa backend in shairport-sync, how should I use this information? Thank you very much for your explanation!

mikebrady commented 2 years ago

Well, if your code takes a short, fixed amount of time to execute, then I suppose you could simply ignore the extra information alright.

What I was thinking was that if you were passing the frames of audio to another system for processing, you could pass the timing information as well, so that this other system could use it to calculate when it should play the material. It could compare it with the local time itself and work out how long to delay playing the first frame.

mikebrady commented 2 years ago

I don't know how much support CamillaDSP has for ALSA, but the ALSA system itself can tell you how many frames remain to be output at any time. You can use this, along with the system time and the first frame's local time (as now given by Shairport Sync), to work out what to do.
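
For instance (a sketch only, assuming an ALSA output; the function and parameter names are invented and error handling is omitted), the ALSA delay and the frame's local play time could be combined like this:

    // Estimate how early or late the next frame written will emerge,
    // relative to the local play time supplied by Shairport Sync.
    // Compile/link with -lasound.
    #include <alsa/asoundlib.h>
    #include <stdint.h>
    #include <time.h>

    // Positive result means the output would be late; negative means early.
    int64_t estimated_sync_error_ns(snd_pcm_t *pcm, unsigned int rate,
                                    uint64_t next_frame_play_time_ns) {
      snd_pcm_sframes_t delay_frames = 0;
      snd_pcm_delay(pcm, &delay_frames); // frames still queued before the next write is heard

      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      int64_t now_ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;

      int64_t queued_ns = (int64_t)delay_frames * 1000000000LL / (int64_t)rate;
      return now_ns + queued_ns - (int64_t)next_frame_play_time_ns;
    }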

andreadaoud commented 2 years ago

Basically, CamillaDSP works like this:

Capture thread ---pass audio---> Processing thread ---pass audio---> Buffer ---> Playback thread (call ALSA API)

When audio data arrives at the capture thread, it is sent to the processing thread and then on to the playback thread. The playback thread starts playing when there are enough samples in the buffer.

The processing thread contains the DSP code, which can be somewhat time-consuming (not long, maybe a few tens of milliseconds), so you can assume the latency is unpredictable until playback has started. Once playback has started, the latency does not change and can be measured reliably.

I don't know how much support CamillaDSP has for ALSA

To add precise timing support, I will modify the CamillaDSP code so that it can obtain delay information (e.g. via snd_pcm_delay).

You can use this, along with the system time and the first frame's local time (as now given by Shairport Sync) to work out what to do.

I'm still not very clear on what I can do with the timing information. Before playing the actual audio, shairport-sync inserts some silence (without timing information). When that silence reaches the playback thread, playback starts and runs continuously; later, when timed frames arrive, it is not possible to pause and wait. The only way I can think of to calibrate the delay is for shairport-sync to adjust the length of the initial silence according to the reported delay value. But when the silence is sent, the timing information is not yet available, so the delay cannot be calculated. Could you please be more specific about how the information could be used?

I know I'm asking too many questions, thank you for your patient explanation!

mikebrady commented 2 years ago

Shairport Sync's behaviour depends on the capabilities of the backend.

If the backend has the ability to report a delay (e.g. audio_alsa.c), it implements a delay() function. This returns the delay in frames and a status. When silence is first sent to the backend, it will occupy the output buffer and will be reported back through the delay() function. Shairport Sync can then do its active synchronisation thing -- interpolating frames to keep the output buffer from growing or contracting excessively. Thus, if you can calculate delay in your processing pipeline, you can feed it back in to Shairport Sync through a delay() function. Then, you would not need any of the extra parameters, and Shairport Sync would do synchronisation for you.

Without a delay() function, Shairport Sync works as if there is no delay. It may send some silence before the start of the audio, but it tries to ensure that the initial packet of audio is delivered at exactly the right time. Without a delay() function, Shairport Sync does not attempt any further synchronisation -- it doesn't have any information to base it on. In that case, it would be up to the backend to process the audio at exactly the right time, and to ensure that it wasn't taking too much or too little time -- hence the need for the local time.
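
To make that concrete (a minimal sketch only -- the real delay() signature is declared in audio.h, and the counters here are hypothetical), a pipeline backend's delay() might look something like this:

    // Minimal sketch of a delay() implementation for a pipeline backend.
    #include <stdint.h>

    // Counters maintained elsewhere by the backend (not shown): frames accepted
    // via play(), and frames already heard at the output, e.g. derived from
    // snd_pcm_delay() at the far end of the pipeline.
    static int64_t frames_received_from_sps = 0;
    static int64_t frames_played_by_hardware = 0;

    static int my_delay(long *the_delay) {
      // report everything still "in flight": in the DSP pipeline plus in the output buffer
      *the_delay = (long)(frames_received_from_sps - frames_played_by_hardware);
      return 0; // 0 for success, as with the existing backends
    }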

I know I'm asking too many questions, thank you for your patient explanation!

Not a problem.

mikebrady commented 2 years ago

Just FYI, I've rolled these changes into the development branch, and I'll probably close the danger branch, at least temporarily.

andreadaoud commented 2 years ago

One more small question: are all untimed frames silence? Can I drop untimed frames?

mikebrady commented 2 years ago

Yep, they are always silence, though it might be dithered, so it will not actually contain just zero values.

github-actions[bot] commented 1 year ago

This issue has been inactive for 45 days so will be closed 7 days from now. To prevent this, please remove the "stale" label or post a comment.