mikebrady / shairport-sync

AirPlay and AirPlay 2 audio player
Other
7.21k stars 572 forks source link

DBus Remotecontrol commands off board debug #958

Closed JulianWAS closed 4 years ago

JulianWAS commented 4 years ago

Hi,

Once in a while - more so on starting an iPad -> shairport-sync connection DBus commands like Pause and Play just don't work or don't get through. After a bit of playing and pausing iPad side things come right and DBus shairport-sync side commands work again. The DBus message responses don't seem to differ between success and failure ( just returns from shairport-sync itself?? ). As I have no idea what happens once shairport-sync picks up the DBus requests ( to pause or play ) I'm after any additional information on what happens between shairport-sync and the iPad if possible..

For example how does shairport-sync actually pass messages over the network to pause/play the iPad? Port numbers so if needed I can use wireshark to ascertain if the messages are even leaving my board or maybe they are being blocked in transit by a hub or a switch? Any stuff like this would be really helpful! Is there a PC program ( perf Ubuntu ) that could act as a message spy between my board and the iPad? This problem occurs intermittently, maybe one out of ten times?

Thanks, appreciate your work.

mikebrady commented 4 years ago

It's a little bit complex, but let me see if I can explain it.

When an AirPlay player (e.g. iTunes) is playing to an AirPlay device (e.g. Shairport Sync), the player will allow that device to send it commands. If the player happens to be iTunes, macOS Music or Forked Daapd, it will provide extra command and status information.

Users can communicate with Shairport Sync over D-Bus, MPRIS or MQTT to get it to exchange commands and data with the player.

You should be able to detect http traffic going from Shairport Sync, talking back to the player's remote control port. The port used was 3689 for iTunes, but is variable for iOS and macOS Music. You can find the port number in the Bonjour advertisement for the player (e.g. iTunes) advertising as a _dacp._tcpservice.

JulianWAS commented 4 years ago

Interesting.. I'm suspicious about coming back and playing + DBus pausing another song after a long idle time - more so the iOS side. Could it be possible the audio connection with shairport-sync remains the same ( hence can still play music ) but iOS has changed or abandoned the initial remote control port number so my requests have no destination?

mikebrady commented 4 years ago

In my experience, if there is no audio passing from the iOS device to the AirPlay speaker, iOS will itself drop the connection after a pretty brief period – of the order of tens of minutes. Because of that, Shairport Sync itself will drop the connection after a period (can't remember how long, ATM – I'll have to check).

JulianWAS commented 4 years ago

OK, I'm trying to figure out a repeatable use case where I can break the remote control side as to narrow the problem down. It might be worth keeping track of remote control port numbers to see if they change at any time, worth a go..

JulianWAS commented 4 years ago

Something a little fishy here.. I've been connected to Bluetooth for a few hours playing music and then turning off Bluetooth I connected the iPad's YouTube to shairport-sync the result being audio but no remote control functionality. After a few attempts to pause remotely I disconnected Youtube from shairport-sync then immediately reconnected after which the remote control returned. On the new connection all appears fine from that point on.

JulianWAS commented 4 years ago

Hi Mike,

II managed to catch some debug on this issue of randomly not being able to use remote control commands such as play and pause. Attached is a debug dump of it. It says the DACP port is not specified on the out going remote control command ( red text ); however if a pause is pressed on the iPad ( blue text ) or a play ( green text ) the debug shows DACP connection information. Remote control debug.pdf . This weird-ism appears to be from the time shairport-sync connects to a client - if it fails. If the remote-control works once for a specific session, it appears be solid until the session ends.

In both cases above, shairport-sync is not restarted but just connected/disconnected to using an iPad a number of times.

Thanks.

mikebrady commented 4 years ago

That’s very useful, thanks. It is necessary to know the port number to send commands and evidently Shairport Sync (SPS) didn’t see a message with that information. What is interesting is that SPS doesn’t actively look for it; instead it detects it when it is announced by the player. If it misses the announcement for some reason (it must be quite a rare occurrence), it does not go looking for it. I’ll have a look to see whether it is possible and if so, how to ask for it.

JulianWAS commented 4 years ago

Is there enough information about the remote server when the iPad initiates a play or command from that side? Looking at the debug I sent through before ( on play / pause ) is there enough passed info to cobble a connection if this problem occurs?

Attached is a DBus script that thrashes volume up and down back to the iPad. If I leave this running on the board that's running SPS then repeatably connected, play music, disconnect the iPad to SPS I can reproduce this problem easier. It's a bit mean but it may lead to something.

thrasher.sh.zip

Thanks.

mikebrady commented 4 years ago

Thanks. I’ll have a look at it when I’m near a computer — next week, I’d say.

JulianWAS commented 4 years ago

OK, one last debug. This one is complete from connecting and playing to the point of seeing no DACP port issue. The session still played music.

Debug full start no DACP.txt ..

JulianWAS commented 4 years ago

Hi Mike, Any update on what's going on yet?

Thanks.

mikebrady commented 4 years ago

Thanks for the post. I've been trying a few things to deal with it, but nothing yet, I'm afraid.

mikebrady commented 4 years ago

Still working on this – I need to simplify some of the code in there as I go...

mikebrady commented 4 years ago

Just pushed an update into the development branch which will check every two seconds for a port number if one hasn't been provided. It would be interesting to see whether it addresses the problems you are seeing.

JulianWAS commented 4 years ago

Thanks, I'll check it out.

JulianWAS commented 4 years ago

Hi Mike, I feel it's a "little" better but the problem is still definitely there. My SPS platform is running on wired Ethernet to rule out any WiFi causes. The audio side never fails. Attached is another debug from your development branch - yesterday. The test procedure this time is as follows:

Start SPS ( in debug mode - doesn't encourage problem ) Connect to SPS using an iPad and YouTube. Press play in YouTube.
Wait for audio to start. A few seconds later ( or whenever really ) send a DBus pause to SPS via a local Linux Terminal.

When it's OK, the remote control is solid for that whole connection period. When it fails to remote pause/play/etc likewise it will remain like that until the iPad disconnects and reconnects. It's hard to catch and for me it's roughly one out of tens times. There's nothing I've noticed that specifically encourages it to fail - Linux, network, ALSA, boot wise....

Thanks for all your work and persistence, do appreciate it.

Debug full start on dev build - 202003260959.txt

mikebrady commented 4 years ago

By heaven, I think I see it. The fault is in recognising the DACP ID supplied by the player. On line 275 of the log, the DACP ID is recognised as "155931CE2FAF19A". However, see line 392, what's coming in from mdns is ""0155931CE2FAF19A". So, I need to try to figure out which is "right". Sigh, it would be nice to have a spec.

mikebrady commented 4 years ago

I've just pushed a development update which, I believe, removes those leading zeroes. Let me know if it works, please.

JulianWAS commented 4 years ago

Yea you may have! Haven't tried it but here's an observation before I do. I noticed each time SPS is connected to for a session the DACP id changes in my debugs. So I ran it up and on a good session the id was 16 characters long. Comparing my failure logs the ids were only 15 characters long! So working with your missing 0 theory this would explain the randomness as every once in a while the random id generated by the remote server probably started with a 0.

I'll try it out tomorrow. Cheers.

JulianWAS commented 4 years ago

Hi Mike

I think there is/was two bugs.. I feel the leading 0 was an issue and resulted in the "No DACP port.." But it looks like there's another unfortunately. I thought after 110 perfect sessions and pauses you had nailed it, I also saw ~15% leading 0 messages float past which used to cause the initial issue.

Once happy with the initial results I received a Skype call - SPS was disconnected at the time from my iPad - after which I resumed testing, that's when I got my first failure. After that I included a Skype test call in my tests and the failure rate greatly increased, however it's not the "No DACP port.." problem according to the debug.

So my new test procedure on the iPad is: 1) Start YouTube => connect to SPS => play song => DBus local pause on SPS platform = disconnect from SPS 2) Bring up Skype => do 2 second test call => stop and minimise 3) Go back to YouTube => connect to SPS => play song => DBus again => success or failure

Again, not every time may be 1 or 2 out of 10?

I have include two debugs of failure and a good one for comparison.

Thanks.

Debug dev build - Skype - Youtube - Failed-2- 202003270937.txt Debug dev build - Skype - Youtube - good - 202003270931.txt Debug dev build - Skype - Youtube - Failed - 202003270927.txt

mikebrady commented 4 years ago

Thanks. Some progress then. I'll look at it later this evening.

mikebrady commented 4 years ago

Just looking through the logs now, and I'm a bit confused. What is happening during failure at step 3? How is it different from "success"?

mikebrady commented 4 years ago

BTW, the line:

        0.000067624 "dacp.c:1252" Unexpected return code 400 from dacp_get_client_volume.

isn't quite right. If dacp_get_client_volume returns a 200, it means that the player responds to the call. This only works for iTunes. If it returns a 400, it indicates that the player doesn't respond to this call; players in this category include iOS and the System Sound facility of macOS. So returning 400 isn't actually a bug.

mikebrady commented 4 years ago

I'm afraid I'm not getting it to misbehave...

JulianWAS commented 4 years ago

Hi, I didn't see any difference between the good and bad logs either, however I thought there would be mileage in sending both over in case you found something. What I don't understand is how it's perfect if I connect / disconnect every time while staying in YouTube, but toggling between YouTube and something else causes an issue.. What all logs show is a valid connection to the remote control server, I can't see this really being a SPS issue at this point. I will push on trying to get a repeatable failure case.

mikebrady commented 4 years ago

Yes indeed it is most perplexing. I’ve been likewise trying to get it to misbehave over the weekend and am not having much joy. When it happens, which is very seldom, it’s such a complex sequence that it’s impossible for me to see a pattern to it. If you found a good test case, it would be great.

JulianWAS commented 4 years ago

Do you think there would be mileage in tweaking the debug in a one off development build to dump as much DACP info out as possible and also severely limiting the rest to something like "playing, stopping, tearing down.." to know where you are in the debug?

JulianWAS commented 4 years ago

Attached is a mammoth debug where I didn't restart SPS until it broke. I was thinking about things like perhaps the DACP id was reused or something, or perhaps there was two DACP connections... I noticed something about the sequence of terminating connections, they go out of sequence, it may of course be perfectly normal asynchronous behaviour - red herring. Could it be possible that iOS is serving an old DACP id?

"Connection x: Terminate RTSP connection. x being: 1,2,3,4,5,6,8,7,9,10


Also I noticed two connection requests one immediately after the other: Connections 7 and 8, both off the same iPad.

     0.000146124 "rtsp.c:359" RTSP connection thread 6 deleted...
     0.000108625 "rtsp.c:2708" Connection 7: new connection from [fdaa:bbcc:ddee:0:14af:5a13:9fe8:1c50]:49222 to self at [fdaa:bbcc:ddee:0:212:deff:feed:beef]:5000.
     0.000100125 "rtsp.c:2730" Successfully created RTSP receiver thread 7.
     0.000938747 "rtsp.c:2708" Connection 8: new connection from [2a00:23c4:ba8d:dd00:f5be:7b3d:2790:877d]:49223 to self at [2a00:23c4:ba8d:dd00:212:deff:feed:beef]:5000.

The next could mean nothing at all but connections 9 and 10 have the same DACP ids, active-remote numbers the rest appear to be random.


The remote control broke somewhere in between 9 or 10. I killed SPS after it didn't hear the pause command

Thanks. SPS-biglog-202003301111.pdf

JulianWAS commented 4 years ago

With this packet Type: "Public", content: "ANNOUNCE, SETUP, RECORD, PAUSE, FLUSH, TEARDOWN, OPTIONS, GET_PARAMETER, SET_PARAMETER"

Is it possible to ask the server what it's current state is i.e paused or playing?

mikebrady commented 4 years ago

With this packet Type: "Public", content: "ANNOUNCE, SETUP, RECORD, PAUSE, FLUSH, TEARDOWN, OPTIONS, GET_PARAMETER, SET_PARAMETER"

Is it possible to ask the server what it's current state is i.e paused or playing?

Not to my knowledge. AFAIK, this packet merely tells the player what message types Shairport Sync can respond to.

mikebrady commented 4 years ago

Attached is a mammoth debug where I didn't restart SPS until it broke. ...

Thanks. I'll have a look and see if I can see anything.

Do you think there would be mileage in tweaking the debug in a one off development build to dump as much DACP info out as possible and also severely limiting the rest to something like "playing, stopping, tearing down.." to know where you are in the debug?

We might have to do something like that alright.

JulianWAS commented 4 years ago

Originally the initial problem with the leading 0 never recovered if it failed. Now however if enough DBus pauses are sent it will recover sometimes. I've noticed in a fault condition it will sometimes return remote control functionality the moment the song changes even though nothing Airplay wise should differ..

mikebrady commented 4 years ago

That’s very interesting, thanks.

JulianWAS commented 4 years ago

Hi Mike,

What Apple device are you using and are you using iTunes? I've got my hands on a Mac Mini and I'll try that and feedback any anomalies.

Cheers.

mikebrady commented 4 years ago

Hi Julian. I'm using an iPhone 10 XS Max, an iPad Pro (2017), an iMac (late 2017) and a 13" MacBook Pro (late 2013). All these are up-to-date and I use the Music app on the Mac and the Music app, TuneIn Radio, Radio Svizzera Classica, Safari, YouTube, RTÉ Radio, RTÉ News Now, RaiPlay Radio and many more on the iOS devices. On the Mac, iTunes has been superseded by the Music app. Unfortunately, the Mac Music app's ability to respond to DACP remote control messages varies with different versions. At present, it doesn't work with Shairport Sync.

JulianWAS commented 4 years ago

Wow that's a list! So far the device I had most of the issues with is the iPad mini 4, I'll see if I can get a pro to test on.. Have you actually reproduced this issue I'm now experiencing ( post leading '0' fix) ?

mikebrady commented 4 years ago

I'm afraid I haven't been able to reproduce the problem since then, no, not on any device...

JulianWAS commented 4 years ago

OK, thought you did, no worries. Have you done a master branch release since finding the leading '0' bug? If so I'll use that and spend a day on it and if all OK can close this thread.

Cheers.

mikebrady commented 4 years ago

No master release yet, Julian. I'm waiting a little while to see if anything is broken, particularly by the changes referred to in 3.3.7d2. That is really pretty major. So far, so good.

JulianWAS commented 4 years ago

Hi Mike, I've been testing with dev 3.3.7d4 and audio and sync wise it's good on the ARM as a bit of feedback. The remote issue is less frequent having found the '0' bug. Just to confirm there is still an oddity that especially on first connection after starting SPS from cold fails to remote pause. What's interesting is I fired more pauses as SPS and it continually failed until the song ended and started the new stream which was actually a YouTube add between songs, at that point the pause worked first time as I've noticed before..

Could this actually be an iOS bug? What would be renewed/refreshed between songs? Not knowing the precise internals of SPS, when one song finishes are there any tear-downs before the next starts or is everything left status-quo?

I'll keep on it.

Cheers.

JulianWAS commented 4 years ago

Thought.. I know the moment dacp.c receives my DBus pause, at that point in player.c I also know that after a ~2 second buffer drain I should be get silent frames if successful ( in player_thread_func() ). What's the possibility of re-spinning the dacp server if the frames don't go silent after 2 seconds based on a dead pause? After it's respun I could resend the pause.

Failing that, at least for investigative purposes, could you give me a patch that could respin the dacp server in the presence of an external file i.e. a touch file in /run called respin-sps.dat then in the dacp server code use access() to poll if the file exists - if so respin the server?

Cheers.

mikebrady commented 4 years ago

Thanks for the posts. I'll try to check the failure of remote pause on a new stream.

In the matter of tear-downs between tracks – that is actually under the control of the client (YouTube in this case). For instance, if you play a playlist on the Music app, there will be no teardown between tracks, but if you skip from one track to another, there will be a teardown as you go from one track to the next one. Different versions of the YouTube app have done different things, but most recently AFAIR when a clips ends a teardown is issued and a new play session starts up right after.

TBH, I am reluctant to try to rearrange Shairport Sync in order to try to possibly work around a fault in the client, which could easily be fixed in the next release. The better approach is surely to identify faults in Shairport Sync by rigorous testing (as you've done, thanks!) and fix them.

JulianWAS commented 4 years ago

Hi Mike,

Before I close this ticket, do you know of any tool or way to send a message to Airplay's remote control server? I seem to remember someone putting together html or xml strings and firing them off in python or in a browser and getting a basic response. I'm still getting a lot of non-communication between SPS and my iPad's remote control server. As a renderer SPS is great and I guess most users probably don't use the remote control side of SPS so if I had a way to talk to the remote control server from say a desktop I can start to narrow the problem down to my board, network, build etc.

On a totally different note, do you have any feedback on SPS's performance over a mesh network ( and remote control over mesh) ?

Thanks.

mikebrady commented 4 years ago

Hi there. I think there was some stuff in a Google SoC project a few years back. I don't really know whether it is any good or not, I'm afraid.

No particular feedback on mesh network performance, either positive or negative. Maybe some other users can contribute about their experiences.

JulianWAS commented 4 years ago

Any idea what the project's called?

mikebrady commented 4 years ago

Not too sure, but it might have been this one.

JulianWAS commented 4 years ago

Thanks. I might lift the mdns, rtsp and dacp files etc from SPS and make an info dumper and run it on a desktop.

On stopping SPS does it communicate a termination request with the connected iOS device? Observation seems to suggest it doesn't as the service remains well after SPS has been stopped on the iOS device. I know iOS caches discovered Airplay devices but I would of thought a disconnection would remove it .

Cheers.

mikebrady commented 4 years ago

Stopping a session is under the control of the player. It issues a TEARDOWN and SPS sends a response, but it is the player that finally drops the connection.

JulianWAS commented 4 years ago

Thanks.