owntone / owntone-server

Linux/FreeBSD DAAP (iTunes) and MPD audio server with support for AirPlay 1 and 2 speakers (multiroom), Apple Remote (and compatibles), Chromecast, Spotify and internet radio.
https://owntone.github.io/owntone-server
GNU General Public License v2.0
2.1k stars 237 forks source link

Airplay devices keep failing recently #775

Closed LordMyschkin closed 5 years ago

LordMyschkin commented 5 years ago

while investigation the artwort bug from #773, i noticed that my speakers keep failing. To simplify, I only testet with two (Samsung E751 and Boston MCAir200; Seems to be a problem with synch: First, the speakers state a few dozen times they request a retransmit, then they mute. forked-daapd.zip

[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58460 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58461 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58481 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58483 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58485 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58486 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58488 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58489 len 1
[2019-07-14 17:43:28] [DEBUG]     raop: Got retransmit request from 'Samsung W_Audio E750_d1:e1:b5', seq_start 58490 len 1
[2019-07-14 17:43:37] [ WARN]    event: evrtsp_read: read timeout: No such file or directory
[2019-07-14 17:43:37] [  LOG]     raop: No reply from 'Samsung W_Audio E750_d1:e1:b5' to our keep alive request, hanging up
[2019-07-14 17:43:37] [DEBUG]   player: Callback request received, id is 1
[2019-07-14 17:43:37] [DEBUG]   player: Making deferred callback to device_streaming_cb, id was 1
[2019-07-14 17:43:37] [DEBUG]   player: Callback from AirPlay to device_streaming_cb (status -1)
[2019-07-14 17:43:37] [  LOG]   player: The AirPlay device 'Samsung W_Audio E750_d1:e1:b5' FAILED
[2019-07-14 17:43:37] [DEBUG]   player: 0. Active callback: device_streaming_cb
ejurgensen commented 5 years ago

Your Airplay errors confuse me, because they don't seem to be the same, but given the shear number there must be some common cause? To sum up, you have observed these four error types:

[2019-07-13 16:48:16] [ LOG] raop: Send error for 'MC200Air': Connection refused
...
[2019-07-14 15:45:23] [ LOG] raop: Device 'MC200Air' closed RTSP connection
...
[2019-07-14 16:58:51] [ LOG] player: Output delay detected (behind=229, max=150), resetting all outputs
...
Retransmits and then:
[2019-07-14 17:43:37] [ LOG] raop: No reply from 'Samsung W_Audio E750_d1:e1:b5' to our keep alive request, hanging up

So for the first two the device is hanging up (albeit in different ways), for the next one forked-daapd seems to be blocked writing to the devices (unusual - not sure what might cause that), and finally what seems to be a connection issue.

There are no real hints to what might specifically be wrong, and I haven't made any changes to these parts of forked-daapd in a while, so that leaves me even more confused.

To progress with this I think it will be necessary that you identify a single, reproducible way of making one of your devices fail. Then let's dig into that more systematically. One thing to test would then be to go back and test maybe the latest release version of forked-daapd and see if that also fails - i.e. try to identify when a bug may have appeared. Another thing to test would be if the network has an impact (wifi vs wired).

LordMyschkin commented 5 years ago

Wont be easy to reproduce, If I find a way, I get you an update! since i have to run at least two speakers for a while, I have to find a day my girlfriend is out...

floogs commented 5 years ago

Wasn't sure if I should open a separate issue for this, but my Sony STR-DN840 receiver is more or less guaranteed to hang forked-daapd after anywhere from 5 mins to a couple hours of receiving audio. Confirmed the same failure doesn't occur while streaming the same files to an Apple TV, so I assume the receiver is the culprit.

Typical behavior is for playback to stop abruptly in the middle of a song, at which point the forked-daapd process goes to 100% CPU and becomes unresponsive but doesn't crash outright. (Though, it also does crash periodically -- not sure if that's an unrelated issue, but come to think of it I don't believe I've seen a crash since switching to the Apple TV). All devices are on the wired LAN for what it's worth, and the version I'm running was built in FreeBSD 11.2 from source on June 29 (so apologies if you've fixed this already).

I managed to catch the hang a couple of times at warn and debug log levels -- this will go on incessantly unless I kill the process. (In the debug instance I didn't get around to killing it for almost 24 hours, which generated a log file over 700GB (!!). Never seen ZFS hit 15x compression before.)

[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 546 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 547 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 548 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 542 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 543 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 544 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 545 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 546 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 547 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 548 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 542 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 543 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 544 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 545 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 546 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 547 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 548 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 542 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 543 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 544 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 545 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 546 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 547 (len 1), but not in buffer
[2019-06-23 22:20:25] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 548 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 705 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 706 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 707 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 708 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 709 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 710 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 711 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 542 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 543 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 544 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 545 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 546 (len 1), but not in buffer
[2019-06-23 22:20:26] [ WARN]     raop: Device 'STR-DN840 49F20C' asking for seqnum 547 (len 1), but not in buffer
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61259 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61260 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61261 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61262 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61263 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61264 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61265 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61266 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61267 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61268 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61269 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61270 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61271 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61272 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61273 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61274 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61275 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61276 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61277 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61278 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61279 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61280 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61281 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61282 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61283 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61284 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61285 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61286 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61287 not in buffer (have seqnum 64541 to 4)
[2019-05-20 00:19:21] [DEBUG]   player: Seqnum 61288 not in buffer (have seqnum 64541 to 4)
ejurgensen commented 5 years ago

I personally have a Sony DN1040, which is probably the same as yours when it comes to this. I have tested a lot with it, and since I haven’t experienced this it probably means there is more to it than just the device.

The extra factor may be network packet loss, because when you see those log messages it means the device is asking for a packet to be sent again. As you can see, it fails because forked-daapd no longer has the packet in the retransmit buffer.

I think the packet rate is 352 per second, and from your log you can see that the device is asking for packets with numbers that are nowhere near the current. That could be a bug in forked-daapd, but determination of that will require more investigation.

floogs commented 5 years ago

I'm happy to test to help narrow down the cause (we're trying to ditch the Apple TV, so getting the receiver to work reliably with forked-daapd would be welcome).

I'll compile a new version from the current source tonight to make sure the issue hasn't already been addressed in the last month -- what do you recommend testing from there?

floogs commented 5 years ago

I've done a lot of network testing on this -- my receiver typically is connected to an older gigabit switch and cable, so I tried connecting it to wi-fi instead, and then also connected it directly to the NAS that forked-daapd runs on via two brand new Cat 6 cables and a new switch, and and in both cases still observed the same flood of retransmit requests.

In fairness, the new version I compiled last week seems to handle this situation more gracefully than before since playback continues smoothly for a while after the errors start, but it still results in forked-daapd seizing up eventually. In both cases I had a tail of the debug log running in a console window and watched the hang happen more or less in real time.

If there's any other testing I can do on my end to help diagnose, let me know -- I feel confident that it isn't a network issue after all this, though.

ejurgensen commented 5 years ago

Good work on ruling out network. I can’t think of a reason why forked-daapd would hang after retransmission requests, so when I am back home I will have to mock some to see what happens. Can you confirm that what you are seeing is a hang, ie the process runs but is unresponsive?

Other than that you could try adding logging of which sequence numbers are sent, and which are requested. Requires modifying the code, so not sure if you are up to that.

It is odd that you see retransmission requests when there is no packet loss. Only possible reason for that I can think of is that the timing is wrong. A rough test for that would be for you to try running the last release of forked-daapd, so 26.5, to see if the issue is also there. I have changed a lot of the internals since that release, so maybe I have made a mistake?

floogs commented 5 years ago

I'd be glad to give the sequence logging a shot if you're able to walk me through what I'd need to modify -- if it's just adding or changing some lines that shouldn't be a problem. (I definitely don't have the expertise to figure it out myself though.)

I'll reproduce the hang and paste the results in here, but typically, htop reports the process as sitting at 100% CPU indefinitely, and the iOS Remote app can no longer connect to it. Even trying to cleanly shut down the service never completes and requires me to stop and restart the entire jail it's running in. (I have an old Raspberry Pi lying around, if you think it'd be worth building and testing f-d in Raspbian to rule out any FreeBSD/jail-related issues.)

Should be able to compile the 26.5 release and give that a shot too, stay tuned.

ejurgensen commented 5 years ago

Sounds like there might be two issues then: CPU at 100% (infinite loop?) and some issue making the device request retransmission.

On second thought I am not sure logging of sequence number will help, and any case it is difficult for me to give instructions since I am on phone. So right now let us just see how the release version works.

floogs commented 5 years ago

The flood of retransmit requests is what seems to lead to the lockup and CPU maxing out -- to be clear, the process sits at normal utilization (like 0.5% CPU) until it becomes unresponsive via Remote app and command line. A loop does seem likely from what I can tell.

I spun up the stock 26.5 release in a separate jail and have done some testing. The log looks much different as I'm sure you know, and there haven't been any similar endless requests. Sample output below.

This version does have a different and less severe issue than the current master -- I loaded up a long playlist last night before bed and then another one before I went to work today, and in both cases by the time I checked on it ~9 hours later, no audio was actually being played from the speakers. However the Remote app showed the current song timeline progressing normally, I could control the receiver volume from the app as usual, and skipping to the next song in the playlist restored audio. So something is interfering with playback after some amount of time with that version, but it isn't hard locking the entire system like the new one.

[2019-08-08 19:56:40] [ INFO]   player: Opening '/shoebee/music/Jeremy Soule/The Elder Scrolls V  Skyrim/1-10 Under an Ancient Sun.flac' (id=497, item-id=1351)
[2019-08-08 19:56:40] [DEBUG]   player: Action is setup
[2019-08-08 19:56:40] [DEBUG]   ffmpeg: tb:1/44100 samplefmt:s16 samplerate:44100 chlayout:0x3
[2019-08-08 19:56:40] [DEBUG]   player: Action is metadata_get
[2019-08-08 19:56:40] [DEBUG]     main: Got worker execute request
[2019-08-08 19:56:40] [DEBUG]   player: Action is start
[2019-08-08 19:56:40] [DEBUG]       db: Running query 'BEGIN TRANSACTION;'
[2019-08-08 19:56:40] [DEBUG]       db: Starting enum 'SELECT * FROM queue f WHERE id = 1351 ORDER BY pos;'
[2019-08-08 19:56:40] [DEBUG]       db: Running query 'END TRANSACTION;'
[2019-08-08 19:56:40] [DEBUG]       db: Running query 'BEGIN TRANSACTION;'
[2019-08-08 19:56:40] [DEBUG]       db: Starting enum 'SELECT * FROM queue f WHERE id = 1351 ORDER BY pos;'
[2019-08-08 19:56:40] [DEBUG]       db: Running query 'END TRANSACTION;'
[2019-08-08 19:56:40] [DEBUG]  artwork: Artwork request for item 497
[2019-08-08 19:56:40] [DEBUG]       db: Running query 'SELECT COUNT(*) FROM files f WHERE f.disabled = 0 AND id = 497;'
[2019-08-08 19:56:40] [DEBUG]       db: Starting query 'SELECT f.* FROM files f WHERE f.disabled = 0 AND id = 497   ;'
[2019-08-08 19:56:40] [DEBUG]    cache: Running query 'SELECT a.format, a.data FROM artwork a WHERE a.type = 0 AND a.persistentid = 8534338792213455689 AND a.max_w = 600 AND a.max_h = 600;'
[2019-08-08 19:56:40] [DEBUG]    cache: Cache hit: SELECT a.format, a.data FROM artwork a WHERE a.type = 0 AND a.persistentid = 8534338792213455689 AND a.max_w = 600 AND a.max_h = 600;
[2019-08-08 19:56:40] [DEBUG]  artwork: Artwork for group 8534338792213455689 found in source 'cache'
[2019-08-08 19:56:40] [DEBUG]     raop: send_metadata: Sending SET_PARAMETER to 'STR-DN840 49F20C'
[2019-08-08 19:56:40] [DEBUG]     raop: send_artwork: Sending SET_PARAMETER to 'STR-DN840 49F20C'
[2019-08-08 19:56:40] [DEBUG]     raop: send_progress: Sending SET_PARAMETER to 'STR-DN840 49F20C'
[2019-08-08 19:56:40] [DEBUG]      mpd: Asynchronous listener callback called with event type 1.
[2019-08-08 19:56:40] [DEBUG]      mpd: Notify clients waiting for idle results: 1
[2019-08-08 19:56:42] [DEBUG]     main: Got worker execute request
[2019-08-08 19:56:42] [DEBUG]   player: Playback switched to next song
[2019-08-08 19:56:42] [DEBUG]      mpd: Asynchronous listener callback called with event type 1.
[2019-08-08 19:56:42] [DEBUG]      mpd: Notify clients waiting for idle results: 1
[2019-08-08 19:56:47] [DEBUG]       db: Running query 'UPDATE files SET play_count = play_count + 1, time_played = 1565319407, seek = 0 WHERE id = 1785;'
[2019-08-08 19:56:47] [DEBUG]       db: Running query 'INSERT OR REPLACE INTO admin (key, value) VALUES ('db_modified', '1565319407');'
[2019-08-08 19:56:48] [DEBUG]   player: Input buffer has 353204 bytes
[2019-08-08 19:56:54] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'STR-DN840 49F20C'
[2019-08-08 19:56:58] [DEBUG]   player: Input buffer has 358452 bytes
[2019-08-08 19:57:08] [DEBUG]   player: Input buffer has 363700 bytes
[2019-08-08 19:57:18] [DEBUG]   player: Input buffer has 368948 bytes
[2019-08-08 19:57:24] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'STR-DN840 49F20C'
[2019-08-08 19:57:28] [DEBUG]   player: Input buffer has 357812 bytes
[2019-08-08 19:57:38] [DEBUG]   player: Input buffer has 363060 bytes
[2019-08-08 19:57:48] [DEBUG]   player: Input buffer has 351924 bytes
[2019-08-08 19:57:54] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'STR-DN840 49F20C'
[2019-08-08 19:57:58] [DEBUG]   player: Input buffer has 357172 bytes
[2019-08-08 19:58:08] [DEBUG]   player: Input buffer has 362420 bytes
[2019-08-08 19:58:18] [DEBUG]   player: Input buffer has 367668 bytes
[2019-08-08 19:58:24] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'STR-DN840 49F20C'
[2019-08-08 19:58:28] [DEBUG]   player: Input buffer has 356532 bytes
[2019-08-08 19:58:38] [DEBUG]   player: Input buffer has 361780 bytes
[2019-08-08 19:58:48] [DEBUG]   player: Input buffer has 367028 bytes
[2019-08-08 19:58:54] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'STR-DN840 49F20C'
[2019-08-08 19:58:58] [DEBUG]   player: Input buffer has 355892 bytes
[2019-08-08 19:59:08] [DEBUG]   player: Input buffer has 361140 bytes
[2019-08-08 19:59:18] [DEBUG]   player: Input buffer has 366388 bytes
[2019-08-08 19:59:24] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'STR-DN840 49F20C'
[2019-08-08 19:59:28] [DEBUG]   player: Input buffer has 355252 bytes
[2019-08-08 19:59:38] [DEBUG]   player: Input buffer has 360500 bytes
[2019-08-08 19:59:48] [DEBUG]   player: Input buffer has 365748 bytes
[2019-08-08 19:59:54] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'STR-DN840 49F20C'
[2019-08-08 19:59:58] [DEBUG]   player: Input buffer has 354612 bytes
[2019-08-08 20:00:08] [DEBUG]   player: Input buffer has 359860 bytes
[2019-08-08 20:00:18] [DEBUG]   player: Input buffer has 365108 bytes
[2019-08-08 20:00:22] [ WARN]    xcode: Could not read frame: End of file
[2019-08-08 20:00:22] [DEBUG]   player: Playback loop stopped (break is 0, ret 0)
[2019-08-08 20:00:24] [DEBUG]   player: Switching track
[2019-08-08 20:00:24] [DEBUG]   player: Pause called, stopping playback loop
[2019-08-08 20:00:24] [DEBUG]   player: Flushing 0 bytes with flags 0
[2019-08-08 20:00:24] [DEBUG]   player: Action is stop
[2019-08-08 20:00:24] [DEBUG]   ffmpeg: Statistics: 14107671 bytes read, 0 seeks
[2019-08-08 20:00:24] [DEBUG]       db: Fetch by pos: pos (1) relative to item with id (1351)
[2019-08-08 20:00:24] [DEBUG]       db: Running query 'BEGIN TRANSACTION;'
[2019-08-08 20:00:24] [DEBUG]       db: Fetch by pos: pos (1) relative to item with id (1351)
[2019-08-08 20:00:24] [DEBUG]       db: Running query 'SELECT shuffle_pos FROM queue WHERE id = 1351;'
[2019-08-08 20:00:24] [DEBUG]       db: Fetch by pos: item (1351) has absolute pos 199
[2019-08-08 20:00:24] [DEBUG]       db: Starting enum 'SELECT * FROM queue f WHERE shuffle_pos = 200 ORDER BY pos;'
[2019-08-08 20:00:24] [DEBUG]       db: Fetch by pos: fetched item (id=1692, pos=716, file-id=3096)
[2019-08-08 20:00:24] [DEBUG]       db: Running query 'END TRANSACTION;'
[2019-08-08 20:00:24] [DEBUG]       db: Fetch by pos: fetched item (id=1692, pos=716, file-id=3096)
ejurgensen commented 5 years ago

I think I may have found the cause of the CPU issue, but before I try to fix it I need to confirm I am on the right path. So I have added a commit with a bit of extra logging, and I hope you will take that for a spin (with loglevel at debug level). Hopefully you can reproduce the issue again, and then please attach the log file here with at least the part from the last "Got retransmit request from..." and everything thereafter (but please cut out/shorten repetitions).

floogs commented 5 years ago

Thanks for continuing to look into this -- I've managed to reproduce the loop with the modified logging enabled, and attached a shortened log that starts at the beginning of playback of the final song and goes up to the point when the loop begins (with a lot of retransmit requests removed in between).

I also included the full log from the first run of this version just in case it's useful. It's pretty large as it took 8-9 hours of repeating a playlist to trigger the loop this time, although I've also seen it hang up after 5 or 10 minutes of playback.

ejurgensen commented 5 years ago

You are welcome, thanks for helping debugging forked-daapd. The log seems to confirm what I suspected might be a bug that created an infinite loop. I have also found a bug that means that in some circumstances forked-daapd would not retransmit the correct package, which would explain why your device keeps asking for retransmission.

I believe I have fixed both bugs with the commit above, so once again I hope you will rebuild and test. Let me know if it is better - I will be crossing my fingers in the meantime.

PS: You are welcome to delete the log again if you don't want it shared here any more

floogs commented 5 years ago

Well, early results are certainly promising -- I queued up four albums right after building the fixed version, then a couple more later in the evening, then a huge playlist before bed which has run for ~9 hours overnight and counting, all without restarting the service. Playback has been fine the whole time from what I've heard, CPU load still normal, Remote UI is still responsive, etc. Obviously it's hard to say definitively whether we're in the clear or not unless something else goes wrong, but it never would've lasted this long before without issue, so this is very encouraging.

It may be worth noting that there are still near-constant retransmit requests coming from the receiver in the debug log (from eyeballing it, about 4-8 requests every 1-3 seconds). Is that any cause for concern? It doesn't seem to be affecting audio playback from what I can hear.

ejurgensen commented 5 years ago

Sounds good, very happy to hear it is working better.

forked-daapd sends 352 packets each second, so resending a few extra packages each second isn't as such a problem. Audio should also be full quality, that is of course the point of the retransmission.

However, I still think it is worth investigating, because it is unusual. I would at least like to confirm that forked-daapd now sends the correct packages, so it would be great if you could build and test with yet a small logging modification. I have put it in a special branch which is logretransmit1. So you will need to switch to that.

I don't think you need to test for very long, just get like 10 to 20 retransmissions logged, and then paste the lines here (those that say "Got retransmit request..." and say "Retransmitting packet...").

floogs commented 5 years ago

No problem, here you go:

[2019-08-11 14:19:07] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58395 (len 1), last RTP session seqnum 58410 (len 1000)
[2019-08-11 14:19:07] [DEBUG]     raop: Retransmitting packet with seqnum 58395
[2019-08-11 14:19:07] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58396 (len 1), last RTP session seqnum 58410 (len 1000)
[2019-08-11 14:19:07] [DEBUG]     raop: Retransmitting packet with seqnum 58396
[2019-08-11 14:19:07] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58397 (len 1), last RTP session seqnum 58410 (len 1000)
[2019-08-11 14:19:07] [DEBUG]     raop: Retransmitting packet with seqnum 58397
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58716 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58716
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58717 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58717
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58718 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58718
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58719 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58719
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58720 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58720
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58721 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58721
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58722 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58722
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58723 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58723
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58729 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58729
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58730 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58730
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58731 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58731
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58732 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58732
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58733 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58733
[2019-08-11 14:19:10] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58734 (len 1), last RTP session seqnum 58735 (len 1000)
[2019-08-11 14:19:10] [DEBUG]     raop: Retransmitting packet with seqnum 58734
[2019-08-11 14:19:11] [DEBUG]     raop: keep_alive: Sending OPTIONS to 'Sony-STR-DN840'
[2019-08-11 14:19:11] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58879 (len 1), last RTP session seqnum 58898 (len 1000)
[2019-08-11 14:19:11] [DEBUG]     raop: Retransmitting packet with seqnum 58879
[2019-08-11 14:19:11] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58880 (len 1), last RTP session seqnum 58898 (len 1000)
[2019-08-11 14:19:11] [DEBUG]     raop: Retransmitting packet with seqnum 58880
[2019-08-11 14:19:11] [DEBUG]     raop: Got retransmit request from 'Sony-STR-DN840': seqnum 58881 (len 1), last RTP session seqnum 58898 (len 1000)
[2019-08-11 14:19:11] [DEBUG]     raop: Retransmitting packet with seqnum 58881
ejurgensen commented 5 years ago

I would say it looks ok. The packets being sent are correct, and the retransmit requests from the device also seem sensible. The number of retransmit requests is still unusually high, and I can't explain that. But since the errors in forked-daapd appear to be fixed, and playback is working, I will close the issue.