masmu / pulseaudio-dlna

A lightweight streaming server which brings DLNA / UPNP and Chromecast support to PulseAudio and Linux
GNU General Public License v3.0
1.25k stars 162 forks source link

Onkyo TX-8050: Device failed to play #133

Closed jlehtnie closed 8 years ago

jlehtnie commented 8 years ago

Streaming from pulseaudio-dlna to TX-8050 fails. Play fails with error code 716: Resource not found. Debug log attached.

Onkyo-tx-8050.txt

masmu commented 8 years ago

Hmm, I looked into your debug.log and there is one odd thing. Your device seems to start streaming just after registering the url which to play. Normally it should wait until it gets instructed to start playing. This is a 2 step process. Set the url and instruct it to play what was set. But yours seem to start right after setting. Could you give me a second debug.log so that I can compare it with the first one?

jlehtnie commented 8 years ago

Yes. I can confirm that this happens everytime: GET request to the stream URL is always received from the device before pulseaudio-dlna sends the PLAY command. Another debug log attached.

Onkyo-tx-8050.2.txt

masmu commented 8 years ago

Well... there are several things we can try...

class CoinedUpnpMediaRenderer(
        pulseaudio_dlna.plugins.renderer.CoinedBaseRendererMixin, UpnpMediaRenderer):

    def play(self, url=None, codec=None, artist=None, title=None, thumb=None):
        try:
            stream_url = url or self.get_stream_url()
            return_code = UpnpMediaRenderer.register(
                self, stream_url, codec,
                artist=artist, title=title, thumb=thumb)
            return return_code
        except requests.exceptions.ConnectionError:
            logger.error('The device refused the connection!')
            return 404
        except pulseaudio_dlna.plugins.renderer.NoSuitableEncoderFoundException:
            logger.error('Could not find a suitable encoder!')
            return 500

(The original code shows the 2 step process, for testing purposes we leave the play action out.)

jlehtnie commented 8 years ago

Thank you very much for your help! I was trying the things you suggested. I did not get it working yet, but I have got much more information.

First of all using --disable-switchback flag did not seem to make any difference. But when I disabled the play action in the source code as you suggested, I did not receive http get for the stream at all anymore. Thus it seems that sending the register url does not trigger the streaming after all.

Then I was adding some more detailed logging to UpnpMediaRenderer.play() and took Wireshark capture[1] of the communication. I learned that the device does not send any kind of reply to the play request! Or should I say that it replies immediately with get stream request. On the pulseaudio-dlna the play-request is pending for response but meanwhile the device is waiting for the stream download to start. Thus they are waiting for each other until after 5 seconds the device sends HTTP 500 (716: resource not found). This is probably some sort of timeout for the stream download.

I also tried setting the play request timeout to one millisecond and ignoring the timeout exception. But it still seems that starting up the stream takes longer than the device is willing to wait. I will continue to debug this. Naturally any ideas how to continue are warmly welcome!

[1] https://jil.kapsi.fi/onkyo-tx8050-capture.pcapng

masmu commented 8 years ago

I am getting the feeling that your device works correct. In your capture you can see that your computer accepts the request and receives the GET header but never sends data back to the device. I studied the code of the streamserver.py again and I recognized that there is an important log missing: Starting processes "{recorder} | {encoder}" (line 305) So, I am pretty sure that is why no data was sent. For some reason the encoder was not launched... So, I would focus on that module. The method which is called on every GET request is:

StreamRequestHandler.do_GET(self) # line 423

Please undo my suggested code changes and find out why line 423 never reaches line 305.

Btw, excellent wireshark capture!

jlehtnie commented 8 years ago

Thanks for the hints again. There must have been some another issue in the previous debug log too because I will get your mentioned Starting processes ... trace now. But I also noticed that there are very suspicious traces like this:

INFO Could not get socket IP and Port. Setting to unknown.

And it seems that in the streamserver.communicate() socket.send_data() fails because the client address has not been set for some reason. I will attach the new debug log.

Onkyo-tx-8050.3.txt

masmu commented 8 years ago

Hmm, something is taking really long. See here:

12-05 17:42:28 pulseaudio_dlna.streamserver                   DEBUG    Got the following GET request
12-05 17:42:33 pulseaudio_dlna.streamserver                   DEBUG    Starting processes "parec --format=s16le -d tx8050_dlna.monitor | lame -b 192 -r -"

We receive the request at 17:42:28 but the encoder is started at 17:42:33. In those 5 seconds there is no data sent throught the socket which i assume makes the device disconnect.

12-05 17:42:33 pulseaudio_dlna.streamserver                   INFO     Client unknown unregistered stream /dWRuPSJ1dWlkOjcyMGZhZTlhLTE3YmQtNDliMi04ODEwLTJlNTg0NmZjODM5YyIsdHlwZT0iYnJpZGdlIg%3D%3D/stream.mp3 using method 2.

(method 2 happens when the device closed the connection from its end) And i guess your hint about the error could be an explanation for this. Maybe the peername detection is taking too long.

Please try to modify the streamserver.py's RemoteDevice class like that:

# line 56
@functools.total_ordering
class RemoteDevice(object):
    def __init__(self, bridge, sock):
        self.bridge = bridge
        self.sock = sock
        try:
            self.ip, self.port = ('foo', 'bar')
        except:
            logger.info('Could not get socket IP and Port. Setting to '
                        'unknown.')
            self.ip = 'unknown'
            self.port = 'unknown'

    def __eq__(self, other):
        if isinstance(other, RemoteDevice):
            return self.ip == other.ip
        raise NotImplementedError

    def __gt__(self, other):
        if isinstance(other, RemoteDevice):
            return self.ip > other.ip
        raise NotImplementedError

    def __str__(self):
        return '<{} socket="{}" ip="{}" port="{}">'.format(
            self.__class__.__name__,
            str(self.sock),
            self.ip,
            self.port,
        )

This will break the application for multiple devices, but good enough to see if that is the real problem.

jlehtnie commented 8 years ago

I tried your suggested peername hardcoding. I'm not sure if it had any effect at all. I still see this roughly 5 sec delay from the GET request to the streaming start and then the socket disconnects.

However, while playing with this I collected the following traces where the behaviour is otherwise little bit different than before. Namely, we can two stream packets in the capture[1] before the socket disconnects. Also, the device now retries sending the GET request 3 times.

[1] https://jil.kapsi.fi/onkyo-tx8050-capture-peername-foobar.pcapng

onkyo-tx8050-debug-log-peername-foobar.txt

masmu commented 8 years ago

Any progress here?

jlehtniemi-broadsoft commented 8 years ago

No, not really. It has been on-hold on my part. I will try to find some time to continue debugging it soonish.

memfail commented 8 years ago

I have the same problem using an Onkyo TX-609. When I try to connect to my Panasonic-TV Pulseaudio works great.

shiznix commented 8 years ago

Me too, same problem using an Onkyo TX-NR616. Audio plays for roughly 5 seconds then immediately disconnects with the connection timeout "failed to play! (408)" in pulseaudio-dlna.

Have to keep turning the Samsung TV on and streaming pulseaudio-dlna through that.

masmu commented 8 years ago

I may have found the issue. For the Samsung devices and for the Onkyo ones. Please try out that branch. You need to install ffmpeg or avconv for this.

shiznix commented 8 years ago

Testing the new ffmpeg branch doesn't fix this Onkyo problem and now breaks ability to stream to Samsung TV.

I have attached debug logs using the new branch for both attempts to play through Onkyo AVR and Samsung TV.

Apologies in advance for the spam in the logs from a 'Technicolor TG 797n' device, I can't turn this device off.

Thanks.

SamsungTV_ffmpeg-backend.log.txt OnkyoTX-NR616_ffmpeg-backend.log.txt

masmu commented 8 years ago

The Onkyo device failed because of a HTTP timeout. This is weird, because there was a successful request to the same URL before.

Have you tried other encoders, especially for the Samsung TV?

shiznix commented 8 years ago

Yes, the lame encoder method as currently used in master branch works fine with the Samsung TV.

So to summarise.... Master Branch (lame):

FFmpeg feature branch:

shiznix commented 8 years ago

Well here's some positive news. All streaming works (both OnkyoAVR and SamsungTV) if I roll back to version 0.4.4 Onkyo streaming breaks as of the 0.4.5 release.

So there seems to be an errant commit between the 0.4.4 and 0.4.5 releases. Break occurs in commit 314996fa6564fa4c58fdce3a0e95418a8fd40a72

masmu commented 8 years ago

Samsung: Whats about other ffmpeg codecs besides mp3 for the Samsung TV?

Just to make sure is really the encoder (lame / ffmpeg) alone and no other change I created a branch where I reverted back to lame. Please check that out.

Onkyo: Good job identifying the breaking commit. Since there are not that many changes, may I ask you to revert the changes step by step to identify the exact problem? It might be the implementation of the

masmu commented 8 years ago

What you could try first is a that new branch. It just uses ffmpeg but code complexity was minimized a lot and I am pretty sure there are no dead locks in it. E.g. all the threading changes stuff was wiped from this.

jlehtnie commented 8 years ago

I tested the ffmpeg branch with Onkyo TX-8050 but no luck.

Wireshark capture: https://jil.kapsi.fi/onkyo-tx8050-ffmpeg-backend.pcapng Debug log: onkyo-tx8050-ffmpeg-backend.txt

I also tested the old version 0.4.4 as suggested @shiznix but it seems to make no difference with my equipment.

masmu commented 8 years ago

Thanks, could you please also test the refactor branch I suggested to test first?

shiznix commented 8 years ago

I've tested both branches, here are the results: \ refactor/separate-encoder-streams ** SamsungTV - fails with unsupported file format message on TV OnkyoAVR - strange behaviour. Onkyo fails with the usual 408, but if I retry immediately after the failure then it succeeds in playing but with some python errrors. Debug log attached was created by pasting terminal output as 'pulseaudio-dlna --debug' output seems to strip stderr so python errors are lost.

OnkyoTX-NR616+refactor_separate-encoder-streams.log.txt Note that at the 'UnboundLocalError: ...' point in that debug.log the streaming is working well for some several minutes, then it eventually falls through to the 408 failure.

\ testing/samsung-tv ** SamsungTV - works OnkyoAVR - fails with same 408 error as before

So it would seem the SamsungTV is happy with the MP3 created by lame, but not with the MP3 created by ffmpeg.

I did try and revert the changes in the breaking commit 314996fa6564fa4c58fdce3a0e95418a8fd40a72 but there were a few lines referenced in the commit that have since changed leading up to current master branch so it couldn't be done cleanly. I just left these lines and continued forward reverting what lines I could but the runtime test result was still failing. The last commit where streaming works for both Onkyo and Samsung is commit e556a5e97eb08fcc9e1830d9c0e35eb9ee1a91be on Sep 04.

shiznix commented 8 years ago

@jlehtnie You might succeed in testing with other lower versions. 0.4.4 worked for me but my error is a '408' timeout whereas yours is a '716' resource not found.

masmu commented 8 years ago

@jlehtnie

Error 716 means that there was something wrong with either the provided stream url or the provided metadata, despite your device tries to play the stream. In your logs there is still a 5 seconds delay between recieving the request and the start of the encoder. In that time the device will not recieve any data and I guess that it therefore complains about not finding a valid ressource. There might be an issue with my implementation of the streamserver, especially because of the Could not get socket IP and Port. Setting to unknown. log. Thats why I created a much more simplified version of it in the refactor/separate-encoder-streams branch.

I googled a bit around and there are plenty of complains that Onkyo recievers fail with that error. The most often suggestions are:

Furthermore if you own an Android phone, could you test if BubbleUPnP (maybe also availible for iPhone) is able to stream to the device? In case it is a dump would be very helpful.

Could you also try other codecs? Especially wav and flac.

masmu commented 8 years ago

@shiznix

I fixed the errors in the branch.

Instructing a DLNA device to start streaming normally consists of 2 steps:

Most of my devices (and due to the reports I've gotten) most other devices require both steps. But the Onkyo devices (Onkyo TX-NR616 and Onkyo TX-8050) launched their requests directly after the registering command. Thats why I inserted a polling request to check the devices state (get_protocol_state) and see if it is necessary to actually send the play command. That works great in my tests. The python exception occured right after the register command, during polling the devices state and therefore prevented the play command from being sent. I guess that was the case where your device started to play. But pulseaudio-dlna's internal state for the device remained idle (not playing) and after receiving a signal about a pulseaudio stream change pulseaudio-dlna tried to instruct the device to play again.

The real problem with your device is, that there are those time out errors. I've seen play and _get_protocolstate requests in your logs which sometimes work and which sometimes fail. This is really odd, since those are always the same requests. I could understand if they would always fail, but there seems to be no other explanation than network issues or device failure to me. I've experience such a behaviour with some devices sitting behind an unreliable wifi extender.

How is the device connected? Ethernet? USB-Wifi-Stick? Is the device running the latest firmware?

I remember a guy with an Onkyo reciever asking for adding the --request-timeout flag (Issue 102) stating out his reciever is kind of slow. And right after the the breaking commit there is the commit 314996f which inserts requests timeouts. I am getting the feeling that yours might be too. So, lets try starting pulseaudio-dlna with the --request-timeout option. Try setting it to 30 or more seconds.

So, what we know:

jlehtnie commented 8 years ago

Thanks for all the new suggestions!

shiznix commented 8 years ago

Brilliant! '--request-timeout' was the magic I needed.

Back on 'master' branch and can confirm that running 'pulseaudio-dlna' still causes 408 connection errors with Onkyo. But running as low as 'pulseaudio-dlna --request-timeout=15' reliably connects with no error.

Onkyo is connected to network via ethernet. Thanks for your patience and a solution @masmu :)

masmu commented 8 years ago

@jlehtnie Sorry for the delay.

I am still focused on the 5 second delay in your logs. Therefore I created a branch testing/onkyo-tx-8050 which logs exactly what when happens.

Could you try it out? Hopefully we find what is taking those 5 seconds...

jlehtnie commented 8 years ago

No problem @masmu, here are some logs using that branch and a couple of different codecs.

onkyo-tx8050-testing-enc-wav.txt onkyo-tx8050-testing-enc-mp3.txt onkyo-tx8050-testing-enc-l16.txt

masmu commented 8 years ago

@jlehtnie I've disabled the line which took 5 seconds and updated the branch. Could you please retry?

jlehtnie commented 8 years ago

Yes! I confirm that it works now, well done @masmu. Thank you very much!

masmu commented 8 years ago

@jlehtnie You are welcome. It was the logging line in

    def log_message(self, format, *args):
        args = [unicode(arg) for arg in args]
        logger.info('Got request from {host} - {args}' .format(
            host=self.address_string(),
            time=self.log_date_time_string(),
            args=','.join(args)))

I guess it is the self.address_string() call. This could be a network configuration issue on your side. Please checkout current master. I removed that log there as well.

jlehtnie commented 8 years ago

Also the master works OK now!

This seems to be a known issue with python socket.gethostbyaddr(). I found a lot of cries about similar problems by googling "python gethostbyaddr timeout".

It does not seem to be a network configuration issue. host, dig, and nslookup manage to resolve the reverse lookup in a snap, e.g.:

$ time host 192.168.10.41
Host 41.10.168.192.in-addr.arpa. not found: 3(NXDOMAIN)
...
real    0m0.015s
...

But

$ time python -c 'import socket; socket.gethostbyaddr("192.168.10.41")'
...
socket.herror: [Errno 1] Unknown host
...
real    0m5.072s
...
masmu commented 8 years ago

Well, there are your 5 seconds :wink: ... pretty exactly. That call was blocking the thread the whole time and the encoder was not started during that. And your device disconnected after a specific period of time not receiving any data.

Could you give it a quick test for the devices codecs? Like to add them to the list of tested devices.

masmu commented 8 years ago

Maybe a DNS problem?

jlehtnie commented 8 years ago

Ok. I have tested the codecs.

codec status
wav, l16 streaming ok with good quality
mp3 streaming but quality is poor, lots of breaks and cuts etc.
flac, ogg, opus, and aac not supported by the device

About the DNS: yes, you can say that it is a DNS problem in a way. Namely that there is no DNS server for this private IP network. But that is a pretty normal case in a home LAN. To me the real problem is that socket.gethostbyaddr() uses such an absurdly long default timeout and does not even allow user to lower it.

But I guess that bug report should be filed to cpython and not pulseaudio-dlna :)

Thank you once more for solving the issue!

masmu commented 8 years ago

Thank you. I am glad that I could help.

Regarding timeouts: I recently increased the default HTTP request timeout to 15 seconds. At first it was about 5 seconds to also be safe in slow networks. But some manufacturers HTTP server responses are implemented blocking, so that the request idles until the requested information is available. And due to this I had to increase the timeout to an absurd value to make it work. Maybe there is a similar story behind that gethostbyaddr() implementation.

In case your device is connected via cable there is no better option than wav and l16. If you prefer other codecs for some reason you can try the --encoder-backend option. Maybe ffmpeg will do a better job than lame.

masmu commented 8 years ago