wuub / rxv

Automation Library for Yamaha RX-V473, RX-V573, RX-V673, RX-V773 receivers
Other
111 stars 44 forks source link

Yamaha integration - invalid XML error #87

Open Wiggum127 opened 3 years ago

Wiggum127 commented 3 years ago

I'm not sure you already get this information via Github of HomeAssistant.

So with the risk of sending you something you're already aware of, all details are available here:

https://github.com/home-assistant/core/issues/49387

In any case, hope this helps you in ironing out this bug.

rdgerken commented 3 years ago

I too have this issue. I will be happy to help troubleshoot.

rdgerken commented 3 years ago

Doing some testing tonight... I get this error output when simply switching to the AirPlay input, and then subsequently query which input is active. I thought maybe it seemed like it had something to do with how quickly the commands and queries were being sent to the receiver, but I'm not real sure. I tried putting some sleeps in to see if it would make a difference, but I still get the Invalid XML errors.

>>> import rxv
>>> receivers=rxv.find()
>>> print(receivers)
[<RXV model_name="RX-A3080" zone="Main_Zone" ctrl_url="http://XXX:80/YamahaRemoteControl/ctrl" at 0x7f6f09977c50>]
>>> rx=receivers[0]
>>> rx.on=True
>>> rx.volume
-40.0
>>> rx.inputs()
{'Spotify': 'Spotify', 'MusicCast Link': None, 'USB': 'USB', 'AV4': None, 'AV5': None, 'AV6': None, 'AV7': None, 'AV1': None, 'AV2': None, 'AV3': None, 'TUNER': 'Tuner', 'SERVER': 'SERVER', 'Napster': 'Napster', 'NET RADIO': 'NET_RADIO', 'Alexa': 'Alexa', 'AUDIO4': None, 'AirPlay': 'AirPlay', 'AUDIO2': None, 'AUDIO3': None, 'AUDIO1': None, 'TIDAL': 'TIDAL', 'PHONO': None, 'AUX': None, 'Pandora': 'Pandora', 'Amazon Music': 'Amazon_Music', 'SiriusXM': 'SiriusXM', 'Bluetooth': 'Bluetooth', 'Qobuz': 'Qobuz', 'Deezer': 'Deezer'}
>>> rx.input
'AirPlay'
>>> rx.input="AV1"
>>> rx.input
'AV1'
>>> rx.input="AirPlay"
>>> rx.input
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/rxv/rxv.py", line 254, in input
    response = self._request('GET', request_text)
  File "/usr/local/lib/python2.7/dist-packages/rxv/rxv.py", line 154, in _request
    response = cElementTree.XML(res.content)
  File "/usr/local/lib/python2.7/dist-packages/defusedxml/common.py", line 127, in fromstring
    return parser.close()
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1671, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1523, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
rdgerken commented 3 years ago

OK, so after sleeping on it - I added "time.sleep(3)" after each of the PUT commands (pun intended - I'm a funny guy, what can I say?) to the receiver when changing the inputs, changing the scenes, and enabling the HDMI outputs. So far, this seems to have band-aided the problem - but I did do full resets on all my gear, so maybe it's a fluke... but it looks promising. I do notice I get an error if I try to query my receiver too quickly after an initial power-up also, so I may need a sleep after a power on command as well. Thing is, you don't want to slow down HA GET queries I've learned, because then you start getting issues in the home assistant logs about it not responding within 10 seconds thing and it overall causes a performance hit... but putting these sleeps on the infrequent PUT commands seems to work, so I'm going to run with it for a bit. Again, still hoping someone that knows what they are doing can come up with a better permanent solution - maybe some kind of error handling/retry logic or something.

Let me know if any of you can reproduce this.

TIA

rdgerken commented 3 years ago

After some run time, this workaround is performing very well, in that I have not had the issue at all while using my modified code. I think what happens is, if the receiver is doing a network function (like switching to the Spotify, AirPlay, Network Radio, etc. - it just can't handle dealing with the 3rd party GET requests if they come in right as the receiver is busy with those network activities. By throttling the code by putting these sleeps in after each PUT command (I ended up putting time.sleep(2) after each PUT command throughout the code), it gives the receiver time to finish it's network "stuff" (if it happens to be doing something like switching the input to AirPlay as an example) before getting hammered with GET commands. I'd still be a fan of putting in some error handling and retry logic as opposed to these arbitrary sleeps (if a more advanced developer could advise on a better fix), but this was easy and seems to do the trick in the interim.

Wiggum127 commented 3 years ago

Is there a way for me to test your changes too? Just to see how my integration behaves.

rdgerken commented 3 years ago

I just replaced the rxv.py file on my docker image with my version (located at "/usr/local/lib/python3.8/site-packages/rxv/rxv.py"), but you could try following these instructions from petro here - and then replace rxv.py with my test version located here. Petro's code snippet is missing the version key in the manifest which will become a problem soon. I think you just need to add the version key so it would read something like this:

{ "domain": "yamaha", "name": "Yamaha Network Receivers", "version": "0.0.1", "documentation": "https://www.home-assistant.io/integrations/yamaha", "requirements": ["rxv==0.7.0rc2"], "codeowners": [], "iot_class": "local_polling" }

Good luck. Please let me know your results.

Thanks, Ryan

FixTheExchange commented 3 years ago

@rdgerken I've been testing this with my Yamaha CX-A5200 for the past few days. Your hunch is close. I think increasing the http_timeout value on line 115 would solve the issue better than adding sleep statements. I haven't tested this yet though. I don't see a retry variable, but some retry logic may be helpful too.

What I've observed is when the Yamaha receiver (or processor in my case) is trying to do certain things, it may take longer to respond. The most reliable way to reproduce the issue is to power on the Yamaha receiver but leave whatever display device you have connected turned OFF. In my case, I have HDMI out to a projector. This causes the Yamaha to periodically attempt to connect to the display device. When it is busy trying to do this, network communication is briefly disrupted. In fact it will drop precisely 4 packets every time. Once this happens, HA errors out and the integration stops responding until HA is restarted. If this condition occurs while HA is starting, it will take a long time to start, and will start with the integration in a failed state.

FixTheExchange commented 3 years ago

Also noticing that my Yamaha CX-A5200 will periodically drop 3 packets when in a standby state (network standby is enabled) for the first few minutes after it enters that state. After a few minutes, it no longer drops any packets while in standby. I did not observe any packet loss when both the CXA-5200 AND the display device connected to it were powered on. If the display device is powered on BEFORE the the CX-A5200, there was no observed packet loss when transitioning from off (standby) to on.

rdgerken commented 3 years ago

I added a general exception handler at line 194, and it keeps HA connected when there are communications problems. I'm far from being out of the woods on this problem though. I still get very poor performance with this integration. I almost feel like the problem is on the Yamaha side, but as soon as I say that, their app seems to work flawlessly when I'm not trying to connect to it with HA... so I don't know. I would love to have a breakthrough on this soon.

I did see in the "def net_radio" (Line 860ish) there is some wait/retry logic as they navigate the menus. I wonder if we need something more like this than the arbitrary sleeps that I put in.

Here's what I put in at 194 - which seems to keep my HA connected through these communication problems.

except Exception: logger.exception("General Exception Caught in _request") raise

FixTheExchange commented 3 years ago

The Yamaha AV Controller app for Android (version 5.5) occasionally has issues connecting too, but will generally start working again after a few retries, where HA will only start working if it is restarted.

If you send a bunch of GET requests to "http:///YamahaRemoteControl/desc.xml" either through your browser, or using cURL, you'll notice that sometimes the response takes up to 4 seconds, but usually it's a few milliseconds. Occasionally it will hang and not respond for anywhere from 15 to 60 seconds. When this happens you won't get the XML data unless you try again. If you have a continuous ping running, you'll see this is when the 4 packets in a row are dropped that I described previously. Subsequent attempts will be back to millisecond response time.

If you have one of the newer MusicCast models, you can do the same test with a GET request to "http:////YamahaExtendedControl/v1/main/getStatus" and you'll see similar results (the MusicCast integration suffers from a similar issue, but seems to be slightly more resilient). One key difference when hitting the MusicCast API is you will get a different response code in the JSON it returns instead of an incomplete XML file (or no file at all depending on timing). That response code is 1, which according to this API documentation I found means "Initializing".

This tells me the problem is definitely on the Yamaha side, as it appears something causes the network stack to re-initialize periodically. It is a transient condition, so it should be possible to work around it.

The "net_radio" section looks like it just makes 20 attempts and sleeps for 1 second between attempts if they fail. Adding something like that to the "_discover_features" section that starts on line 134, and "_request" section that starts on line 169 would probably fix things. There may be a better way, but I'm not sure. I can read Python pretty well, but I'm not that great at writing it.

rdgerken commented 3 years ago

I thought I'd put an update on this to anyone watching this issue. I followed "FixTheExchange's" advice, and put some additional error handling and retry logic in on the discover and request methods. This seemed to do what was intended (as in the retry works like it should, and HA doesn't lock up and require reboots anymore), but everything was still running very poorly. It is running poorly because the Yamaha is truly misbehaving, and so it's not providing timely updates to HA, and it's hitting my retries, sometimes needing to retry the same request 8,9,10 times before "working." Running a continuous ping to my receiver, and I see frequent timeouts - and just overall horrible communications. So today, I decided to switch from wired to wireless connection - and everything seems to work perfectly. The additional code for error handling and retrying isn't even getting hit. So, I don't know if there is something wrong with my wired gear, or something on the wired NIC on the Yamaha - but I just do not get good performance on the wired connection - which is exactly opposite of what I would normally expect. Running continuous pings on my wireless, I do see a bit more latency between pings (as you would expect) - but I do not get the frequent all of a sudden timeouts like I was getting on the wired connection. I have another network device on the same switch as my Yamaha, and running continuous pings to it is just fine -so I suspect I have something going on with my Yamaha wired NIC. I just upgraded to the 195 firmware last night, again, just hoping something would help - but I saw similar, if not slightly worse, performance.

Now, before I get excited, it is possible that switching the receiver from wired to wireless resets something on there, and maybe that's why it's working "good" on wireless. I did notice when I was having a lot of issues on the wired connection, that if I unplugged power for a bit, it always seemed to work much better for a short time. So, I think I'll need to keep an eye on this for a while and see how it does over time before I can draw any conclusions. If this ends up working, I'll try a different wired switch or something to see if I can single out if I have a problem on my network or if it is the NIC.

For any of you others having this problem - are you having these issues on wired / wireless / or both? I would say if any of you are having issues with this on wireless, then I'm guessing it's only a matter of time before I'll see the problem come back. If you are wired, would you mind switching to wireless and see if you get the same results I have?

If anyone is curious, here are the two code blocks I modified. It looks like HA 2021.7 is running python 3.9 now - so make sure if you patch rxv.py, you are doing it in the python3.9/site-packages/rxv as opposed to the 3.8 folder I mentioned back in May. If you don't use all of my code, that's fine... but do pay special attention to line 188. The repo code is raising an exception called "ResponseException" that isn't being caught - which I think is causing folks to have to reboot HA to get recovered - so by putting this except handler in there (and also a general Exception handler just to cover all the bases), HA logs the error and doesn't bomb out to the point where you have to restart it. (At least that's what I think is going on... but I could be wrong.) I can't remember what version of rxv HA is shipping with, and if this is a problem or not. I've been running the latest repo version for a while now.

To be clear, here is my environment:

Thanks - Ryan

def _discover_features(self):
    """Pull and parse the desc.xml so we can query it later."""

    #Do some retries when Yamaha is not cooperating
    tries = 20
    for i in range(tries):
        try:
            desc_xml = self._session.get(
                self.unit_desc_url, timeout=self.http_timeout
            ).content
            if not desc_xml:
                if i < tries - 1:
                    logger.error("desc_xml not retrieved in _discover_features - Sleeping before retry {:n}".format(i))
                    time.sleep(1)
                    continue
                else:                    
                    logger.error(
                        "Unsupported Yamaha device? Failed to fetch {}".format(
                            self.unit_desc_url
                        ))
                    return
            self._desc_xml = cElementTree.fromstring(desc_xml)
        except xml.etree.ElementTree.ParseError:
            if i < tries - 1:
                logger.error("ParseError in _discover_features - Sleeping before retry {:n}".format(i))
                time.sleep(1)
                continue
            else:
                logger.exception("Invalid XML returned for request %s: %s",
                                self.unit_desc_url, desc_xml)
                raise
        except Exception:
            if i < tries - 1:
                logger.error("General Exception in _discover_features - Sleeping before retry {:n}".format(i))
                time.sleep(1)
                continue
            else:
                logger.exception("Failed to fetch %s" % self.unit_desc_url)
                raise
        break

def _request(self, command, request_text, zone_cmd=True):
    if zone_cmd:
        payload = Zone.format(request_text=request_text, zone=self._zone)
    else:
        payload = request_text

    request_text = YamahaCommand.format(command=command, payload=payload)

    #Do some retries when Yamaha is not cooperating
    tries = 15
    for i in range(tries):
        try:
            res = self._session.post(
                self.ctrl_url,
                data=request_text,
                headers={"Content-Type": "text/xml"},
                timeout=self.http_timeout
            )
            # releases connection to the pool
            response = cElementTree.XML(res.content)
            if response.get("RC") != "0":
                raise ResponseException(request_text, res.content)
            return response
        except xml.etree.ElementTree.ParseError:
            if i < tries - 1:
                logger.error("ParseError in _request - Sleeping before retry {:n}".format(i))
                time.sleep(1)
                continue
            else:
                logger.exception("Invalid XML returned for request %s: %s", request_text, res.content)
                raise
        except ResponseException:
            if i < tries - 1:
                logger.error("ResponseException in _request - Sleeping before retry {:n}".format(i))
                time.sleep(1)
                continue
            else:
                logger.exception("Request %s failed with %s in _request", request_text, res.content)
                raise
        except Exception:
            if i < tries - 1:
                logger.error("General Exception in _request - Sleeping before retry {:n}".format(i))
                time.sleep(1)
                continue
            else:
                logger.exception("General Exception Caught in _request")
                raise
        break