home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.49k stars 30.71k forks source link

yamaha_musiccast integration crashing musiccast v2 devices after a number of days. #79032

Closed benjycov closed 2 years ago

benjycov commented 2 years ago

The problem

I have a lot of Yamaha Musiccast devices (v1 and v2) and after a recent Musiccast firmware update (not HA) the v2 devices started crashing after a number of days, they would then require a full power cycle to restore them.

By crash I mean musiccast device in question will still respond to ping but stops listening on the musiccast port.

I've engaged with Yamaha support and once they realised that I am using HA they then tried to wash their hands of it and declare that their API isn't public and only their app is supported. Sigh.

I don't believe this is specifically a HA issue and that any 3rd party platform using the API would eventually crash the Musiccast device (incuding Yamahas own app).

As I have a lot of musiccast devices (effectively one device in every room), there are devices that never get used, and they still crash, thus I am theorising that the status polling that the HA integration does once per minute to each device is causing the problem, I suspect there is some sort of memory leak on the Musiccast side, I can see this polling happening in the HA logs here:

2022-09-24 17:09:19.284 DEBUG (MainThread) [homeassistant.components.yamaha_musiccast] Finished fetching yamaha_musiccast data in 0.282 seconds (success: True)

Please can the devs of this integration tell me what API calls are made to the Musiccast API once per minute? (apologies I can't find it myself in the source).

I am then going to write a bit of script that rapidly makes the same API call(s) to see if I can reliably crash one of my musiccast device and take that back to Yamaha support.

What version of Home Assistant Core has the issue?

2022.9.6

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

yamaha_musiccast

Link to integration documentation on our website

https://www.home-assistant.io/integrations/yamaha_musiccast/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

homeassistant commented 2 years ago

yamaha_musiccast documentation yamaha_musiccast source (message by IssueLinks)


Hey there @vigonotion, @micha91, mind taking a look at this issue as it has been labeled with an integration (yamaha_musiccast) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)

benjycov commented 2 years ago

My guess would be that HA is just calling:

http://a.b.c.d/YamahaExtendedControl/v1/main/getStatus

Every minute, anything else?

vigonotion commented 2 years ago

We are calling fetch regularily: https://github.com/home-assistant/core/blob/a3cb38e11449c806a693202cc7f4540450d097d5/homeassistant/components/yamaha_musiccast/__init__.py#L117

Fetch does multiple things, depending on the features of the device. You can find the code in the library here: https://github.com/vigonotion/aiomusiccast/blob/main/aiomusiccast/musiccast_device.py#L372 It's also fetching netusb status for example.

For your script, I would also recommend you to use the library, you can connect to a musiccast device the same way as home assistant would and then call the fetch method.

I can also recommend using mitmproxy to inspect network traffic from HA to the Musiccast device and also from the app to the musiccast device.

benjycov commented 2 years ago

Hi Tom,

Thanks for info, most helpful. I was in the process of setting MITM up and just pondering the best way to intercept the traffic when it occurred to me that maybe I could just see what I wanted on a simple tshark capture on my router, and sure enough I could. For reference here is what a fetch translates to in actual calls to the Musiccast device (in my case an Yamaha xda-qs5400rk):

HTTP: GET /YamahaExtendedControl/v1/system/getNameText?id=None HTTP: GET /YamahaExtendedControl/v1/netusb/getPlayInfo HTTP: GET /YamahaExtendedControl/v1/netusb/getPresetInfo HTTP: GET /YamahaExtendedControl/v1/dist/getDistributionInfo HTTP: GET /YamahaExtendedControl/v1/main/getStatus HTTP: GET /YamahaExtendedControl/v1/system/getFuncStatus

Thanks again. I will follow-up if and when I figure out what is breaking musiccast.

vigonotion commented 2 years ago

The getNameText?id=None looks strange, can you see the response? Maybe this causes the crashes

benjycov commented 2 years ago

I agree the ?id=None looks odd but the device responds the same anyway (basically a list of zones and sources for each zone).

I have now run over 150,000 cycles of these HTTP Gets (so effectively replicating over 100 days of HA at once per minute) against two different v2 musiccast devices (XDA-QS5400RK & RX-S600) and neither of them broke - normally they fail after 4-5 days. So that blows a hole in my theory but at least rules out the HA integration and the underlying aiomusiccast library.

Unless there is any other interaction the integration has when the device isn't actually in use?

micha91 commented 2 years ago

First of all thanks @benjycov for reporting this issue - I own some of the v2 devices, but did not perform the firmware upgrade yet. As @vigonotion already said, there are multiple calls being made once a minute, but the list of calls you collected looks correct. The getNameText call is indeed not completely correct - we should delete the query parameter to conform to the API documentation. I will try to fix that in the aiomusiccast asap, but I am pretty busy at the moment and if this does not resolve the issue, I would not see this as an urgent fix. In addition to the active pulls once a minute, we establish a UDP connection to the device. The device sends a message via this UDP connection whenever something changes and we fetch the current state after that via the API. Maybe the device is not able to keep the same connection for such a long time anymore? If you could open a port to receive the UDP messages and send the headers accordingly with every HTTP call you could also simulate that behavior. The documentation about this UDP connection is available here in chapter 11 and we implemented that here. If you are able to reproduce the error this way, we could simply change the port once a day or once an hour or something like that. If you are not able to reproduce the error, I will update one of my devices and will start to debug it. But I don't think that I am able to start with that before mid of October - sorry!

benjycov commented 2 years ago

No problem at all. First of all to say I'm still not 100% convinced that this a problem with anything in our domain here, however I am sure that if the problem was more widespread Yamaha would be receiving more tickets themselves and dealing with it themselves.

For reference I have 10 v2 devices spread across 3 model types, for reference they are on API version: 2.13 and netmodule version: 1412. Of the ten devices 8 of them are 2xXDA-QS5400RK, each physical devices having 4 netmodules that share one ethernet interface and then seem to be separate devices from a musiccast perspective - so for example I can have a situation where all 4 'sub' devices are pinging but only 2 are responding to musiccast.

After doing a couple of days of testing the API calls, nothing had crashed then suddenly two of the sub devices on one of the XDMs stopped responding on musiccast (they happened to be ones I didn't do any testing against), however it did seem to coincide with a reboot of HA.

So I had a look at the API calls that occur when HA is rebooted and its a more extensive range:

HTTP: GET /YamahaExtendedControl/v1/system/getNetworkStatus HTTP: GET /YamahaExtendedControl/v1/system/getDeviceInfo HTTP: GET /YamahaExtendedControl/v1/system/getNameText?id=None HTTP: GET /YamahaExtendedControl/v1/system/getFeatures HTTP: GET /YamahaExtendedControl/v1/netusb/getPlayInfo HTTP: GET /YamahaExtendedControl/v1/netusb/getPresetInfo HTTP: GET /YamahaExtendedControl/v1/dist/getDistributionInfo HTTP: GET /YamahaExtendedControl/v1/main/getStatus HTTP: GET /YamahaExtendedControl/v1/system/getFuncStatus HTTP: GET /YamahaExtendedControl/v1/system/getDeviceInfo

I scripted this and hit one of my devices over 100,000 times and still it didn't crash. Is there anything else happening at startup?

I will look at the UDP and populate the HTTP headers and test that.

Ultimately this is something wrong with the Yamaha firmware, no matter what we do it shouldn't crash the device requiring a power cycle, so either we can identify what is breaking it and get Yamaha to fix or work around it.

Thanks for your help.