home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.43k stars 30.67k forks source link

Enphase Envoy contant disconnect from Envoy R #126162

Closed luis-castro closed 4 weeks ago

luis-castro commented 1 month ago

The problem

My Envoy Enphase integration is configured with 2 gateways: 1 is running firmware D3.18.10 (Envoy R), and the other is running firmware 7.3.617 (Envoy IQ).

The integration is constantly losing data from the inverters connected to the Envoy R, showing them as Unavailable.

This is sometimes solved by restarting homeassist, others I have to delete the integration and add it again.

What version of Home Assistant Core has the issue?

core-2024.8.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

enphase_envoy

Link to integration documentation on our website

https://www.home-assistant.io/integrations/enphase_envoy

Diagnostics information

config_entry-enphase_envoy-01J815RY1A02ECBF94NKK8NKNV (1).json

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-09-17 17:48:30.551 ERROR (MainThread) [homeassistant.components.enphase_envoy.coordinator] Unexpected error fetching Envoy [*REDACTED S/N*] data
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 143, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 113, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 186, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 238, in _receive_event
    raise RemoteProtocolError(msg)
httpcore.RemoteProtocolError: Server disconnected without sending a response.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 354, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/enphase_envoy/coordinator.py", line 152, in _async_update_data
    envoy_data = await envoy.update()
                 ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyenphase/envoy.py", line 443, in update
    await updater.update(data)
  File "/usr/local/lib/python3.12/site-packages/pyenphase/updaters/api_v1_production_inverters.py", line 40, in update
    inverters_data: list[dict[str, Any]] = await self._json_request(
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyenphase/updaters/base.py", line 34, in _json_request
    response = await self._request(end_point)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyenphase/envoy.py", line 384, in _make_cached_request
    response = await request_func(endpoint)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 418, in exc_check
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 185, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyenphase/envoy.py", line 216, in request
    return await self._request(endpoint, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyenphase/envoy.py", line 251, in _request
    response = await self._client.get(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1801, in get
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1574, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1661, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1763, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 372, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.RemoteProtocolError: Server disconnected without sending a response.

Additional information

No response

home-assistant[bot] commented 1 month ago

Hey there @bdraco, @cgarwood, @joostlek, @catsmanac, mind taking a look at this issue as it has been labeled with an integration (enphase_envoy) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `enphase_envoy` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign enphase_envoy` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


enphase_envoy documentation enphase_envoy source (message by IssueLinks)

luis-castro commented 1 month ago

When I enabled DEBUG logs, it seems the integration is wrongly trying to use /production.json instead of the correct API /api/v1/production and therefore it is unable to decode the output (as expected, becuase the Envoy is returning an error).

If the integration has detected we are running firmware D3.18.10, then it should only make use of the following APIs:

2024-09-17 18:06:14.575 DEBUG (MainThread) [pyenphase.firmware] Request reply in 0.1 sec from http://192.168.1.15/info status 200: text/xml b"<?xml version='1.0' encoding='UTF-8'?>\n<envoy_info>\n  <time>1726617974</time>\n  <device>\n    <sn>[*REDACTED*]</sn>\n    <pn>800-00240-r03</pn>\n    <software>D3.18.10</software>\n    <euaid>4c8675</euaid>\n    <seqnum>0</seqnum>\n    <apiver>1</apiver>\n  </device>\n  <package name='backbone'>\n    <pn>500-00010-r01</pn>\n    <version>02.01.15</version>\n    <build>ad8746</build>\n  </package>\n  <package name='devimg'>\n    <pn>500-00004-r01</pn>\n    <version>01.02.245</version>\n    <build>033b78</build>\n  </package>\n  <package name='kernel'>\n    <pn>500-00003-r01</pn>\n    <version>02.04.00</version>\n    <build>80e63f</build>\n  </package>\n  <package name='security'>\n    <pn>500-00016-r01</pn>\n    <version>02.00.00</version>\n    <build>54a6dc</build>\n  </package>\n  <package name='boot'>\n    <pn>590-00015-r01</pn>\n    <version>02.00.01</version>\n    <build>0b54c5</build>\n  </package>\n  <package name='geo'>\n    <pn>500-00008-r01</pn>\n    <version>02.01.22</version>\n    <build>702db9</build>\n  </package>\n  <package name='app'>\n    <pn>500-00002-r01</pn>\n    <version>03.18.10</version>\n    <build>f0855e</build>\n  </package>\n  <package name='rootfs'>\n    <pn>500-00001-r01</pn>\n    <version>01.02.00</version>\n    <build>807</build>\n  </package>\n</envoy_info>\n"
2024-09-17 18:06:14.576 DEBUG (MainThread) [pyenphase.envoy] FW: 3.18.10, Authenticating to Envoy using envoy/installer authentication
2024-09-17 18:06:14.577 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/ivp/meters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:14.705 DEBUG (MainThread) [pyenphase.updaters.meters] Skipping meters endpoint as user does not have access to /ivp/meters: Authentication failed for http://192.168.1.15/ivp/meters with status 401, please check your username/password or token.
2024-09-17 18:06:14.706 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/production.json?details=1 with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:14.933 DEBUG (MainThread) [pyenphase.envoy] Request reply in 0.2 sec from http://192.168.1.15/production.json?details=1 status 200: application/json b'    <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">\n    <html>\n    <!-- START HEAD CONTENT -->\n    <head>\n      <!-- include style sheet -->\n      <link rel="StyleSheet" type="text/css" href="/include/style.css" />\n      <!-- include JS libraries -->\n    </head>\n    <!-- END HEAD CONTENT -->\n    <body>\n    <center>\n    <img src="/images/emu-small.gif" style="margin:10px" alt="Emu"/>\n    <h2>Page Not Found</h2>\n    <p>The page you tried to view does not exist\n     Try the <a href="/">home</a> page</p>\n    </center>\n    </body>\n    </html>\n'
2024-09-17 18:06:14.934 DEBUG (MainThread) [pyenphase.json] Unable to decode response from Envoy endpoint /production.json?details=1: unexpected character: line 1 column 5 (char 4)
2024-09-17 18:06:14.934 DEBUG (MainThread) [pyenphase.updaters.production] Production endpoint not found at /production.json?details=1: unexpected character: line 1 column 5 (char 4)
2024-09-17 18:06:14.934 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/production with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:15.082 DEBUG (MainThread) [pyenphase.envoy] Request reply in 0.1 sec from http://192.168.1.15/production status 200: application/json b'      <h1>System Energy Production</h1>\n      <div style="margin-right: auto; margin-left: auto;"><table>\n      <tr><td colspan="3">System has been live since\n                             <div class=good>Fri Sep 01, 2017 02:53 PM GMT+6</div></td></tr>\n      <tr><td>Currently</td>    <td>  49.0 W</td></tr><tr><td>Today</td>     <td> 3.27 kWh</td></tr><tr><td>Past Week</td>    <td> 26.9 kWh</td></tr><tr><td>Since Installation</td>    <td> 15.0 MWh</td></tr>\n      </table><br></div>\n'
2024-09-17 18:06:15.082 DEBUG (MainThread) [pyenphase.json] Unable to decode response from Envoy endpoint /production: unexpected character: line 1 column 7 (char 6)
2024-09-17 18:06:15.082 DEBUG (MainThread) [pyenphase.updaters.production] Production endpoint not found at /production: unexpected character: line 1 column 7 (char 6)
2024-09-17 18:06:15.082 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.envoy] Request reply in 0.0 sec from http://192.168.1.15/api/v1/production status 200: application/json b'{\n  "wattHoursToday": 3272,\n  "wattHoursSevenDays": 26896,\n  "wattHoursLifetime": 15043584,\n  "wattsNow": 49\n}\n'
2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.json] Unable to decode response from Envoy endpoint /production.json?details=1: unexpected character: line 1 column 5 (char 4)
2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.updaters.production] Production endpoint not found at /production.json?details=1: unexpected character: line 1 column 5 (char 4)
2024-09-17 18:06:15.117 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:16.801 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:19.201 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.839 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.879 DEBUG (MainThread) [pyenphase.updaters.api_v1_production_inverters] Production endpoint not found at /api/v1/production/inverters: Server disconnected without sending a response.
2024-09-17 18:06:20.879 DEBUG (MainThread) [pyenphase.updaters.ensemble] Firmware too old for Ensemble support
catsmanac commented 1 month ago

Hi @luis-castro, You are correct about the 2 API's it should use. Before I dive into the issue, I've got some question to assist with the troubleshooting.

The integration is constantly losing data from the inverters connected to the Envoy R, showing them as Unavailable.

Does it ever restore from unavailable without restarting Home-Assistant?

This is sometimes solved by restarting homeassist, others I have to delete the integration and add it again.

Looking at the information you provided (also see below) it seems as if the /api/v1/production/inverters is not reliably providing data. If this occurs during normal collection runs, the data will go to unavailable and come back when it works again. If this happens during a Home Assistant restart, the data will be considered as unavailable on the Envoy and not even be collected (some versions don't have this data).

If this happens you don't need to restart Home Assistant completely. If you open settings / devices & services / Enphase Envoy you can use the menu to reload the Envoy Integration. That will restart the communication and retry building all connections. This may work or not, but it avoids the whole restart of Home Assistant. And no, it's not a solution to the issue, just a simpler way to try to get it going again.

afbeelding

debug log

As for the debug log details, it shows is the startup sequence during which it determines what is available or not. This is not primary driving by the firmware version, but rather by the responses of the Envoy. This will include some failures depending on the Envoy model and firmware.

It received a successful reply for the /api/vi/production

2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.envoy] Request reply in 0.0 sec from http://192.168.1.15/api/v1/production status 200: application/json b'{\n  "wattHoursToday": 3272,\n  "wattHoursSevenDays": 26896,\n  "wattHoursLifetime": 15043584,\n  "wattsNow": 49\n}\n'

and as a result it will use this for production data.

The lines right below it is still part of the inspection and will lead to not trying to use the /production.json endpoint.

2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.json] Unable to decode response from Envoy endpoint /production.json?details=1: unexpected character: line 1 column 5 (char 4)
2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.updaters.production] Production endpoint not found at /production.json?details=1: unexpected character: line 1 column 5 (char 4)

The result is also reflected in the diagnostic file where both end-points are shown used. Raw_data is the data that was collected from the Envoy in the data last collection before the diagnostic report was created.

    "raw_data": {
      "/api/v1/production": {
        "wattHoursToday": 3260,
        "wattHoursSevenDays": 26887,
        "wattHoursLifetime": 15043572,
        "wattsNow": 46
      },
      "/api/v1/production/inverters": [
        {
          "serialNumber": "121726021970",
          "lastReportDate": 1726616739,
          "lastReportWatts": 11,
          "maxReportWatts": 218
        },
        {
          "serialNumber": "121726033042",

The next section trying to get the data from the inverters shows some issue though. No data is returned by the Envoy.

2024-09-17 18:06:15.117 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:16.801 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:19.201 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.839 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.879 DEBUG (MainThread) [pyenphase.updaters.api_v1_production_inverters] Production endpoint not found at /api/v1/production/inverters: Server disconnected without sending a response.

What can be seen is 4 connection attempts that fail right away and with varying wait time of 1-3 seconds in between. As a result the inverter data will probably not be collected after this restart and it would need another restart/reload on this Envoy in Home Assistant to try again.

The reported error Server disconnected without sending a response is also reported in the Error log of the original problem report. Although it's not clear if it was for the /api/v1/production/inverters endpoint as well. Based on your description it probably is.

luis-castro commented 1 month ago

Hi @catsmanac,

Thanks for taking a look at it.

Answering your question: Yes, it has been working for at least a year. It started failing around 2 months ago.

I believe the issue is in the pyenphase library itself. I've been playing around and have the following findings:

  1. If I use curl to hit the API /api/v1/production/inverters, it works with no issues:
    $ curl -u installer:[REDACTED] --digest http://192.168.1.15/api/v1/production/inverters
    [
    {
    "serialNumber": "121726021970",
    "lastReportDate": 1726672636,
    "lastReportWatts": 177,
    "maxReportWatts": 218
    },
    {
    "serialNumber": "121726033042",
    "lastReportDate": 1726672632,
    "lastReportWatts": 186,
    "maxReportWatts": 195
    },
    {
    "serialNumber": "121726021253",
    "lastReportDate": 1726672635,
    "lastReportWatts": 85,
    "maxReportWatts": 108
    },
    {
    "serialNumber": "121730034753",
    "lastReportDate": 1726672634,
    "lastReportWatts": 190,
    "maxReportWatts": 245
    }
    ]

I can keep this repeating with no issues.

  1. If use the library I noticed that pyenphase/envoy.py is trying to probe all Updaters, even the ones we know won't work in firmare version 3.

  2. However, even if I comment out the Updaters, it seems the library is not using Digest Authentication when hitting the API /api/v1/production/inverters.

Here is the modification I applied to envoy,py:

UPDATERS: list[type["EnvoyUpdater"]] = [
#    EnvoyMetersUpdater,
#    EnvoyProductionJsonUpdater,
#    EnvoyProductionUpdater,
    EnvoyApiV1ProductionUpdater,
#    EnvoyProductionJsonFallbackUpdater,
    EnvoyApiV1ProductionInvertersUpdater,
#    EnvoyEnembleUpdater,
#    EnvoyTariffUpdater,
#    EnvoyGeneratorUpdater,
]

However, when I called envoy.update(), I got this:

DEBUG:pyenphase.envoy:Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=RemoteProtocolError('Server disconnected without sending a response.')
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete

It seems to me the library is not hitting the API with Digest Authentication (but I am not 100% certain of that).

For reference this is the Python script I am using for this test:

import asyncio
from pyenphase import Envoy, EnvoyData
from pprint import pprint
import logging

logging.basicConfig(level=logging.DEBUG)

async def main():
        envoy = Envoy("192.168.1.15")
        print('Calling envoy.setup()...')
        await envoy.setup()
        print(f'Envoy: {envoy.host}\nFirmare: {envoy.firmware}\nSerial: {envoy.serial_number}\n')

        print('Calling envoy.authenticate()...')
        await envoy.authenticate(username="installer")

        while True:
                print('Calling envoy.update()...')
                data: EnvoyData = await envoy.update()
                print(f'TodaysEnergy: {data.system_production.watt_hours_today}')
                for inverter in data.inverters:
                        print(f'{inverter} SN: {data.inverters[inverter].serial_number}')
                        print(f'{inverter} W: {data.inverters[inverter].last_report_watts}')
                print('Waiting 10 seconds...')
                await asyncio.sleep(10)

if __name__ == "__main__":
        print('Calling main...')
        asyncio.run(main())

What are your thoughts about this?

luis-castro commented 1 month ago

It seems I was wrong about the library not using Digest Authentication, however something is not working.

Here is a complete dump of the output. I still don't understand what the problem is:

Calling envoy.authenticate()...
DEBUG:pyenphase.envoy:FW: 3.18.10, Authenticating to Envoy using envoy/installer authentication
Calling envoy.update()...
DEBUG:pyenphase.envoy:Requesting http://192.168.1.15/api/v1/production with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
DEBUG:httpcore.connection:connect_tcp.started host='192.168.1.15' port=80 local_address=None timeout=10.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x7fb621921990>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Date', b'Wed, 18 Sep 2024 15:54:33 GMT'), (b'Content-Type', b'application/json'), (b'Content-Length', b'111')])
INFO:httpx:HTTP Request: GET http://192.168.1.15/api/v1/production "HTTP/1.1 200 OK"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:pyenphase.envoy:Request reply in 0.0 sec from http://192.168.1.15/api/v1/production status 200: application/json b'{\n  "wattHoursToday": 889,\n  "wattHoursSevenDays": 27300,\n  "wattHoursLifetime": 15044482,\n  "wattsNow": 598\n}\n'
DEBUG:pyenphase.envoy:Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=RemoteProtocolError('Server disconnected without sending a response.')
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:pyenphase.envoy:Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
DEBUG:httpcore.connection:connect_tcp.started host='192.168.1.15' port=80 local_address=None timeout=10.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x7fb621974550>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Date', b'Wed, 18 Sep 2024 15:54:34 GMT'), (b'WWW-Authenticate', b'Digest qop="auth", realm="enphaseenergy.com", nonce="1726674874"'), (b'Content-Type', b'application/json'), (b'Content-Length', b'90')])
INFO:httpx:HTTP Request: GET http://192.168.1.15/api/v1/production/inverters "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=RemoteProtocolError('Server disconnected without sending a response.')
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:pyenphase.envoy:Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
DEBUG:httpcore.connection:connect_tcp.started host='192.168.1.15' port=80 local_address=None timeout=10.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x7fb621974f10>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Date', b'Wed, 18 Sep 2024 15:54:36 GMT'), (b'WWW-Authenticate', b'Digest qop="auth", realm="enphaseenergy.com", nonce="1726674876"'), (b'Content-Type', b'application/json'), (b'Content-Length', b'90')])
INFO:httpx:HTTP Request: GET http://192.168.1.15/api/v1/production/inverters "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=RemoteProtocolError('Server disconnected without sending a response.')
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:pyenphase.envoy:Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
DEBUG:httpcore.connection:connect_tcp.started host='192.168.1.15' port=80 local_address=None timeout=10.0 socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x7fb621977110>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Date', b'Wed, 18 Sep 2024 15:54:36 GMT'), (b'WWW-Authenticate', b'Digest qop="auth", realm="enphaseenergy.com", nonce="1726674876"'), (b'Content-Type', b'application/json'), (b'Content-Length', b'90')])
INFO:httpx:HTTP Request: GET http://192.168.1.15/api/v1/production/inverters "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.failed exception=RemoteProtocolError('Server disconnected without sending a response.')
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:pyenphase.updaters.api_v1_production_inverters:Production endpoint not found at /api/v1/production/inverters: Server disconnected without sending a response.
TodaysEnergy: 889
Waiting 10 seconds...

BTW: As this is not a home-assistant issue per se, let me know if I should file an issue with the pyenvoy team.

catsmanac commented 1 month ago

So your test program shows the issue as well. In the end it's probably a pyenhase library issue, before we move there I want to verify some things. I'm member of the pyenphase team as well, so we can switch when needed.

In your test code can you change start of main to:


from pyenphase import register_updater
from pyenphase.envoy import get_updaters
from pyenphase.updaters.base import EnvoyUpdater
from pyenphase.updaters.production import EnvoyProductionJsonUpdater, EnvoyProductionUpdater, EnvoyProductionJsonFallbackUpdater

logging.basicConfig(level=logging.DEBUG)

async def main():
    updaters: list[type[EnvoyUpdater]] = get_updaters()
    if EnvoyProductionJsonUpdater in updaters:
        _LOGGER.debug("Removing EnvoyProductionJsonUpdater from Pyenphase")
        updaters.remove(EnvoyProductionJsonUpdater)
    if EnvoyProductionUpdater in updaters:
        _LOGGER.debug("Removing EnvoyProductionUpdater from Pyenphase")
        updaters.remove(EnvoyProductionUpdater)
    if EnvoyProductionJsonFallbackUpdater in updaters:
        _LOGGER.debug("Removing EnvoyProductionJsonFallbackUpdater from Pyenphase")
        updaters.remove(EnvoyProductionJsonFallbackUpdater)
    envoy = Envoy("192.168.1.15")
    print('Calling envoy.setup()...')

This will remove the production endpoint updaters and prevent reading the production pages at startup probe to see if that removes the issue or not.

Let it start and run for 2 or 3 cycles and then upload the debug log here.

luis-castro commented 1 month ago

Done.

BTW: I got an error with _LOGGER.debug, so I changed it to print (sorry, I'm new to Python!).

Here is the output: pytest.log

catsmanac commented 1 month ago

Can try with username='envoy' as well?

luis-castro commented 1 month ago

Sure, here is the output, but there is no change: output-envoy.log

Anyway, I think "installer" should work as per my tests with curl.

catsmanac commented 1 month ago

Anyway, I think "installer" should work as per my tests with curl.

Agree

Ok, well not ok yet, on to next step. Let's eliminate the meters and tariff collection. Should only leave info, v1/production and inverters for collection.

Add

from pyenphase.updaters.meters import  EnvoyMetersUpdater
from pyenphase.updaters.tariff import EnvoyTariffUpdater

and

if EnvoyMetersUpdater in updaters:
    _LOGGER.debug("Removing EnvoyMetersUpdater from Pyenphase")
    updaters.remove(EnvoyMetersUpdater)
if EnvoyTariffUpdater in updaters:
    _LOGGER.debug("Removing EnvoyTariffUpdater from Pyenphase")
    updaters.remove(EnvoyTariffUpdater)

right after the already added sections

luis-castro commented 1 month ago

Applied the modifications and ran the new test.

See attached log: test-20240918-1925UTC.txt

For reference, this is how the test code looks now:

import asyncio
from pyenphase import Envoy, EnvoyData
from pyenphase import register_updater
from pyenphase.envoy import get_updaters
from pyenphase.updaters.base import EnvoyUpdater
from pyenphase.updaters.production import EnvoyProductionJsonUpdater, EnvoyProductionUpdater, EnvoyProductionJsonFallbackUpdater
from pyenphase.updaters.meters import EnvoyMetersUpdater
from pyenphase.updaters.tariff import EnvoyTariffUpdater
import logging

logging.basicConfig(level=logging.DEBUG)

async def main():
        updaters: list[type[EnvoyUpdater]] = get_updaters()
        if EnvoyProductionJsonUpdater in updaters:
                logging.debug("Removing EnvoyProductionJsonUpdater from Pyenphase")
                updaters.remove(EnvoyProductionJsonUpdater)
        if EnvoyProductionUpdater in updaters:
                logging.debug("Removing EnvoyProductionUpdater from Pyenphase")
                updaters.remove(EnvoyProductionUpdater)
        if EnvoyProductionJsonFallbackUpdater in updaters:
                logging.debug("Removing EnvoyProductionJsonFallbackUpdater from Pyenphase")
                updaters.remove(EnvoyProductionJsonFallbackUpdater)
        if EnvoyMetersUpdater in updaters:
                logging.debug("Removing EnvoyMetersUpdater from Pyenphase")
                updaters.remove(EnvoyMetersUpdater)
        if EnvoyTariffUpdater in updaters:
                logging.debug("Removing EnvoyTariffUpdater from Pyenphase")
                updaters.remove(EnvoyTariffUpdater)

        envoy = Envoy("192.168.1.15")
        logging.debug('Calling envoy.setup()...')
        await envoy.setup()
        print(f'Envoy: {envoy.host}\nFirmare: {envoy.firmware}\nSerial: {envoy.serial_number}\n')

        logging.debug('Calling envoy.authenticate()...')
        await envoy.authenticate(username="installer")

        while True:
                logging.debug('Calling envoy.update()...')
                data: EnvoyData = await envoy.update()

                print(f'TodaysEnergy: {data.system_production.watt_hours_today}')

                for inverter in data.inverters:
                        print(f'{inverter} SN: {data.inverters[inverter].serial_number}')
                        print(f'{inverter} W: {data.inverters[inverter].last_report_watts}')

                print('Waiting 10 seconds...')
                await asyncio.sleep(10)

if __name__ == "__main__":
        logging.debug('Calling main...')
        asyncio.run(main())
luis-castro commented 1 month ago

I thought the Envoy was blacklisting my IP, but I ran curl from the same machine I'm running python and it works:

$ curl -u installer:REDACTED --digest http://192.168.1.15/api/v1/production/inverters
[
  {
    "serialNumber": "121726021970",
    "lastReportDate": 1726687061,
    "lastReportWatts": 124,
    "maxReportWatts": 218
  },
...
catsmanac commented 1 month ago

So behavior is pretty consistent with access denied on the inverters endpoint.

Answering your question: Yes, it has been working for at least a year. It started failing around 2 months ago.

Any idea what version upgrade that was? I.e. if you made any backup of home assistant before upgrades you can see version of the backup in you backup list.

I'll do some search in what all we changed in pyenphase around that time, since like 3 months ago. That will be tomorrow as hours in the day are running out quickly here now.

luis-castro commented 1 month ago

Unfortunately the earliest backup I keep is from August 24. Back then I was running Home Assistant 2024.8.1 (same version I'm currently running), but I think I applied updates to some add-ons.

I appreciate all your efforts! Let me know if more information or tests are needed.

If you need access to this Envoy, I can arrange that too.

luis-castro commented 1 month ago

I have reviewed my statistics, the issue I had two months ago was unrelated (sorry about that!)

As you can see in this screenshot, the connectivity became unrealiable since September 6: Screenshot 2024-09-18 16 30 11

Today I upgraded Home Assistant to 2024.9.2 and upgraded Home Assistant OS to version 13.1. The inverters were showing results for a couple of minutes and then they became unavailable again.

catsmanac commented 1 month ago

So on September 6 when the issue started you were running 2024.8.1 that you installed before August, 24. That version became avaialble on August 10. Yesterday, Sep 18 you upgraded to 2024.9.2.

The last changes to the Envoy communication itself was released in 2024.8.0, which was an update in pyenphase to report some more data, but that data was already available. For the moment no direct leads to anything suspicious right away.

Looking at all information again I noted you used a password:

$ curl -u installer:REDACTED --digest

But no password with the Authentication in the test program. If that is an actual password that that is another test to do.:

await envoy.authenticate(username="installer",password="REDACTED")
catsmanac commented 1 month ago

And a simple test to just read the inverters using pyenphase

import asyncio
import httpx
from pyenphase import Envoy, EnvoyData
import logging

logging.basicConfig(level=logging.DEBUG)

async def main():
    envoy = Envoy("192.168.1.15")
    logging.debug('Calling envoy.setup()...')
    await envoy.setup()
    print(f'Envoy: {envoy.host}\nFirmare: {envoy.firmware}\nSerial: {envoy.serial_number}\n')

    logging.debug('Calling envoy.authenticate()...')
    await envoy.authenticate(username="installer")

    myresponse: httpx.Response = await envoy.request('api/v1/production/inverters')
    status_code = response.status_code
    print(status_code)
    print(myresponse)

if __name__ == "__main__":
        logging.debug('Calling main...')
        asyncio.run(main())
luis-castro commented 1 month ago

Hello!!

Looking at all information again I noted you used a password:

But no password with the Authentication in the test program. If that is an actual password that that is another test to do.

I got that password by calling the function emupwGetMobilePasswd(b'<SERIALNUMBER>', b'installer') from envoy_utils.

If I understand the code right, pyenphase uses the same method when the user is installer and no password is provided. However, I already tried specifying the password explicitly with no results.

Here is the output of the test (I had to change the request parameter, because it was lacking the initial / in the request): test2.txt

I even tried using the password:

await envoy.authenticate(username="installer", password="REDACTED")

But with the same results.

luis-castro commented 1 month ago

Something very strange happened: I added this to line 251 of pyenphase/envoy.py

            _LOGGER.debug("Authenticating with: %s:%s", self.auth.local_username, self.auth.local_password)

After that I am getting results from the inverters after a few retries.

Attaching the new output: test_2-20240919-1530UTC.log

The results are not 100% reproducible: sometimes it takes just 1 retry to get the results, others 4 retries are attempted and it finally crashes.

catsmanac commented 1 month ago

That statement also introduces some delay I guess. How about adding

import time

and a sleep before the request in the test program.

time.sleep(5)
myresponse: httpx.Response = await envoy.request('api/v1/production/inverters')
status_code = response.status_code
luis-castro commented 1 month ago

Same results: Sometimes it works, other it doesn't.

However, when using curl it always works with no delay at all.

I suspect the httpx object is related somehow, some additional configuration is required.

I'll try to run a packet capture and compare both curl and the test script to look for any differences.

catsmanac commented 1 month ago

I've checked and httpx is on version 0.27.0 for over 6 months. No recent change. Doesn't say it's not an httpx issue but not a lead either.

You probably did this already, but just checking anything that may help, did you power cycle the Envoy already?

luis-castro commented 1 month ago

Hi @catsmanac,

here is a promising finding:

When using httpx.get in interactive mode, it fails the first time:

$ python
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import httpx
>>> auth = httpx.DigestAuth("installer", "REDACTED")
>>> httpx.get('http://192.168.1.15/api/v1/production/inverters', auth=auth, timeout=10).content
Traceback (most recent call last):
...

However, if I repeat it, now it works, no matter how many times I send the request:

>>> httpx.get('http://192.168.1.15/api/v1/production/inverters', auth=auth, timeout=10).content
b'[\n  {\n    "serialNumber": "121726021970",\n    "lastReportDate": 1726773612,\n    "lastReportWatts": 161,\n    "maxReportWatts": 235\n  },\n  {\n    "serialNumber": "121726033042",\n    "lastReportDate": 1726773608,\n    "lastReportWatts": 159,\n    "maxReportWatts": 235\n  },\n  {\n    "serialNumber": "121726021253",\n    "lastReportDate": 1726773610,\n    "lastReportWatts": 154,\n    "maxReportWatts": 208\n  },\n  {\n    "serialNumber": "121730034753",\n    "lastReportDate": 1726773608,\n    "lastReportWatts": 174,\n    "maxReportWatts": 250\n  }\n]\n'

I haven't tried to reboot the Envoy, but will try that later.

Does this help to implement a change in pyenphase?

catsmanac commented 1 month ago

pyenphase is already trying multiple times. The request function in envoy.py has the tenacity decorator to retry the request

@retry(
  retry=retry_if_exception_type(
    (
        httpx.NetworkError,
        httpx.TimeoutException,
        httpx.RemoteProtocolError,
        orjson.JSONDecodeError,
    )
    ),
    wait=wait_random_exponential(multiplier=2, max=5),
    stop=stop_after_delay(MAX_REQUEST_DELAY)
    | stop_after_attempt(MAX_REQUEST_ATTEMPTS),
    reraise=True,
    )
    async def request(

which will perform the request again in case of any of the 4 errors listed in the decorator until the MAX_REQUEST_DELAY (50sec) time elapsed and no more then MAX_REQUEST_ATTEMPTS (4) attempts. Between each attempt there's a random wait time. In your first debug log these repeats actually show

2024-09-17 18:06:15.117 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:16.801 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:19.201 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.839 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)

So it's not that straight forward. Not sure why this is not working and throwing these errors for the inverters. Nor is there a real understanding what changed that caused this to fail after Sep 7.

What was the error you got on the first attempt in your last try? Again the RemoteProtocolError?

luis-castro commented 1 month ago

What was the error you got on the first attempt in your last try? Again the RemoteProtocolError?

Yes, the same error: httpcore.RemoteProtocolError: Server disconnected without sending a response.

I ran a packet capture running curl, httpx and the test with pyenphase.

With httpx:

The exact same behavior is observed with curl.

When using the script with pyenphase, this is the sequence:

(see attached: capture-pyenphase.pcapng`)

I conclude Envoy expects the call GET /api/v1/production/inverters to be issued with the Authorization headers at the first attempt. You can't reuse the same TCP connection to send the request a second time.

capture-files.zip

luis-castro commented 1 month ago

The problem is HTTPX is reusing the same connection for following authentication requests after getting a 401. The Envoy does not like that.

So I modified the initialization code of the Envoy class in pyenphase/envoy.py, adding a line to disable the keepalive connections:

        self._client = client or httpx.AsyncClient(
            verify=NO_VERIFY_SSL_CONTEXT,
            limits=httpx.Limits(max_keepalive_connections=0)
        )  # nosec

Now the testing code in https://github.com/home-assistant/core/issues/126162#issuecomment-2360947170 works flawlessly.

However, when I applied this to the homeassistant container, I'm still getting the same problem.

I'm sure you'll have a better idea on how to implement this change (disabling keepalives) only when reading the inverters endpoint. What do you think?

catsmanac commented 1 month ago

Thanks for all that research @luis-castro, excellent data-set. Your analysis on what's happening is to the point.

Dropping keep-alive altogether may not be without potential impact. Although it seems a solution, it may be only a band-aid for an underlying (httpx) issue, not sure yet. Needs some taught on how best to proceed.

What is still puzzling is the sudden change after September 6.

The RemoteProtocolError probably indicates a closed connection on the Envoy side triggering HTTPX to open a new one, but its not using the already received auth information ending in the same cycle of events.

I'd still be interested to see if an Envoy reboot changes anything. How this is going, probably not...

To use a modified lib in the container you need to use uv pip install --editable /path/to/your/pyenphase using the container console and have your local version accessible. Make sure to not change the lib version as HA will pull to version it needs

current version:

uv pip freeze | grep pyenphase
pyenphase==1.22.0

local version

pip freeze | grep pyenphase
-e local version

For testing I use a Home-Assistant dev container.

As for the client, when using the pyenphase library one can provide its own client, thus overriding the default client used. That would allow your (test) program to avoid keep-alive without the need to change to library.

my_client = httpx.AsyncClient(
            verify=NO_VERIFY_SSL_CONTEXT,
            limits=httpx.Limits(max_keepalive_connections=0)
        )

envoy = Envoy("192.168.1.15",client=my_client)
print('Calling envoy.setup()...')
await envoy.setup()
print(f'Envoy: {envoy.host}\nFirmare: {envoy.firmware}\nSerial: {envoy.serial_number}\n')

print('Calling envoy.authenticate()...')
await envoy.authenticate(username="installer")

print('Calling envoy.update()...')
data: EnvoyData = await envoy.update()
print(f'TodaysEnergy: {data.system_production.watt_hours_today}')
for inverter in data.inverters:
    print(f'{inverter} SN: {data.inverters[inverter].serial_number}')
    print(f'{inverter} W: {data.inverters[inverter].last_report_watts}')
luis-castro commented 1 month ago

Hi @catsmanac!

Thank you very much for all your input. I'm glad you found my analysis useful. To be honest, these last 3 days I've learned a lot.

Regarding rebooting the Envoy. I did that last night, but there was no change.

Also, I applied the modifications in the container by directly editing the file usr/local/lib/python3.12/site-packages/pyenphase/envoy.py. Yes, I know this is somewhat "brute", but now you show me how to do this in a more proper way.

Since I modified the file, the inverters information has been more stable (with some dropoffs still), but I'm not sure this is the best approach.

If you agree, let's keep this Issue open for a while, in case other users report having the same problem.

Additionally, at the Enlighten dashboards, I see a "Ensemble Update" event on September 7. Not sure if it's related or not, but I suspect something changed in the firmware recently.

Anyway, I'll try to figure out a more elegant solution and will share any insights here.

Thanks again.

catsmanac commented 1 month ago

Not closing this one yet indeed. Too much still unclear.

The Envoy getting a new firmware could be an explanation. But if so it's troubling if the behavior changed that much. Wouldn't be the first time though.

The fact that there's still some drop-outs isn't satisfying either. No hurry, but I'd be interested in a fresh debug log with some failures in it.

luis-castro commented 1 month ago

Hi @catsmanac!

So I modified the enphase_envoy initialization code to use a custom client, and now the integration has been running for over an hour with no single glitch.

Not sure what the implications are of using my own client instead of calling hass's get_async_client(), but so far everything looks good.

Here are the changes I applied, in case you'd like to give it a try and/or enhance it:

--- core/homeassistant/components/enphase_envoy/__init__.py.bak 2024-09-20 16:20:03.976546179 -0600
+++ core/homeassistant/components/enphase_envoy/__init__.py     2024-09-20 16:21:12.176602804 -0600
@@ -13,12 +13,13 @@
 from .const import DOMAIN, PLATFORMS
 from .coordinator import EnphaseConfigEntry, EnphaseUpdateCoordinator

+import htppx

 async def async_setup_entry(hass: HomeAssistant, entry: EnphaseConfigEntry) -> bool:
     """Set up Enphase Envoy from a config entry."""

     host = entry.data[CONF_HOST]
-    envoy = Envoy(host, get_async_client(hass, verify_ssl=False))
+    envoy = Envoy(host, client=httpx.AsyncClient(verify=False, limits=httpx.Limits(max_keepalive_connections=0)))
     coordinator = EnphaseUpdateCoordinator(hass, envoy, entry)

     await coordinator.async_config_entry_first_refresh()
catsmanac commented 1 month ago

Is that last change using the original pyenphase code again or still a modified pyenphase?

Your last change is part of one of the options on my list how to implement this once we have confirmed the results. This change will also change behavior of your second envoy running 7.3.617 and when implemented as-is on all communication. Therefore a fix will probably need to be an option that can be enabled as needed or automatically. I'll get you an example how that can be done later.

Additionally, at the Enlighten dashboards, I see a "Ensemble Update" event on September 7. Not sure if it's related or not, but I suspect something changed in the firmware recently.

To see if there was an Envoy firmware update you can open a backup file. Buried in there will be the .storage folder with the core.device_registry file. In that file there's an entry for the Envoy which shows the firmware in sw_version.

catsmanac commented 1 month ago

Overview of current findings:

Envoy firmware 3.18.10 observations show that obtaining data from the endpoint /api/v1/production/inverters is prone to failures with connection pool enabled. Until now this is the only firmware reporting this.

Observations:

Side-obeservations:

Test shows that setting the client to max_keepalive_connections=0 solves the issue. Network captures need to confirm, but it is assumed that in this case HTTPX will actually use the Digest on the new connection (). Test with CURL show that works fine with the Envoy.

As for solutions available:

  1. Add a configuration option the the Enphase Envoy integration to disable the keep-alive connection as needed by the user.
  2. Change the pyenphase library to detect this specific envoy firmware version and disable keep-alive.
  3. Register the issue with HTPX and get the HTTPX library changed, if accepted as an issue.

For feasibility and complexity:

  1. Least impact and complexity even though we want to minimize needed options/configurations. Only code change in the integration by adding an additional option,
  2. Will also require to change the integration as it provides the client and pyenphase will need to set a new client after firmware detection or signal something back to the integration to do so,
  3. This seems to correct way, provided it is recognized as an issue and desired change. Will take more time and testing.

For now I'd propose to discuss the issue with HTTPX, prepare a HA PR to implement 1. as bug fix for now, until HTTPX solutions comes available, if ever. Change can be reverted then.

catsmanac commented 1 month ago

@luis-castro, I've started a discussion with httpx team as well.

Can you provide another packet capture showing some packets where all is working fine in your new setup?

catsmanac commented 1 month ago

Some code to try, with configure option. Haven't been able to fully test yet.

afbeelding

Reload the integration after changing the bottom option.

enphase_envoy.zip

luis-castro commented 1 month ago

Is that last change using the original pyenphase code again or still a modified pyenphase?

It is using the original pyenphase code.

Can you provide another packet capture showing some packets where all is working fine in your new setup?

Sure! Here is a packet capture taken from my dev environment with the modified homeassistant/components/enphase_envoy/__init__.py.

capture-patched.zip

Some code to try, with configure option. Haven't been able to fully test yet.

Thank you!!! Will try out this later today and let you know the results.

catsmanac commented 1 month ago

Any luck testing this?

luis-castro commented 1 month ago

Hello! Sorry, I didn't had a chance of testing it till today.

I'm testing it right now. I enabled the "disable_keep_alive" option and so far it is working with no issues.

I wasnt' sure how to upload the files, so this how I installed the files:

  1. From my workstation, I transferred the files to the SSH addon container:

    $ scp * root@homeassistant.local:./enphase_patch
  2. Connected to host and copied files from SSH addon container to host filesystem:

    $ ssh root@homeassistant.local -p 22222  # Connect to host OS
    cd /tmp
    for A in __init__.py config_flow.py const.py strings.json
    do
     cp addon_core_ssh:/root/enphase_patch/$A .
    done
  3. Backup original files and uploaded new versions to homeassistant container:

    for A in __init__.py config_flow.py const.py strings.json
    do
     docker cp homeassistant:/usr/src/homeassistant/homeassistant/components/enphase_envoy/$A $A.bak
     docker cp $A homeassistant:/usr/src/homeassistant/homeassistant/components/enphase_envoy/
    done
  4. Restarted Home Assistant.

Let me know if there is a more appropriate procedure to do it.

catsmanac commented 1 month ago

Thanks @luis-castro, delay is no problem at all, we all have day things to do.

The process you used is fine, you got the "disable_keep_alive" option and as that is working fine it is activated.

Keep in mind that when you upgrade to the just released 2024.10 HA version it will overwrite your changes!

I'll prepare a code update for a future release. Thanks for all your testing !!!.