Powerwall Firmware 23.44.3 Update - Lost String Data

prussell69 commented 4 months ago

Powerwall firmware updated to V23.44.3-msa 193a3259 at 12:00AM, and now all the string data is gone

Host System

Windows 11
Docker
Powerwall+ x3

Additional context localhost:8675/strings returns: {}

localhost:8675/vitals returns: TIMEOUT!

localhost:8675/temps returns: {}

localhost:8675/stats returns: {"pypowerwall": "0.7.7 Proxy t40", "gets": 2135, "errors": 0, "timeout": 6, "uri": {"/alerts/pw": 292, "/aggregates": 292, "/strings": 293, "/soe": 293, "/temps/pw": 292, "/pod": 292, "/freq": 292, "/version": 3, "/api/status": 5, "/api/site_info/site_name": 2, "/api/auth/toggle/supported": 3, "/api/sitemaster": 11, "/api/meters/aggregates": 7, "/api/site_info": 2, "/api/system_status/soe": 7, "/api/system_status/grid_status": 7, "/api/system_status/grid_faults": 2, "/api/powerwalls": 7, "/api/customer/registration": 2, "/api/networks": 2, "/temps": 1}, "ts": 1708877109, "start": 1708875645, "clear": 1708875645, "uptime": "0:24:24", "mem": 41408, "site_name": "Our House", "cloudmode": false, "siteid": null, "counter": 0, "authmode": "cookie"}

I'm still getting the other data: Solar, Grid, House, etc.

jasonacox commented 4 months ago

Hi @prussell69 - this is a known issue with Tesla latest Firmware upgrade starting with Firmware 23.44.0 - see https://github.com/jasonacox/Powerwall-Dashboard/discussions/402

Summary: With this upgrade, Tesla has removed the /api/devices/vitals (https://powerwall/api/devices/vitals) API which was a binary protobuf that pypowerwall decoded for things like strings, temps, alerts, and island voltages. Currently the only workaround is to build a multi-homed device that connects to your LAN and the Powerwall gateway WiFi to pull some of that data (strings, alerts, voltages) from the tedapi API used by the Tesla Pros/One app (see https://github.com/jasonacox/Powerwall-Dashboard/discussions/392).

Thankfully, as you mention, we are still getting the other core data via the local Powerwall API. There was a real possibility that all the local APIs (and related web portal) would be removed as we have seen with the PW3. Hopefully they stay. If those are removed, we will need to move to cloud mode which has even less data, but we would at least get the core data (although at a lower fidelity).

My system has not upgraded so I haven't been able to investigate any other options. If others who are upgraded want to dig in deeper to see if there are other ways to get this data, please let us know what you discover.

jasonacox commented 4 months ago

The one interesting thing about your finding:

localhost:8675/vitals returns: TIMEOUT!

Technically this should be a 404 so it would not timeout. It sounds like the Powerwall is holding that connection instead of responding with a 404. I'm not sure why. Are others seeing this?

Can you log in to your Powerwall gateway portal and after doing so, try to go to https://powerwall/api/devices/vitals (replace "powerwall" with the IP address of your Powerwall) to see what it does. In pre-upgrade Firmwares, it would download a binary file (the protobuf). It would be interesting to see if you get that or a timeout.

BuongiornoTexas commented 4 months ago

Interesting - I tried this and I got some odd results.

First time I tried dashboard:8675/vitals returned TIMEOUT and took about 10-15 seconds to get there.
However, https://powerwall/api/devices/vitals returns 404 page not found almost immediately.
I then tried dashboard:8675/vitals which then went into an indefinite/very slow page load.

I've also tried this with a couple of other pages that are on the api (e.g api/system_status, api_site_info), Calls to dashboard:8675 load extremely slowly (not timed, but I'm guessing 30s to a minute plus), while direct calls to the powerwall return almost instantly.

I guess it's time to spin up a local pypowerwall and get the debugger out ...

jasonacox commented 4 months ago

took about 10-15 seconds to get there

Yikes! Are you using the latest pypowerwall version? You could turn on debug mode on in pypowerwall.env:

PW_DEBUG=yes

And then docker restart pypowerwall; docker logs pypowerwall -f to see what is happening. It should see the 404 and back off.

BuongiornoTexas commented 4 months ago

Tried the debug=yes - unfortunately, the logs aren't providing anything useful. Additional minor data:

Verify.sh can take a very LONG time to do the listening test for 8675 (and sometimes fails).
Calls to 8675 seem to work well on the first call from a fresh browser or a page in a browser that hasn't been used in while, but then go into the long waits.

(This feels internal to pypowerwall, but might also be a Tesla blocking policy? It's a long shot, but maybe an agent string to masquerade as a browser could be a thing - not sure if pypowerwall does this already?)

I'll do the thing with the python debugger and report back.

jasonacox commented 4 months ago

Similar issue reported in https://github.com/jasonacox/Powerwall-Dashboard/issues/425

BuongiornoTexas commented 4 months ago

I realised enabling debug required an up -d, and now have a bit more info.

The proxy seems to be gathering data cheerfully and frequently from the powerwall. Lots and lots of lines in the mode: [DEBUG] 172.18.0.1 "GET /api/meters/aggregates HTTP/1.1" 200 - (If I understand correctly, the 172 block is a private subnet generated docker).
Two api calls are having issues: api/troubleshooting/problems and api/troubleshooting/problems. But even then, pypowerwall is logging these as ERROR Powerwall API not found at ... and continuing without problem.
The main error logged is: [proxy] [ERROR] Socket broken sending response [doGET], and there are a lot of these. There seem to be a cluster of these each time I try to access a page via 8675 or use verify.sh. (I could see these without debug enabled - I wasn't sure how significant they were, as these have been mentioned as a periodic problem in previous issues.)
I've seen a couple of TIMEOUTs in the powerwall api calls, but these have been very infrequent.

I'm not sure I'm interpreting this information correctly, but given the logging shows lots of successful calls to the powerwall and I'm having no problems acessing the powerwall directly, then the issue seems to be how pypowerwall proxy is serving the data to clients?

BuongiornoTexas commented 4 months ago

One more: these two appear periodically:

02/26/2024 11:09:55 AM [proxy] [ERROR] Missing key in payload [nominal_full_pack_energy] 02/26/2024 11:09:55 AM [proxy] [ERROR] Missing key in payload [nominal_energy_remaining]

jasonacox commented 4 months ago

02/26/2024 11:09:55 AM [proxy] [ERROR] Missing key in payload [nominal_full_pack_energy] 02/26/2024 11:09:55 AM [proxy] [ERROR] Missing key in payload [nominal_energy_remaining]

I was editing the proxy server to fix a bug showing up in a Solar Only installation and changed the log.error to log.debug for those two errors to help remove some of the noise. You can update to the latest test: jasonacox/pypowerwall:0.7.7t41

If you are using the powerwall.yml file (or simliar) with docker-compose, you can edit that file and change the image for pypowerwall to jasonacox/pypowerwall:0.7.7t41 and then run ./compose-dash.sh up -d or equivalent.

Two api calls are having issues: api/troubleshooting/problems and api/troubleshooting/problems. . But even then, pypowerwall is logging these as ERROR Powerwall API not found at ... and continuing without problem.

What Powerwall Firmware are you running? Also, can you post examples of those errors?

The main error logged is: [proxy] [ERROR] Socket broken sending response [doGET]

Those will show up occasionally but especially if you are being rate limited by the Powerwall. I would try to stop pypowerwall for 5m or so and see if that helps clear some of that. Also, make sure you don't have multiple scripts running against the Powerwall.

BuongiornoTexas commented 4 months ago

You can update to the latest test: jasonacox/pypowerwall:0.7.7t41

I'll be taking a break on this for a bit, but will do when I get back to it.

What Powerwall Firmware are you running? Also, can you post examples of those errors?

Firmware: 23.44.0

02/26/2024 11:07:36 AM [pypowerwall] [DEBUG] ERROR Powerwall API not found at https://192.168.25.11/api/troubleshooting/problems`
02/26/2024 11:07:35 AM [pypowerwall] [DEBUG] ERROR Powerwall API not found at https://192.168.25.11/api/devices/vitals

Those will show up occasionally

Not so much. If I have debug mode set to no, these are pretty well all I'm seeing in the pypowerwall log. I've done the stop for 5m/20m/30m and they come straight back. I get a cluster every 5 minutes or so.

(one thing I have meant to note - I'm limited to wifi access to the powerwall - it might be worth checking if others running into 0.44 issues are likewise constrained.)

I'm not sure what you mean by multiple scripts against pypowerwall. I believe my Powerwall-Dashboard is a vanilla setup with standard queries from pypowerwall via telegraf to influxdb. I do run pwdusage, but that queries influxdb for existing data and doesn't poll pypowerwall at all.

jasonacox commented 4 months ago

Thanks @BuongiornoTexas !

Socket broken sending response [doGET]

These are client connection errors. The client (browser or telegraf) somehow have their connection interrupted. I know the http pool will cause it to cycle occasionally but if you are seeing a constant stream, I would check your WiFi to see if you don't have some congestion going on. I've seen that. I had a case where my dashboard (showing the power flow animation) was in a bad WiFi spot and spun up a lot of those errors. I bounced the WiFi and it was better. The other option is that one of the API calls is HANGING too long (connected to the Powerwall) and causing the clients to all timeout. If that is the case, we need to figure out which API is hanging.

Can you paste the http://localhost:8675/stats so we can see if there are high timeout errors?

BuongiornoTexas commented 4 months ago

I think my topology may not match up with what you are describing. My setup is like this:

powerwall <-> wifi <-> router* <-> wired <-> NUC (Powerwall-Dashboard on debian among other things)
                                         <-> Window grafana clients

So I'm seeing the problems with 1GB/s wired connections between the pwd host and the clients**, and even though the wifi to the powerwall is not great, I can see that the response from the powerwall to the proxy and direct calls to the powerwall api are more or less instantaneous. I also know the wifi around the house is strong (aside from the powerwall) - we can happily run 4k streams on it.

If I understand your explanation, the most likely problem is then a hang on an API call?

* Router is an ASUS RT-AX86U, CPUs are untroubled by network load (<1%), and 50% spare RAM, so I'm pretty sure this is not an issue. ** I suspect the NUC may limit things to 100MB/s, but even so ...

jasonacox commented 4 months ago

If I understand your explanation, the most likely problem is then a hang on an API call?

Yes, 100%.

jasonacox commented 4 months ago

Do you see timeouts rising in http://localhost:8675/stats ?

BuongiornoTexas commented 4 months ago

Sorry - I missed the stats - here you go.


{
    "pypowerwall": "0.7.7 Proxy t40",
    "gets": 20386,
    "errors": 0,
    "timeout": 1767,
    "uri": {
        "/api/meters/aggregates": 3450,
        "/api/system_status/soe": 3452,
        "/api/sitemaster": 3454,
        "/api/powerwalls": 3450,
        "/api/system_status/grid_status": 3450,
        "/api/auth/toggle/supported": 956,
        "/aggregates": 201,
        "/soe": 199,
        "/alerts/pw": 199,
        "/temps/pw": 191,
        "/api/meters/site": 195,
        "/api/system_status": 196,
        "/strings": 191,
        "/pod": 198,
        "/freq": 195,
        "/api/meters/solar": 197,
        "/version": 165,
        "/api/status": 8,
        "/api/site_info/site_name": 3,
        "/api/system_status/grid_faults": 4,
        "/api/customer/registration": 3,
        "/api/site_info": 4,
        "/api/networks": 4
    },
    "ts": 1708912160,
    "start": 1708907231,
    "clear": 1708907231,
    "uptime": "1:22:09",
    "mem": 47408,
    "site_name": "XXX",
    "cloudmode": false,
    "siteid": null,
    "counter": 0,
    "authmode": "cookie"
}
``

jasonacox commented 4 months ago

"timeout": 1767,

That's showing that about 10% of the calls are timing out! Yikes.

I wonder if it is all the calls or just certain calls. Can you manually curl / browse to each of these and see if any are particularly slow or timeout?

http://localhost:8675/{api}

    "/api/meters/aggregates": 3450,
    "/api/system_status/soe": 3452,
    "/api/sitemaster": 3454,
    "/api/powerwalls": 3450,
    "/api/system_status/grid_status": 3450,
    "/api/auth/toggle/supported": 956,
    "/aggregates": 201,
    "/soe": 199,
    "/alerts/pw": 199,
    "/temps/pw": 191,
    "/api/meters/site": 195,
    "/api/system_status": 196,
    "/strings": 191,
    "/pod": 198,
    "/freq": 195,
    "/api/meters/solar": 197,
    "/version": 165,
    "/api/status": 8,
    "/api/site_info/site_name": 3,
    "/api/system_status/grid_faults": 4,
    "/api/customer/registration": 3,
    "/api/site_info": 4,
    "/api/networks": 4

BuongiornoTexas commented 4 months ago

Responses are erratic to say the least. Occasionally I get a very fast response, a lot of the time there is a significant delay and then a response, another smaller chunk of the time I get: curl: (52) Empty reply from server. It doesn't seem to be tied to any specific URI - this behaviour happens with /api/meters/aggregates, /api/system_status/soe, and /version. All tested on the pwd host machine too, so no network delays to the client either.

And then there has been this one solitary logged exception.

----------------------------------------
Exception occurred during processing of request from ('172.18.0.1', 53384)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 401, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
  File "/usr/local/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
----------------------------------------

I've got some other things to get on to, but will hopefully be back at this in a day or two.

jasonacox commented 4 months ago

No worries @BuongiornoTexas. I wonder if your case is unique, unless @prussell69 is seeing the same thing.

Some suggestions:

Stop telegraf (stop all traffic) give it ~5 minutes or so and then roll through each of the APIs again.
Try spinning up a a Dashboard stack on another system to try to eliminate host issues.

BuongiornoTexas commented 4 months ago

I wonder if your case is unique

Well it's possible, but from memory, most of my customisation starts at/downstream of influxdb. I've got a a couple of things to try now:

retry your slow start suggestion (not hopeful though based on previous tests).
Check for any customisation that might impact pypowerwall (maybe some custom urls in telegraf when I was looking at 3 phase?).
Run an interactive debug on pypowerwall on a separate system (which I hope will give more information than working with the docker stack).

I'll report back when I have something useful to say. :-)

BuongiornoTexas commented 4 months ago

@jasonacox - The update to 4.0.3 has completely resolved the problems without making any other changes in my setup. No more doGET errors, no more missing data (I discovered there were significant holes in some three phase data sets I was pulling into the raw data RP/cq). My stack is now running smoothly and incredibly quickly. I had also been seeing significant delays in loading the power flow animation (I had assumed they were a consequence of the transition to the new system), and those have disappeared entirely.

If I had to guess, I would place the fix on adding a default value for /api/troubleshooting/problems, as none of the other changes are showing up. I did a quick check on the debug log for pypowerwall and it is now littered with warnings about 404s on the troubleshooting and vitals calls. Absolutely no issues with rate limiting as far as I can tell.

Based on this experience, I would strongly recommend anyone on the 23.44 firmware should immediately update to 4.0.3.

jasonacox commented 4 months ago

That's fantastic news, @BuongiornoTexas !! I agree on the recommendation.

BuongiornoTexas commented 4 months ago

Just checked my stats and pypowerwall log, which has now been up for 6 hours. 0 timeouts, 0 errors in the stats page, the log is essentially empty (specifically, 4x doGet errors corresponding to manual calls to pypowerwallhost:8675/vitals - and even then, no delays - instant response).

prussell69 commented 4 months ago

I also upgraded to 4.0.3. Everything seems to be there except: ALL P/W+ data (String Voltage, String Current, String Power, Inverter Power, P/W Temps). P/W Freqs are OK for my 3 P/Ws, and it's returning the P/W capacities. Also, all the Alert data is gone. On a side note, I also upgraded my Docker to 4.28.0 thinkinking that was causing me some issues, and the upgrade appeared OK at first, but at midnight, my P/W data stopped completely. When I tried to restart my Powerwall-Dashboard in docker, the Grafana looked like it was restarting itself every 5 or 6 seconds. I could read data from pypowerwall, so it looked like a Grafana issue. I tried upgrading Dashboard again, and even tried deleting the Grafana instance and re-upgrading to rebuild it. Didn't work. I finally shut down Docker, and re-started it running it as "administrator" (I'm running Windows 11). Everything restarted OK, and I was able to run the "Tesla History" tool to rebuild the missing data (Thanks again for that great utility). My system seems to be up and running, minus the string & inverter data. God only knows why Tesla decided to kill that data stream. Also, My P/W+ has a hardwired network connection, not that that makes any difference.

prussell69 commented 4 months ago

Just a quick update... My main Docker PC (I'm running on 2 different PC's) ran about 4 hours, and the P/W info stopped again. On the Grafana dashboard, the graphs were blank from the time it stopped updating, and the graphic image from the tesla powerwall itself was now blank, and showing "Cloud only" where the firmware is normally displayed. I shut down Docker, cleared out Windows temp folder, then restarted Docker again as "Administartor". Everything started back up, including my Teslamate. I had to run the Tesla-History tool again to fill in the gaps. The other PC was still running OK, so I'm not sure whats causing it. I'll keep my eye on things and keep everyone up to date.

jasonacox commented 4 months ago

Hi @prussell69 - If it happens again, can you gather the docker logs to see if there is any error indicated? Also, run verify.sh if you can.

prussell69 commented 4 months ago

OK. Where do I get the Docker logs from? When I click on the "Powerwall-Dashboard" on the Docker container screen, it gives me a log miles long and scrolling in real time. I wouln't think you want a 6mb log file. Detailed instructions would be great. I will also run the verify.sh script when it happens again. It did happen on both of my systems around midnight last night. This time I just restarted the PC's and on my main PC, when Docker started, it prompted twice for "Elevated Privaliges". the backup system only prompted once. May have something to do with the last upgrade of Docker (4.28.0) I'm not sure how to downgrade Docker, otherwise I try that if theres a way to make sure I don't loose my data. Thanks for helping!

prussell69 commented 4 months ago

It stopped on my backup system. The pypowerwall check took 45 seconds or so.

Checking pypowerwall

Config File pypowerwall.env: GOOD
Container (pypowerwall): GOOD
Service (port 8675): GOOD
Version: 0.7.8 Proxy t41
Powerwall State: CONNECTED - Firmware Version: SolarOnly
Cloud Mode: NO

Checking telegraf

Config File telegraf.conf: GOOD
Local Config File telegraf.local: GOOD
Container (telegraf): GOOD
Version: Telegraf 1.28.2 (git: HEAD@8d9cf395)

Checking influxdb

Config File influxdb.conf: GOOD
Environment File influxdb.env: GOOD
Container (influxdb): GOOD
Service (port 8086): GOOD
Filesystem (./influxdb): GOOD
Version: InfluxDB shell version: 1.8.10

Checking grafana

Config File grafana.env: GOOD
Container (grafana): GOOD
Service (port 9000): GOOD
Filesystem (./grafana): GOOD
Version: Grafana CLI version 9.1.2

Checking weather411

Container (weather411): GOOD
Service (port 8676): GOOD
Weather: {"temperature": 75.31}
Version: 0.2.3

All tests succeeded.

prussell69 commented 4 months ago

I did see this around the time it stopped updating. I pulled it from the scrolling log file: 2024-03-02 10:25:30 telegraf | 2024-03-02T15:25:30Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/alerts/pw]: Get "http://pypowerwall:8675/alerts/pw": dial tcp 172.18.0.3:8675: connect: connection refused 2024-03-02 10:25:30 telegraf | 2024-03-02T15:25:30Z I! [agent] Hang on, flushing any cached metrics before shutdown 2024-03-02 10:25:30 telegraf | 2024-03-02T15:25:30Z I! [agent] Stopping running outputs 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Loading config: /etc/telegraf/telegraf.conf 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Loading config: /etc/telegraf/telegraf.d/local.conf 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Starting Telegraf 1.28.2 brought to you by InfluxData the makers of InfluxDB 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 5 secret-stores 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Loaded inputs: cpu disk diskio http (2x) kernel mem processes swap system 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Loaded aggregators: 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Loaded processors: date (2x) 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Loaded secretstores: 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Loaded outputs: influxdb 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! Tags enabled: host=telegraf 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"telegraf", Flush Interval:10s 2024-03-02 10:26:27 telegraf | 2024-03-02T15:26:27Z W! [outputs.influxdb] When writing to [http://influxdb:8086]: database "powerwall" creation failed: Post "http://influxdb:8086/query": dial tcp 172.18.0.6:8086: connect: connection refused 2024-03-02 10:26:30 telegraf | 2024-03-02T15:26:30Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/temps/pw]: Get "http://pypowerwall:8675/temps/pw": dial tcp 172.18.0.2:8675: connect: connection refused

I tried stopping & starting Powerwall-Dashboard inside Docker, but then Grafana kept restarting. I Restarted Docker, and everything started OK. Not sure where to look next.

jasonacox commented 4 months ago

This does sounds like a Windows + Docker issue. My Windows and MacOS Docker Desktop installations are not having any issues. Thanks for posting the telegraf logs. It would be good to get the pypowerwall and grafana logs. You can do that by expanding powerwall-dashboard and selecting the "View Details" option for the pypowerwall container:

Or from the command line:

docker logs pypowerwall
docker logs grafana

I ran a google search for Docker on Windows issues and it seems quite a few people experience this. One remediation is that Docker will get corrupt during an upgrade and require clean re-install. One suggestion (I'm not a Windows user so this could be useless or even dangerous? :)

Uninstall docker desktop
Delete all Docker settings from %username%\AppData
Reboot
Install Docker Desktop

If you get a stable install, I would advise turning off upgrades. 😁

prussell69 commented 4 months ago

I tried to downgrade Docker on my laptop where I do testing from time to time. It had been updated also to 4.0.3. I followed your instructions above and re-installed 4.24.2 (I had a copy of that version), and did not update it. I had to re-install my Powerwall-Dashboard, and Teslamate, which killed any existing data. I was able to restore from a backup. Is there anyway to keep the old data besides doing a backup/restore? My main system has other containers running and I'd rather not have to recreate everything if possible. I will keep my laptop running for a while and see how it does. Hopefully it won't crash anymore. I did manage to get the log files you asked for. Hopefully you will see something that will help.

pypowerwall-log.txt Grafana-log.txt

jasonacox commented 4 months ago

Thanks @prussell69 !

The pypowerwall logs are showing network connectivity issues. I suspect the Docker internal networking is failing which also breaks your ability to see the Grafana dashboard. I don't know why this would happen. I suppose the Powerwall Gateway itself could go offline for a bit (network issues or a failed upgrade process) but that shouldn't prevent you from getting to Grafana. Hopefully the older version continues to run w/o issue again. :)

As to the re-install on the other systems... Technically, you shouldn't need to delete and re-install anything in the Powerwall-Dashboard folder. Instead, after you remove and install Docker again, you should be able to go into the existing Powerwall-Dashboard directory and run:

./compose-dash.sh up -d

prussell69 commented 4 months ago

As an FYI, I downgraded both of my systems back to 4.24.2 and it looks like ALL of the issues with it stopping have gone away. they have been running for over 2 days without a hiccup. I guess I'll stay with this version for now. The only thing I've noticed, which may have been this way for a bit, is pypowerwall's CPU usage jumps to over 70% for about 1 minute, then drops back down to < 0.02%. It seems to happen hourly. I haven't updated to v4.0.4 yet, so maybe you've already addressed this.

jasonacox commented 4 months ago

Thanks @prussell69 ! I don't know what Docker keeps doing with their desktop software but this isn't the first time we have seen this.

As to the CPU load, I would want to see what is happening in the pypowerwall logs during that time. If you are running the older version (before v3.0.8) you should upgrade immediately. Otherwise, I don't think there has been any changes to address CPU spike.

zi0r commented 3 months ago

I still see some string data (and alerts) via /api/solar_powerwall. However, this only seems to show 1/2 of the inverters (Powerwall+). Perhaps there's a way to get the status/alerts from the second one via /api/solar_powerwall/something.

I'm currently on 24.4.0 0fe780c9.

{ "pvac_status": { "state": "PVAC_Active", "disabled": false, "disabled_reasons": [], "grid_state": "Grid_Compliant", "inv_state": "INV_Grid_Connected", "v_out": 247.60000000000002, "f_out": 59.980000000000004, "p_out": 2020, "q_out": 30, "i_out": 7.76, "string_vitals": [ { "string_id": 1, "connected": true, "measured_voltage": 249.3, "current": 1.62, "measured_power": 408 }, { "string_id": 2, "connected": true, "measured_voltage": 270.1, "current": 1.82, "measured_power": 476 }, { "string_id": 3, "connected": true, "measured_voltage": 266.7, "current": 1.46, "measured_power": 388 }, { "string_id": 4, "connected": true, "measured_voltage": 315.8, "current": 2.66, "measured_power": 838 } ] }, "pvs_status": { "state": "PVS_Active", "disabled": false, "enable_output": true, "v_ll": 247.5, "self_test_state": "PVS_SelfTestOff" }, "pv_power_limit": 2859.088662175421, "power_status_setpoint": "on", "pvac_alerts": { "LastRxTime": "2024-03-16T16:07:09.334084-04:00", "ReceivedMuxBitmask": 1, "PVAC_alertMatrixIndex": 0, "PVAC_a001_inv_L1_HW_overcurrent": false, "PVAC_a002_inv_L2_HW_overcurrent": false, "PVAC_a003_inv_HVBus_HW_overvoltage": false, "PVAC_a004_pv_HW_CMPSS_OC_STGA": false, "PVAC_a005_pv_HW_CMPSS_OC_STGB": false, "PVAC_a006_pv_HW_CMPSS_OC_STGC": false, "PVAC_a007_pv_HW_CMPSS_OC_STGD": false, "PVAC_a008_inv_HVBus_undervoltage": false, "PVAC_a009_SwAppBoot": false, "PVAC_a010_inv_AC_overvoltage": false, "PVAC_a011_inv_AC_undervoltage": false, "PVAC_a012_inv_AC_overfrequency": false, "PVAC_a013_inv_AC_underfrequency": false, "PVAC_a014_PVS_disabled_relay": false, "PVAC_a015_pv_HW_Allegro_OC_STGA": false, "PVAC_a016_pv_HW_Allegro_OC_STGB": false, "PVAC_a017_pv_HW_Allegro_OC_STGC": false, "PVAC_a018_pv_HW_Allegro_OC_STGD": false, "PVAC_a019_ambient_overtemperature": false, "PVAC_a020_dsp_overtemperature": false, "PVAC_a021_dcac_heatsink_overtemperature": false, "PVAC_a022_mppt_heatsink_overtemperature": false, "PVAC_a023_unused": false, "PVAC_a024_PVACrx_Command_mia": false, "PVAC_a025_PVS_Status_mia": false, "PVAC_a026_inv_AC_peak_overvoltage": false, "PVAC_a027_inv_K1_relay_welded": false, "PVAC_a028_inv_K2_relay_welded": false, "PVAC_a029_pump_faulted": false, "PVAC_a030_fan_faulted": false, "PVAC_a031_VFCheck_OV": false, "PVAC_a032_VFCheck_UV": false, "PVAC_a033_VFCheck_OF": false, "PVAC_a034_VFCheck_UF": false, "PVAC_a035_VFCheck_RoCoF": false, "PVAC_a036_inv_lost_iL_control": false, "PVAC_a037_PVS_processor_nERROR": false, "PVAC_a038_inv_failed_xcap_precharge": false, "PVAC_a039_inv_HVBus_SW_overvoltage": false, "PVAC_a040_pump_correction_saturated": false, "PVAC_a041_excess_PV_clamp_triggered": false, "PVAC_a042_mppt_curve_scan_completed": false, "PVAC_a043_fan_speed_mismatch_detected": false, "PVAC_a044_fan_deadband_toggled": false, "PVAC_a045_max_thermal_current_exceeded": false }, "pvs_alerts": { "LastRxTime": "2024-03-16T16:07:09.903665-04:00", "ReceivedMuxBitmask": 0, "PVS_a001_WatchdogReset": false, "PVS_a002_SW_App_Boot": false, "PVS_a003_V12vOutOfBounds": false, "PVS_a004_V1v5OutOfBounds": false, "PVS_a005_VAfdRefOutOfBounds": false, "PVS_a006_GfOvercurrent300": false, "PVS_a007_V12vPowerOutOfBounds": false, "PVS_a008_UNUSED_8": false, "PVS_a009_GfOvercurrent030": false, "PVS_a010_PvIsolationTotal": false, "PVS_a011_PvIsolationStringA": false, "PVS_a012_PvIsolationStringB": false, "PVS_a013_PvIsolationStringC": false, "PVS_a014_PvIsolationStringD": false, "PVS_a015_SelfTestGroundFault": false, "PVS_a016_ESMFault": false, "PVS_a017_MciStringA": false, "PVS_a018_MciStringB": false, "PVS_a019_MciStringC": false, "PVS_a020_MciStringD": false, "PVS_a021_RapidShutdown": false, "PVS_a022_Mci1SignalLevel": false, "PVS_a023_Mci2SignalLevel": false, "PVS_a024_Mci3SignalLevel": false, "PVS_a025_Mci4SignalLevel": false, "PVS_a026_Mci1PvVoltage": false, "PVS_a027_Mci2PvVoltage": false, "PVS_a028_systemInitFailed": false, "PVS_a029_PvArcFault": false, "PVS_a030_VDcOv": false, "PVS_a031_Mci3PvVoltage": false, "PVS_a032_Mci4PvVoltage": false, "PVS_a033_dataException": false, "PVS_a034_PeImpedance": false, "PVS_a035_PvArcDetected": false, "PVS_a036_PvArcLockout": false, "PVS_a037_PvArcFaultData1": false, "PVS_a038_PvArcFault_SelfTest": false, "PVS_a039_SelfTestRelayFault": false, "PVS_a040_LEDIrrationalFault": false, "PVS_a041_MciPowerSwitch": false, "PVS_a042_MciPowerFault": false, "PVS_a043_InactiveUnsafePvStrings": false, "PVS_a044_FaultStatePvStringSafety": false, "PVS_a045_RelayCoilIrrationalFault": false, "PVS_a046_RelayCoilIrrationalLockout": false, "PVS_a047_AcSensorIrrationalFault": false, "PVS_a048_DcSensorIrrationalFault": false, "PVS_a049_arcSignalMibspiHealth": false, "PVS_a050_RelayCoilIrrationalWarning": false, "PVS_a051_DcBusShortCircuitDetected": false, "PVS_a052_PvArcFault_PreSelfTest": false, "PVS_a053_PvArcFaultData2": false, "PVS_a054_PvArcFaultData3": false, "PVS_a055_PvArcFaultData4": false, "PVS_a056_PvIsolation24HrLockout": false, "PVS_a057_DisabledDuringSelftest": false, "PVS_a058_MciOpenOnFault": false, "PVS_a059_MciOpen": false, "PVS_a060_MciClose": true, "PVS_a061_SelfTestRelayFaultLockout": false, "PVS_a062_arcSoftLockout": false, "PVS_a063_sbsComplete_info": false } }

jasonacox commented 3 months ago

@zi0r That's great! Can you explain your setup a bit more? Do you recall what you saw before your Firmware upgraded in terms of string data?

zi0r commented 3 months ago

@zi0r That's great! Can you explain your setup a bit more? Do you recall what you saw before your Firmware upgraded in terms of string data?

For /api/solar_powerwall, I believe I remember seeing the same string data that I pasted above. It's unclear to me how to access the string data for the other inverter, though, I do know string data for both Powerwall+ was available via the now-removed vitals call.

I'd assume that each inverter would/should have its own set of alerts--it wouldn't make sense that this would be based on the gateway. So, there must be a call to reach the data for the second Powerwall+ (inverter).

2x Powerwall+: 4 strings on one, 3 strings on the second 2x Powerwall

jasonacox commented 3 months ago

It works on my older 23.36.4 (I guess I'm lucky since it still has vitals, but also a bit at a disadvantage to help troubleshoot 🤷 ).

I see this API listed on https://github.com/vloschiavo/powerwall2 :

I only have 4 strings so they are all on one PW+. But I figure it would still have a way to pin it to the one PW+. I've tried various combinations of the device identifiers for ${n} and different version of /api/solar_powerwall/something as you mention without any success so far. The onlything that works is http://localhost:8675/api/solar_powerwall/. If anyone figures this out, please chime in. This would at least let us recover the string data and related alerts.

DerickJohnson commented 3 months ago

Oh nice! I didn't know about the /api/solar_powerwall endpoint. It looks like it actually enumerates all the alerts whereas the previous alerts only showed the active ones. I tested on my setup and I could only see the /solar_powerwall endpoint as well (no combos I tried worked to get the other inverter). I would have assumed it would aggregate at the root and then give specifics for the /${n} versions. @jasonacox it looks these endpoints might have been older (from the repo you linked, they were added 3 years ago using some kind of powerwall API extractor). It's possible they aren't actively updated, but they do still show some alerts. If the tesla pros info doesn't work out, this could be at least some info for those that need alert data for dashboards. For instance, if vitals aren't detected, this could be a fallback for /alerts with something like:

        alerts = []
        data = self.poll('/api/solar_powerwall', jsonformat=True)

        for alert, value in data["pvac_alerts"].items():
            if value is True:
                alerts.append(alert)

        for alert, value in data["pvs_alerts"].items():
            if value is True:
                alerts.append(alert)

To get the original format:

I just did a quick local test to see if it would work. I haven't tested through to the dashboards yet, but if that's something of interest, I could do more testing.

zi0r commented 3 months ago

Any idea what the ${n} variable might mean? I assume the different letters have some meaning.

I suppose it's also possible that there isn't a way to access the second powerwall+ this way. It was my understanding that, originally, a Powerwall+ system only supported 1 additional Powerwall and that you couldn't have two Powerwall+.

During the installation, this was a hurdle they had to overcome. So, perhaps when they 'made it work,' they didn't update the API call to support it.

If anyone comes up with ${n} substitutions to try, I'm happy to give it a go.

I've also noticed that in 24.4.x, the installer login no longer works and the 'toggle auth' appears to do nothing.

jasonacox commented 3 months ago

I've also noticed that in 24.4.x, the installer login no longer works and the 'toggle auth' appears to do nothing.

Just to clarify (since I'm sill on the older Firmware) - Do you mean that you can log in to the portal with customer creditials but there is no longer an installer login? I suppose that would make sense with the Tesla Pros app now released which is for this purpose.

On ${n} I tried serial numbers and a few other things, no results.

As @DerickJohnson mentions, it is dated so likely not updated (and possibly at risk of being removed). However, I like where you are going with this @DerickJohnson . I can add that logic to pypowerwall so that if vitals are not available, we at least get some alerts flowing. I won't have time this weekend, but if you want to submit a PR for that in https://github.com/jasonacox/pypowerwall/blob/11fd52d64a1f9305fe7a376d63fc8966a11279ae/pypowerwall/__init__.py#L676 I would be happy to review / merge. 😉

zi0r commented 3 months ago

Correct--it shows 'Installer' as an option and appears to let you start the toggle auth process. However, toggling power on a powerwall does nothing.

You also have to navigate to /summary to get around the tesla pros announcement.

DerickJohnson commented 3 months ago

@jasonacox ready to be reviewed! https://github.com/jasonacox/pypowerwall/pull/75/files

prussell69 commented 3 months ago

I just installed V.4.1.3. It looks like the string data has returned for the most part. I will monitor it to see if everything, or as much as possible has returned. Thanks to everyone for finding a solution!!

jasonacox / Powerwall-Dashboard