revk / ESP32-Faikin

ESP32 based module to control Daikin aircon units
GNU General Public License v3.0

[BUG] faikin unit stops responding on the network #211

Closed: Jerther closed this issue 8 months ago

Jerther commented 9 months ago

Faikin hardware: Bought on tindie.com. The product received matches what is shown on the product page.

Daikin hardware: FTX12NMVJUA, direct connection on S403

Describe the bug: After a while, the faikin unit disappears from the network.

To Reproduce Steps to reproduce the behavior:

  1. Set up the faikin unit, connect it to local wifi with internet, let it update itself.
  2. Let it run for a few days.
  3. Eventually the faikin unit stops responding on the network.

Expected behavior: The unit stays stable, or reboots itself if a problem occurs.

Additional context

revk commented 9 months ago

What version of the code are you running?

Jerther commented 9 months ago

Faikin-S3-MINI-N4-R2: 486973e 2024-01-31T09:11:18

The faikin unit stopped responding again while everybody was sleeping last night. It worked fine for about 5h30m after being rebooted. I had to reboot it again.

revk commented 9 months ago

OK, that is rather worrying. Do you use the "legacy" Daikin URLs from some system? That is where we had an issue before.

Jerther commented 9 months ago

I think I do:

/aircon/get_control_info
/aircon/set_control_info
/aircon/get_sensor_info
revk commented 9 months ago

OK this is a big clue, there have been some changes. I'll investigate that as it could be a stack or heap issue. Leave with me.

Jerther commented 9 months ago

Thank you very much!

Oh I remember something. While I was programming the REST calls to these URLs, I noticed that if I made two calls close to each other every 2 seconds (one to get_control_info and the other to get_sensor_info), the faikin unit would eventually stop responding on the network. I can't say for sure whether the 2s delay was actually 2s, and the failures were very random. But the problem did not happen again after I changed the delay of get_sensor_info to 30s. Until yesterday, and last night.

So right now the call frequency is as follows:

get_control_info: every 2 seconds
set_control_info: on demand (less than 5 times a day)
get_sensor_info: every 30 seconds

Let me know if I can help any further.
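
For reference, the polling pattern described in this comment can be sketched as a small Python client. This is only an illustration of the schedule, not the code Jerther actually runs: `BASE_URL`, `due_endpoints`, `fetch`, and `poll_forever` are hypothetical names, and the address is a placeholder. Only the three URL paths and the 2 s / 30 s intervals come from the thread.

```python
import time
import urllib.request

# Hypothetical address; replace with the Faikin's actual IP or hostname.
BASE_URL = "http://192.168.1.50"

# Polling intervals in seconds, taken from the comment above.
INTERVALS = {
    "/aircon/get_control_info": 2.0,
    "/aircon/get_sensor_info": 30.0,
}


def due_endpoints(now, last_called, intervals=INTERVALS):
    """Return the endpoints whose interval has elapsed since their last call."""
    return [
        path
        for path, interval in intervals.items()
        if now - last_called.get(path, float("-inf")) >= interval
    ]


def fetch(path, timeout=5.0):
    """GET one legacy endpoint and return the raw response body as text."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return resp.read().decode("ascii", errors="replace")


def poll_forever():
    """Call each endpoint on its own schedule, one request at a time."""
    last_called = {}
    while True:
        now = time.monotonic()
        for path in due_endpoints(now, last_called):
            last_called[path] = now
            try:
                print(path, fetch(path))
            except OSError as exc:  # URLError and socket timeouts are OSErrors
                print(path, "failed:", exc)
        time.sleep(0.5)
```

Issuing the requests sequentially from one loop means the two GETs never overlap, which may make it easier to tell whether closely spaced calls are what triggers the problem.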

revk commented 9 months ago

What does the settings page say for free memory? Is that gradually going down?

Jerther commented 9 months ago

I'll have a serious look tonight (it's 9:00 AM here) ;)

revk commented 9 months ago

Hammering them I am not seeing memory go down. See what you can see. Maybe try latest beta release.

Jerther commented 9 months ago

I just got home!

Uptime  0 days 11:37:08
Free mem    125740+2090244

No communication interruption for the uptime

Jerther commented 9 months ago

About an hour and a half later. Looks like there's even more free ram!

Uptime  0 days 12:52:10
Free mem    122996+2090244

No interruption.

I'll report back again tomorrow morning! (about 12 hours)

revk commented 9 months ago

Going up and down a bit is fine; a general downward trend means a memory leak, which is the most likely cause of "stopping after a while". The other possibility is the stack being too small, causing random issues.

Jerther commented 9 months ago
Uptime  1 day 00:02:33
Free mem    122888+2090244

Not much going on apparently

No interruption last night.
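
The three readings posted so far can be checked for a downward trend with a least-squares slope, as a rough way of applying revk's rule of thumb. The sketch below is an illustration only: the helper names are hypothetical, it looks only at the first figure of each "Free mem" value, and what the two figures represent is not stated in the thread. Three samples are far too few to draw a firm conclusion.

```python
import re


def parse_uptime(text):
    """Convert 'N day(s) HH:MM:SS' into seconds."""
    m = re.fullmatch(r"\s*(\d+) days? (\d+):(\d+):(\d+)\s*", text)
    if m is None:
        raise ValueError(f"unrecognised uptime: {text!r}")
    days, hours, minutes, seconds = map(int, m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_free_mem(text):
    """Split a 'A+B' free-memory value into its two integer figures."""
    first, second = text.strip().split("+")
    return int(first), int(second)


def slope_per_hour(samples):
    """Least-squares slope of (uptime_seconds, bytes) samples, in bytes/hour."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den * 3600


# Readings copied from the comments above (same firmware build, same boot).
READINGS = [
    ("0 days 11:37:08", "125740+2090244"),
    ("0 days 12:52:10", "122996+2090244"),
    ("1 day 00:02:33", "122888+2090244"),
]

samples = [(parse_uptime(u), parse_free_mem(f)[0]) for u, f in READINGS]
print(f"{slope_per_hour(samples):.0f} bytes/hour")
```

A slope that stays close to zero over several days would point away from a leak; a steadily negative one over many more samples would support it.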

revk commented 9 months ago

The beta has a load of library changes, so could be that it has fixed it.

Jerther commented 9 months ago

Alright, I'll try that next. Have the "legacy" URLs changed in the beta?

revk commented 9 months ago

Not as such, but if it was a memory issue the memory footprint will definitely have changed.

Jerther commented 9 months ago

I just did a firmware upgrade through the UI. The version is now Faikin-S3-MINI-N4-R2: 41ae52f 2024-02-02T10:13:27 S21

Is it recent enough?

revk commented 9 months ago

So is it working now?

Jerther commented 9 months ago

Still working yes

Uptime  0 days 21:36:49
Free mem    118044+2093720

It may take days before it fails again, if it ever does. I'll follow this closely and report back in, let's say a week max.

Jerther commented 9 months ago

It went down again, 7 hours ago. So total uptime was about 45 hours.

Anything I can do before I turn the AC off and on again? I'd like to check if the LED changes color, so I could rule out the ESP being frozen. Not sure how to do that, like I said in the original post.

revk commented 9 months ago

That's a bugger. I'll review stack usage again. I thought the latest changes should reduce stack though. So worrying.

Jerther commented 9 months ago

I have a personal project with an ESP32 and it behaves the same: works fine for a while but randomly stops responding on the network. Sometimes it goes down after a few seconds, and sometimes it takes weeks.

After months of debugging, I've been able to determine the ESP does not crash. It's just the wifi that fails for some reason. I use tzapu's WiFiManager for this project, and the wifi state is WL_CONNECTED when it's connected and working, but it would randomly drop to WL_IDLE_STATUS and get stuck there until power cycled (haven't tried software reboot). On occasion the ESP would connect to the wifi network but immediately drop to WL_NO_SSID_AVAIL. I have yet to dig further.

I have other projects with ESP8266 and they work fine. One has been on for years.

Both ESP32 modules might have a problem with my specific Wifi configuration, I don't know. That's why I'd like to rule that out with the Faikin module using its LED, if that's possible.

Jerther commented 8 months ago

The faikin module came back on the network on its own! Looks like the uptime kept going:

Uptime  2 days 21:32:31
Free mem    115496+2093720
revk commented 8 months ago

Wow, OK, that is special, maybe http or DHCP stuff. Hmmm.

revk commented 8 months ago

Latest beta has slightly more stack if you want to try it.

Jerther commented 8 months ago

Will do!

Version is now Faikin-S3-MINI-N4-R2: 065ca10 2024-02-05T14:27:06

Jerther commented 8 months ago

It's down again. After about 7 days. It's been down for about 6 hours now.

Automatic updates are on, so while it was working I noticed it had rebooted at some point, but I can't tell what version it was on when it failed.

Anything I could try before I power cycle the AC?

Jerther commented 8 months ago

Oh no. It's playing tricks on me. It's back online.

Faikin-S3-MINI-N4-R2: 9d21885 2024-02-10T16:47:49 S21
Uptime  0 days 19:32:20
Free mem    126664+2093688

All I did was try to access the web interface with Chrome, see it fail, and leave the tab open while I wrote my previous post. I have no idea what's going on. The legacy URLs were definitely offline in the last 7 hours.
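
Since it is hard to tell after the fact when the unit dropped off and came back, an external monitor that logs only state changes could pin down the outage windows. The following is a minimal sketch under assumptions: it treats "a TCP connection to port 80 succeeds" as "reachable", and `is_reachable`, `record_transition`, and `monitor` are hypothetical helper names, not part of the Faikin project.

```python
import socket
import time
from datetime import datetime


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def record_transition(log, was_up, is_up, when):
    """Append an entry to log only when the state changes; return the new state."""
    if was_up is None or was_up != is_up:
        log.append((when, "up" if is_up else "down"))
    return is_up


def monitor(host, interval=60.0):
    """Probe host forever, printing a timestamped line on each up/down change."""
    log = []
    state = None
    while True:
        before = len(log)
        stamp = datetime.now().isoformat(timespec="seconds")
        state = record_transition(log, state, is_reachable(host), stamp)
        if len(log) > before:
            print(*log[-1])
        time.sleep(interval)
```

Running `monitor("192.168.1.50")` (placeholder address) on another machine on the same network would give timestamps to compare against the unit's reported uptime and the access point's logs.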

revk commented 8 months ago

There may be tcp tweaks I can change in the config. I'll have to look.

revk commented 8 months ago

I have changed an HTTP setting, in case that helps, try latest beta (in a few minutes).

Jerther commented 8 months ago

Faikin-S3-MINI-N4-R2: 2d94759 2024-02-12T09:42:30 S21

We'll see.

Jerther commented 8 months ago

Gone again.

Chrome's error is ERR_ADDRESS_UNREACHABLE

Like I said, I have a personal project with an ESP32 that behaves pretty much the same: it would randomly disappear from the network. I've been diagnosing this for months now and recently I've been able to pin down the problem to the wifi disconnecting for some reason and throwing a bunch of ARDUINO_EVENT_WIFI_STA_DISCONNECTED (5) events, then eventually ARDUINO_EVENT_WPS_ER_SUCCESS (9). I have no idea why it would use WPS.

Sometimes after the bunch of ARDUINO_EVENT_WIFI_STA_DISCONNECTED (5) I get an ARDUINO_EVENT_WIFI_STA_CONNECTED and the ESP reconnects seamlessly.

To be continued I guess...

conorlmcbride commented 8 months ago

I'm also having this issue. I'm on the latest non-beta firmware; I'll update to the beta firmware tonight and see how that works. I will also try the legacy Daikin URLs.

FTXS09LVJU on S21

revk commented 8 months ago

Let me know.

Jerther commented 8 months ago

Apparently the ESP32 has/had problems with Wifi Multimedia Mode (WMM) on a number of access points, and I think it's a Wireless N mode thing so I switched the mode on my TP-Link EAP245 from B/G/N to B/G only.

We'll see...

Jerther commented 7 months ago

So far so good! And my other ESP32 project has not had any problem since either. I'll consider this fixed for now.

Jerther commented 7 months ago

The faikin module went off last night. I let it try to get back on by itself for 20 hours but it didn't. So I had to power cycle the unit.

There might be multiple issues with the Wifi here.

Honestly, I wouldn't mind if the Faikin unit rebooted itself when some conditions were met. That would certainly not fix the problem, but it would certainly work around it.
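
One possible form of such a condition is "restart after N consecutive failed connectivity checks". The sketch below only models that decision logic in Python; it is not how the Faikin firmware behaves, and the class name, threshold, and the idea of counting failed checks are all assumptions made for illustration.

```python
class ConnectivityWatchdog:
    """Signal a restart after too many consecutive failed checks."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def report(self, ok):
        """Record one check result; return True when a restart is due."""
        if ok:
            # Any successful check clears the failure streak.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.max_failures
```

Resetting the counter on every success means brief dropouts that recover on their own never trigger a restart; only a sustained outage does.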

revk commented 7 months ago

The "multiple issues with WiFi" may be the clue. The "non response" was, I believe, a simple matter of stack usage on the "legacy" URLs. The stack has been increased and the handling code rewritten to address this.

The Faikin should not need a power cycle; it constantly tries to connect to WiFi.

Jerther commented 7 months ago

If it constantly retries, is there a way to know why it constantly fails? A log of some kind? That'd help.

revk commented 7 months ago

Well, a serial log could be connected to the pads on the back. You may need a debug build though to see the low level wifi issues, assuming the issue is wifi and not DHCP.

Jerther commented 7 months ago

Sure! I can do that. I'm guessing it's these pads, right?

T = UART tx
R = UART rx
+ = 3.3v
0 = GND
GND = GND 
revk commented 7 months ago

0 is GPIO0 (used for boot mode) so can be left unconnected. If you GND it, it will go into bootloader mode!

Jerther commented 7 months ago

Oh 😅

Alright, I'll hook a serial monitor onto this. I'll let you know when it's done.

Oh one last thing, how many milliamps are left on the 3.3v rail? I don't want to blow the regulator 😉

revk commented 7 months ago

Err, well not designed to power anything else, the 3V3 is normally the power input when serial programming, and an ESP32 can use something like 500mA peaks when talking WiFi I think. The regulator on board is rated 600mA I think. Normal average usage is way less, obviously.

Try not to blow yourself up!

Jerther commented 7 months ago

Alright I'll find an external way to power the serial monitor then.

Yeah, I'll be extra cautious. The module is directly hooked to the S403 port ;)

revk commented 7 months ago

Isn't S403 "live"?!

Jerther commented 7 months ago

Yes it is!

I always turn the whole unit off while working in there, and I'm extra cautious in case there are charged caps.

Jerther commented 1 month ago

Hey, I just wanted to update this. It's been a while! The Faikin is on autoupdate, and I have not noticed it disconnecting from the wifi again since March. If it did, I guess it recovered very well ;) So all is well now.

Congratulations and thank you for your project. It has made my family home that much more comfortable ;)