sidoh / esp8266_milight_hub

Replacement for a Milight/LimitlessLED hub hosted on an ESP8266
MIT License
931 stars 219 forks source link

/Settings endpoint returns half the required data #822

Open theartcher opened 6 months ago

theartcher commented 6 months ago

Describe the bug

FYI: I have looked through the 'troubleshooting' and 'issues' tab and could not find a fitting/duplicate issue.

When attemping to open the settings in the webUI/rest API the content returned is incomplete. All other endpoints are working fine.

image image

JSON returned:

{"admin_username":"","admin_password":"","ce_pin":4,"csn_pin":15,"reset_pin":0,"led_pin":-2,"radio_interface_type":"nRF24","packet_repeats":50,"http_repeat_factor":1,"auto_restart_period":0,"mqtt_server":"","mqtt_username":"","mqtt_password":"","mqtt_topic_pattern":"","mqtt_update_topic_pattern":"","mqtt_state_topic_pattern":"","mqtt_client_status_topic":"","simple_mqtt_client_status":false,"discovery_port":48899,"listen_repeats":3,"state_flush_interval":10000,"mqtt_state_rate_limit":500,"mqtt_debounce_delay":500,"mqtt_retain":true,"packet_repeat_throttle_sensitivity":0,"packet_repeat_throttle_threshold":200,"packet_repeat_minimum":3,"enable_automatic_mode_switching":false,"led_mode_wifi_config":"Fast toggle","led_mode_wifi_failed":"On","led_mode_operating":"Slow blip","led_mode_packet":"Flicker","led_mode_packet_count":3,"hostname":"milight-hub","rf24_power_level":"MAX","rf24_listen_channel":"LOW","wifi_static_ip":"","wifi_static_ip_gateway":"","wifi_static_ip_netmask":"","packet_repeats_per_loop":10,"home_assistant_discovery_prefix":"","wifi_mode":"g","d"

As you can see the JSON is incomplete and thus I cannot properly read/edit my settings.

Steps to reproduce

  1. Power on ESP
  2. Open WebUI / Go to (my hub url)/settings
  3. Wait for response
  4. Response is incomplete / (current-) settings are empty

Expected behavior

When opening the settings tab, the current settings should be displayed in the required fields. When 'get'-ing the data from the /settings endpoint the data should return all the current settings in a complete JSON structure.

Setup information

Firmware version

1.11.1-nodemcuv2 1.11.2-nodemcuv2

Output of http://milight-hub.local/about

{"firmware":"milight-hub","version":"1.11.2","ip_address":"192.168.22.105","reset_reason":"Software/System restart","variant":"nodemcuv2","free_heap":20336,"arduino_version":"3.1.2","free_stack":1792,"flash_used":12299,"flash_total":957314,"flash_pct_free":98,"queue_stats":{"length":0,"dropped_packets":0}}

Output of http://milight-hub.local/settings

Incomplete since that is the bug I am encountering.

{"admin_username":"","admin_password":"","ce_pin":4,"csn_pin":15,"reset_pin":0,"led_pin":-2,"radio_interface_type":"nRF24","packet_repeats":50,"http_repeat_factor":1,"auto_restart_period":0,"mqtt_server":"","mqtt_username":"","mqtt_password":"","mqtt_topic_pattern":"","mqtt_update_topic_pattern":"","mqtt_state_topic_pattern":"","mqtt_client_status_topic":"","simple_mqtt_client_status":false,"discovery_port":48899,"listen_repeats":3,"state_flush_interval":10000,"mqtt_state_rate_limit":500,"mqtt_debounce_delay":500,"mqtt_retain":true,"packet_repeat_throttle_sensitivity":0,"packet_repeat_throttle_threshold":200,"packet_repeat_minimum":3,"enable_automatic_mode_switching":false,"led_mode_wifi_config":"Fast toggle","led_mode_wifi_failed":"On","led_mode_operating":"Slow blip","led_mode_packet":"Flicker","led_mode_packet_count":3,"hostname":"milight-hub","rf24_power_level":"MAX","rf24_listen_channel":"LOW","wifi_static_ip":"","wifi_static_ip_gateway":"","wifi_static_ip_netmask":"","packet_repeats_per_loop":10,"home_assistant_discovery_prefix":"","wifi_mode":"g","d

Additional context

sidoh commented 6 months ago

I think this is probably the same issue as this:

https://github.com/sidoh/esp8266_milight_hub/issues/807

Never really figured out exactly what was going on, but it seemed to be related to a particular version of a flash chip being incompatible (less compatible) with some combination of newer version of the Arduino SDK and how I'm using it.

You might try a different ESP8266. If that doesn't work, I believe 1.10 and lower were still working for the folks in that thread.

If you have any deeper insight into what's going on here it'd be much appreciated!

theartcher commented 6 months ago

Ah, I seem to have missed that, my apologies. I appreciate your fast reply.

I'm afraid you know more here than me, I work in software but not so much in embedded/hardware related software, more web/app development. I do notice oddly enough that the first /settings call after a restart ALWAYS seems to work.

If you then refresh it yet again breaks, though I do find it odd to be something related to connectivity, that would imply there's something random about it, but it breaks EXTREMELY consistently. Always after the "wifi_mode" property in the returned JSON. If it was connectivity I'd expect more randomness to when the data fails to return, perhaps it has something to do with how the connection is handled?

The "NS_ERROR_NET_PARTIAL_TRANSFER" I get is not because the connection was lost or forcefully closed, it just means the received bytes =/= the content's length.

I'll try 1.10 soon if it doesn't help I'll need to get a new ESP8266, I am developing an app to accompany this project as a challenge to create a proper open source project but then it'd be handy to be able to run the newest version.

theartcher commented 6 months ago

PS: 1.10.9 seems to indeed be stable, no issues regarding the /settings endpoint here.

SelverC commented 6 months ago

I'll write here as well!

I did buy a new one but it didn't help. So I tried to manually to transfer the settings(Seems like something in the conversion didn't work, between the two version there is a change between backup settings).

And one more change that I did is to re-wire it(my cables were not correct but it worked anyway? maybe that caused the crash as well).

Not sure exactly what helped me but I would start fresh don't backup settings and make sure all the wires are correct

Update: after a reboot the issue returned so still no solution for me :-(

theartcher commented 6 months ago

And one more change that I did is to re-wire it(my cables were not correct but it worked anyway? maybe that caused the crash as well).

I doubt that's the issue I am running into, I rechecked my wires 3x, they are correct and if it was the wires the problem would persist regardless of software running on said ESP. I do start fresh multiple times, and I do not back up/restore the settings as well, the settings endpoint does not work for me, so I cannot edit said values.

marcin1965 on the other thread tried using a NodeMCU v3 board and that fixed his issues. Perhaps an idea?

SelverC commented 6 months ago

And one more change that I did is to re-wire it(my cables were not correct but it worked anyway? maybe that caused the crash as well).

I doubt that's the issue I am running into, I rechecked my wires 3x, they are correct and if it was the wires the problem would persist regardless of software running on said ESP. I do start fresh multiple times, and I do not back up/restore the settings as well, the settings endpoint does not work for me, so I cannot edit said values.

marcin1965 on the other thread tried using a NodeMCU v3 board and that fixed his issues. Perhaps an idea?

Seems like my issue returned as-well after a reboot. Could you share your settings? Maybe it's something in the settings(please remove all sensitive parts)

I'm using a nodemcuv3(didn't help me)

theartcher commented 6 months ago
{"admin_username":"","admin_password":"","ce_pin":4,"csn_pin":15,"reset_pin":0,"led_pin":-2,"radio_interface_type":"nRF24","packet_repeats":50,"http_repeat_factor":1,"auto_restart_period":0,"mqtt_server":"","mqtt_username":"","mqtt_password":"","mqtt_topic_pattern":"","mqtt_update_topic_pattern":"","mqtt_state_topic_pattern":"","mqtt_client_status_topic":"","simple_mqtt_client_status":false,"discovery_port":48899,"listen_repeats":3,"state_flush_interval":10000,"mqtt_state_rate_limit":500,"mqtt_debounce_delay":500,"mqtt_retain":true,"packet_repeat_throttle_sensitivity":0,"packet_repeat_throttle_threshold":200,"packet_repeat_minimum":3,"enable_automatic_mode_switching":false,"led_mode_wifi_config":"Fast toggle","led_mode_wifi_failed":"On","led_mode_operating":"Slow blip","led_mode_packet":"Flicker","led_mode_packet_count":3,"hostname":"milight-hub","rf24_power_level":"MAX","rf24_listen_channel":"LOW","wifi_static_ip":"","wifi_static_ip_gateway":"","wifi_static_ip_netmask":"","packet_repeats_per_loop":10,"home_assistant_discovery_prefix":"","wifi_mode":"g","default_transition_period":500,"rf24_channels":["LOW","MID","HIGH"],"device_ids":[1],"gateway_configs":[],"group_state_fields":["state","brightness","mode","color_temp","computed_color","color_mode"]}

I am running default settings, I think this is the correct JSON but can't validate as I currently don't have access to the ESP.

The one thing I am confused about reading both threads is the consistency of it all:

1 - Hardware likely isn't an issue since it breaks consistently at the same JSON property (for me). If it was hardware related I think it'd be more random where the returned/read data breaks.

2 - Software also isn't likely as it DOES work the very first call made to the settings, implying it CAN access, read and send that data. It'd be highly unlikely for it to just "break" after one try as the first call (read) should be identical to all the others.

3 - Connectivity seems to be influenced, I notice if I keep pinging the ESP whilst making the request on another device/browser, I only lose 1 ping in my terminal when the /settings call is made.

If 1.10.9 does work, what changed exactly in the next version and how could it have influenced the connectivity, settings and/or method of reading/sending data?

sidoh commented 6 months ago

The biggest change in the 1.11 release was upgrading the arduino SDK to the most recent release. I had been using an SDK release from circa 2017 which was becoming incompatible with newer versions of libraries like ArduinoJson.

In https://github.com/sidoh/esp8266_milight_hub/issues/807, I think we narrowed the issue down to calling any web route that modifies the settings (GET /settings saves the in-memory settings to flash before serving the file). Some combination of variables is causing the process of saving settings to flash to short-circuit or crash or something, but it does seem like there's some kind of hardware variable. In that same issue I linked, one person who was able to check noticed a board having trouble had a flash manufacturer of 0x50 while a working board had 0x5E (all of my working board were also 0x5E -- you can test this with esptool flash_id).

A big reason this has been so tricky to pinpoint is that I cannot reproduce it. I tried every spare ESP8266 I had (on the order of a dozen or so) and never saw this problem.

Very happy to dig more at this, but it will be much easier if we can find a way for me to reproduce it.

SelverC commented 6 months ago

The biggest change in the 1.11 release was upgrading the arduino SDK to the most recent release. I had been using an SDK release from circa 2017 which was becoming incompatible with newer versions of libraries like ArduinoJson.

In #807, I think we narrowed the issue down to calling any web route that modifies the settings (GET /settings saves the in-memory settings to flash before serving the file). Some combination of variables is causing the process of saving settings to flash to short-circuit or crash or something, but it does seem like there's some kind of hardware variable. In that same issue I linked, one person who was able to check noticed a board having trouble had a flash manufacturer of 0x50 while a working board had 0x5E (all of my working board were also 0x5E -- you can test this with esptool flash_id).

A big reason this has been so tricky to pinpoint is that I cannot reproduce it. I tried every spare ESP8266 I had (on the order of a dozen or so) and never saw this problem.

Very happy to dig more at this, but it will be much easier if we can find a way for me to reproduce it.

Forgot to add my flash_id it is a bit strange... My old witty esptool.py v4.6.2 Serial port /dev/cu.usbserial-1410 Connecting.... Detecting chip type... Unsupported detection protocol, switching and trying again... Connecting.... Detecting chip type... ESP8266 Chip is ESP8266EX Features: WiFi Crystal is 26MHz MAC: a0:20:a6:17:c7:d6 Uploading stub... Running stub... Stub running... Manufacturer: e0 Device: 4016 Detected flash size: 4MB Hard resetting via RTS pin...

My new Nodemcu v3(https://www.amazon.co.uk/dp/B06Y1ZPNMS): esptool.py v4.6.2 Serial port /dev/cu.usbserial-1410 Connecting.... Detecting chip type... Unsupported detection protocol, switching and trying again... Connecting.... Detecting chip type... ESP8266 Chip is ESP8266EX Features: WiFi Crystal is 26MHz MAC: 34:94:54:97:5a:a8 Uploading stub... Running stub... Stub running... Manufacturer: 46 Device: 4016 Detected flash size: 4MB Hard resetting via RTS pin...

SelverC commented 6 months ago

@sidoh One more thing! Could you try my backup file on your ESP and se if it works, I have atteched it here espmh-backup-without-pw.bin.zip ? I have only edited the password for MQTT

sidoh commented 6 months ago

Happy to give it a shot. I'm away from home for the next week or so no access to my ESPs, but will report back when I return!

theartcher commented 6 months ago

The biggest change in the 1.11 release was upgrading the arduino SDK to the most recent release. I had been using an SDK release from circa 2017 which was becoming incompatible with newer versions of libraries like ArduinoJson.

In #807, I think we narrowed the issue down to calling any web route that modifies the settings (GET /settings saves the in-memory settings to flash before serving the file). Some combination of variables is causing the process of saving settings to flash to short-circuit or crash or something, but it does seem like there's some kind of hardware variable. In that same issue I linked, one person who was able to check noticed a board having trouble had a flash manufacturer of 0x50 while a working board had 0x5E (all of my working board were also 0x5E -- you can test this with esptool flash_id).

A big reason this has been so tricky to pinpoint is that I cannot reproduce it. I tried every spare ESP8266 I had (on the order of a dozen or so) and never saw this problem.

Very happy to dig more at this, but it will be much easier if we can find a way for me to reproduce it.

Here's my esptool flash_id returned data:

Found 3 serial ports
Serial port COM5
Connecting....
Detecting chip type... Unsupported detection protocol, switching and trying again...
Connecting....
Detecting chip type... ESP8266
Chip is ESP8266EX
Features: WiFi
Crystal is 26MHz
MAC: c8:c9:a3:7b:30:ef
Uploading stub...
Running stub...
Stub running...
Manufacturer: 5e
Device: 4016
Detected flash size: 4MB
Hard resetting via RTS pin...

I am at a loss of what could be the issue. I'd love to play around with my settings and see if that does anything but well, can't quite access it... I will await your attemps @sidoh. Let's hope that breaks something. (Sounds weird saying that)

SelverC commented 6 months ago

Happy to give it a shot. I'm away from home for the next week or so no access to my ESPs, but will report back when I return!

Thanks! But I tried without any settings and without any wires and still behaves the same....so it seems it's not the settings but the hardware not working with the software(SDK or something).

Will return my new ESP and try to find another

SelverC commented 6 months ago

@sidoh Latest guess is that I have ESP8266MOD chip on both of mine(old and new). And as far as I can tell that is called sometimes esp-12F? Seems like a bit different then the other ones, so maybe a compiled version for that chip? I tried myself with some hints from other github issues in other projects with the same chip without luck.

Update: I did see this, seems a little bit same to me(something disconnecting the Wifi on specific chips) 1.) https://forum.arduino.cc/t/esp-12f-intermittent-problems/522806/8 2.) https://github.com/platformio/platform-espressif8266/issues/274

@theartcher Do you as well have a ESP8266MOD chip?

theartcher commented 5 months ago

Excuse my late reply, I had other things taking priority. I indeed also have a ESP-12F.

theartcher commented 5 months ago

Happy to give it a shot. I'm away from home for the next week or so no access to my ESPs, but will report back when I return!

@sidoh Have you had the time to try this?

SelverC commented 5 months ago

Happy to give it a shot. I'm away from home for the next week or so no access to my ESPs, but will report back when I return!

@sidoh Have you had the time to try this?

Won't help as I have confirmed it's not the settings, it has something to do with the chip itself. I ordered a ESP-12E chip and it worked fine after.

So guess @sidoh would need to check if there is something he can do about the specific 12F chips(but I guess it isn't worth it)

theartcher commented 4 months ago

Update!

Although I have not yet been able to replicate the exact circumstances: I have bought a new ESP8266, a NodeMcu v3 ESP-12E. I have had 2 weird types of behaviour; the first being the /settings endpoint yet again returning incomplete data. Next to that I've had the issue of the aliases just completely being wiped. Seems there's something wrong with reading the flash memory. I doubt it's related to it being an 12E or 12F, as that should imply the issues should not occur here anymore.