Closed ReinierGielen closed 4 years ago
I think the request payload is missing in the restore request
Sounds an awful lot like you uploaded a binary compiled for a different platform. My guess is you're on a d1_mini
and you uploaded d1_mini_pro
or something like that.
I think the request payload is missing in the restore request
It's not, but it might look that way if the ESP8266 is crashing, which might happen if SPIFFS gets resized without a format. This would happen if you upload the wrong binary.
Ik uploaded the nodemcuv2 binary. Ik have a nodemcuv3. This worked fine on previous updates
The only change in 1.10.2 is trivial and wouldn't have caused this kind of behavior:
The only time I've seen the behavior you're describing is when you upload a binary compiled with a ldscript different than the one you had previously. ldscripts are responsible for partitioning the onboard flash into program and SPIFFS space.
You can try wiping the device with esptool.py erase_flash
and re-uploading from scratch. This will probably fix it.
If you can find a reliable way to reproduce this I'm happy to look into it. But I flashed my milight hub install with 1.10.2 last night and haven't had any problems.
update here went ok as well.. first backup my settings for "In case"
Mine lost the mqtt username. But was easily fixed when I found out.
On Thu, 19 Sep 2019, 12:37 poudenes, notifications@github.com wrote:
update here went ok as well.. first backup my settings for "In case"
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/sidoh/esp8266_milight_hub/issues/523?email_source=notifications&email_token=AABBSBWPLQPE2KEXS2K452DQKNI5XA5CNFSM4IYC5JU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DAQZQ#issuecomment-533071974, or mute the thread https://github.com/notifications/unsubscribe-auth/AABBSBRL34P7RJXYVB7XZCDQKNI5XANCNFSM4IYC5JUQ .
I re-flashed my nodemcu with physical connection. It started fine. Updates over the web UI also work fine.
The only thing is restoring the backuped settings from web UI is not working in any version.
Is there another way to restore backedup settings ?
Additional information. When I manually re-enter settings like id, password and mqtt settings and saver them they are stored. But when I then restore a backup ( hoping all my devices will be restored ) all settings are ERASED !!!
The content of the settings file:
{"admin_username":"User","admin_password":"Password","ce_pin":4,"csn_pin":15,"reset_pin":0,"led_pin":-2,"radio_interface_type":"nRF24","packet_repeats":50,"http_repeat_factor":1,"auto_restart_period":0,"mqtt_server":"192.168.0.107:1883","mqtt_username":"mqtt","mqtt_password":"mqtt","mqtt_topic_pattern":"milight/:device_id/:device_type/:group_id","mqtt_update_topic_pattern":"milight/updates/:device_id/:device_type/:group_id","mqtt_state_topic_pattern":"milight/states/:device_id/:device_type/:group_id","mqtt_client_status_topic":"","simple_mqtt_client_status":true,"discovery_port":48899,"listen_repeats":3,"state_flush_interval":10000,"mqtt_state_rate_limit":500,"packet_repeat_throttle_sensitivity":0,"packet_repeat_throttle_threshold":200,"packet_repeat_minimum":3,"enable_automatic_mode_switching":false,"led_mode_wifi_config":"Fast toggle","led_mode_wifi_failed":"On","led_mode_operating":"Slow blip","led_mode_packet":"Flicker","led_mode_packet_count":3,"hostname":"milight-hub","rf24_power_level":"MAX","rf24_listen_channel":"LOW","wifi_static_ip":"","wifi_static_ip_gateway":"","wifi_static_ip_netmask":"","packet_repeats_per_loop":10,"home_assistant_discovery_prefix":"homeassistant","wifi_mode":"n","rf24_channels":["LOW","MID","HIGH"],"device_ids":[1,2,3,4,5],"gateway_configs":[],"group_state_fields":["state","brightness","mode","color_temp","bulb_mode","computed_color"],"group_id_aliases":{"Bureau":["rgb_cct",3,3],"Eetkamer":["rgb_cct",3,1],"Gang-Alles":["rgb_cct",5,0],"Gang1":["rgb_cct",5,1],"Gang2":["rgb_cct",5,2],"Gang3":["rgb_cct",5,3],"Gang4":["rgb_cct",5,4],"Keuken-Alles":["rgb_cct",4,0],"Keuken-Eiland":["rgb_cct",4,1],"Keuken-Gootsteen":["rgb_cct",4,3],"Keuken-Kasten":["rgb_cct",4,4],"Keuken-Robin":["rgb_cct",4,2],"Woonkamer-Alles":["rgb_cct",3,0],"Woonkamer-Links":["rgb_cct",2,0],"Woonkamer-Rechts":["rgb_cct",1,0],"Woonkamer1":["rgb_cct",1,1],"Woonkamer2":["rgb_cct",1,2],"Woonkamer3":["rgb_cct",1,3],"Woonkamer4":["rgb_cct",1,4],"Woonkamer5":["rgb_cct",2,1],"Woonkamer6":["rgb_cct",2,2],"Woonkamer7":["rgb_cct",2,3],"Woonkamer8":["rgb_cct",2,4],"Zitkamer":["rgb_cct",3,2]}}
Hmm... this should definitely work. There's a test for it. The web UI is definitely calling the correct API endpoint.
I also just tried doing it through the UI myself, and it seems to be working fine:
Are you able to check if there's anything interesting going on in serial logs when you try to restore settings?
Which logs are you referring to
Serial logs. You can access them with pio device monitor
while the ESP8266 is plugged into USB.
I managed to restore the system by manually inputting the data from the settings file. I don't have pio device monitor installed but I'll attempt to retreive the logs. I will not make another attempt to use the restore option as this erases the settings and upsets my wife.
Except for the restore your project works briljantly
it shouldn't really matter, but what browser are you using?
definitely understand not wanting to screw up your setup. anything you can do to help me replicate the problem is much appreciated. I've tested on two separate setups on both Firefox and Chrome and haven't been able to reproduce. :(
I am using chrome. I'll be carefull doing the next update. At least I now know how to restore the backup. Read the JSON and type it in manually. That is alricht if it does not happen to often. ;-)
I'm reasonably sure there's something kind of screwy about your ESP8266, although I'm not sure what. Have you used it for anything else before?
Would really like to reproduce the issue -- should never happen.
No nothing before. I have purchased it for this goal. Can it be that is a nodemcu V3 and the build is for a V2 ? Or maybe the Chinese send me a cheap clone .
v2 and v3 are both fine. it's possible, but the only way in which i think it'd matter is if you have less than 4MB of flash. You can inspect this with esptool.py flash_id
:
$ esptool.py flash_id
esptool.py v2.7
Found 2 serial ports
Serial port /dev/cu.usbserial-1443230
Connecting....
Detecting chip type... ESP8266
Chip is ESP8266EX
Features: WiFi
Crystal is 26MHz
MAC: 2c:3a:e8:2e:fd:b3
Uploading stub...
Running stub...
Stub running...
Manufacturer: c8
Device: 4016
Detected flash size: 4MB
Hard resetting via RTS pin...
anyway --
If you're willing to try one more potentially destructive thing, here's a suggestion that might narrow it down:
esptool.py
. Here's a tutorial. Hopefully this would avoid you losing your settings again.esptool.py erase_flash
pio device monitor
to see if anything interesting shows upIf this works like I suspect it will, you'll have restored settings and no manual re-entry required. If it doesn't, you can restore your backed up flash image using the process mentioned in (1).
Do you have a windows 10 version of the eptool.py ?
It's just a python library, it'll work on windows. (just install with pip)
Not having exactly the same issue (restore works), but since 10.0.1 I'm having a problem that might be related so I'll chime in anyway.
Since 10.0.1 I notice that I am losing settings after being up and running a for a while (few hours/day). The first time I had to run around the house, re-pair all the bulbs, but luckily was smart enough to take a backup. In probably less than a day, settings were gone again. Restoring the backup luckily works. I've tried restoring a few times, but always ended up losing my settings after a while.
I've tried redownloading/reflashing the image a few times. Also used the esptool to erase_flash thinking something might have gotten left over, but nothing worked. In the end flashed back to 10.0.0-dev4 and all is working again. It's been up and running for around a week.
Loving the new MQTT Hass discovery and aliases btw, great addition! I will try and flash 10.0.2 to see if I am still having the same problem.
On a totally unrelated note.. Would it be possible to add something to prevent the hub from (re)-pairing a bulb when someone flips the power switch at the same time another light gets a message to turn on/off ? I've noticed after a while some bulbs seem to merge with others, so 1 lightId all of a sudden controls multiple lights. Usually happens with the bulbs which are used with a wall switch, that get tied to bulbs which are triggered by PIR switches.
That's bizarre. Is there any part of your setup that is automatically updating settings? Is it possible for you to capture some serial logs while running a problematic version? Might help reveal what's going on.
On a totally unrelated note.. Would it be possible to add something to prevent the hub from (re)-pairing a bulb when someone flips the power switch at the same time another light gets a message to turn on/off ? I've noticed after a while some bulbs seem to merge with others, so 1 lightId all of a sudden controls multiple lights. Usually happens with the bulbs which are used with a wall switch, that get tied to bulbs which are triggered by PIR switches.
It's silly, but bulb pair packets are the same as ON commands. So as long as something is sending ON commands at the same time that you're physically flipping power on, there's no way to prevent the bulbs from interpreting that as a pair command.
That's bizarre. Is there any part of your setup that is automatically updating settings? Is it possible for you to capture some serial logs while running a problematic version? Might help reveal what's going on.
So after about day on 10.0.2 the settings disappeared again.
Can I build my own version with the debug flags enabled, or do I need to hook it up to nodeMCU to the PC and monitor it that way (which would be a pain since it only happens after about a day).
In the meantime I'll have a look at "esptool.py flash_id" to see if there's something funny about this nodeMCU. Weird thing is, the old version works fine, so it must be due to something added between 10.0.0-dev4 and 10.0.1
About the settings, not sure what could be updating my settings? I'm using the homeassistant discovery and topic / state patterns. I also have aliasses set up for my devices. This is the same setup (and settings.json I restored) I've been using on the older build which works fine.
On a totally unrelated note.. Would it be possible to add something to prevent the hub from (re)-pairing a bulb when someone flips the power switch at the same time another light gets a message to turn on/off ? I've noticed after a while some bulbs seem to merge with others, so 1 lightId all of a sudden controls multiple lights. Usually happens with the bulbs which are used with a wall switch, that get tied to bulbs which are triggered by PIR switches.
It's silly, but bulb pair packets are the same as ON commands. So as long as something is sending ON commands at the same time that you're physically flipping power on, there's no way to prevent the bulbs from interpreting that as a pair command.
I was already afraid it'd be something like this. Gives me a bad excuse to go all out on the PIR sensors where I don't have them yet and add electroshock to the physical switches to discourage usage :)
Here's esptool.py flash_id output, nothing odd there I think
Found 1 serial ports
Serial port COM4
Connecting....
Detecting chip type... ESP8266
Chip is ESP8266EX
Features: WiFi
Crystal is 26MHz
MAC: 84:f3:eb:1a:c7:31
Uploading stub...
Running stub...
Stub running...
Manufacturer: 68
Device: 4016
Detected flash size: 4MB
Hard resetting via RTS pin...
Thanks for all of that detail. Great -- serial logs with debug flags would be super useful. Appreciate the effort!
It happend tot me too. For the first time suddenly all settings are gone.
Strangest thing, not the same settings file is loaded in exactly the same way all messages look the same. But now it is actually working
There's definitely something screwy going on with SPIFFS. I'm starting to suspect something in the compile toolchain changed the ldscript without telling me. Could've been a platformio upgrade or something.
Doesn't really explain the behavior where it works for a while and then doesn't, though...
The version I built from source is still running after 2 days. Could it maybe be something with the binaries?
Yeah, potentially. Can you paste pio --version
?
The output of pio run -e nodemcuv2 --verbose
would be helpful as well.
Thanks for your help.
Never mind, settings were gone again this morning. I'll have to figure out first how to monitor serial from the pi though before I can get you any logs.
periodic output of /about might be interesting as well.
This is after restoring the backup, I'm guessing the free_heap is what you want me to keep an eye on?
{"firmware":"milight-hub","version":"1.10.3","ip_address":"192.168.2.51","reset_reason":"Exception","variant":"nodemcuv2","free_heap":16736,"arduino_version":"2_4_2","queue_stats":{"length":0,"dropped_packets":0}}
Currently rolling back to older versions to try and figure out on which release exactly the issue got introduced.
Running now on 1.10.0-dev.7
yeah, exactly. I think my idle free heap is ~17 KB, so yours looks fine there. Possible you're using some feature that I don't normally use that leaks memory.
output of /settings
would be helpful too (make sure to sensor passwords)
this is meaningful as well:
"reset_reason":"Exception"
Oops, submitted that earlier than I meant to. {"reset_reason":"Exception"}
means something is crashing, so Serial output would likely be interesting. Can't think of a reason that would clear settings, though.
Hooked up the nodemcu with latest version to a pi and its logging to file with all the debug flags enabled. If it fails again hopefully I can catch the log.
Excellent. Thanks for your help.
Here's my settings btw
{"admin_username":"","admin_password":"","ce_pin":4,"csn_pin":15,"reset_pin":0,"led_pin":-2,"radio_interface_type":"nRF24","packet_repeats":50,"http_repeat_factor":1,"auto_restart_period":0,"mqtt_server":"192.168.2.52","mqtt_username":"","mqtt_password":"","mqtt_topic_pattern":"milight/commands/:device_id/:device_type/:group_id","mqtt_update_topic_pattern":"","mqtt_state_topic_pattern":"milight/state/:device_id/:device_type/:group_id","mqtt_client_status_topic":"milight/client_status","simple_mqtt_client_status":true,"discovery_port":48899,"listen_repeats":0,"state_flush_interval":10000,"mqtt_state_rate_limit":900,"packet_repeat_throttle_sensitivity":0,"packet_repeat_throttle_threshold":200,"packet_repeat_minimum":3,"enable_automatic_mode_switching":false,"led_mode_wifi_config":"Off","led_mode_wifi_failed":"Off","led_mode_operating":"Off","led_mode_packet":"Off","led_mode_packet_count":3,"hostname":"milight-hub","rf24_power_level":"MAX","rf24_listen_channel":"LOW","wifi_static_ip":"","wifi_static_ip_gateway":"","wifi_static_ip_netmask":"","packet_repeats_per_loop":10,"home_assistant_discovery_prefix":"homeassistant","wifi_mode":"n","default_transition_period":500,"rf24_channels":["LOW","MID","HIGH"],"device_ids":[1,2,3,4,5,6,7,8],"gateway_configs":[],"group_state_fields":["state","brightness","mode","color_temp","bulb_mode","computed_color","effect"],"group_id_aliases":{"bathroom_milight_fut105":["rgb_cct",3,1],"bedroom_milight_fut105":["rgb_cct",4,1],"deathstar_milight_fut016":["rgbw",5,1],"dinner_table_milight":["rgb_cct",2,1],"kitchen2":["cct",7,1],"kitchen3":["cct",8,1],"kitchen_milight_mr16_1":["cct",6,1],"peoplecave_milight_fut105":["rgb_cct",1,1]}}
Just checked the /about again, free heap is a lot lower and there's a new exception type.
{"firmware":"milight-hub","version":"unknown","ip_address":"192.168.2.51","reset_reason":"Software Watchdog","variant":"nodemcuv2","free_heap":7968,"arduino_version":"2_4_2","queue_stats":{"length":0,"dropped_packets":0}}
definitely looks like a memory leak of some kind. Did you happen to catch serial logs? WDT reset is watchdog timer reset. It means some blocking code runs for too long and the system gets hard reset. With this software, it's usually the nRF library busy waiting for an SPI response, usually due to spotty wiring connections. But it's possible it's memory related.
If you're able to find some set of actions that trigger the memory to go down consistently, that would be most helpful.
I was pretty lazy with the queued packet code. It creates a lot of objects, but they're shared_ptrs, so should be getting cleaned up.
Seems like my log file is being overwritten ever 30 or so minutes, but I think I managed to catch a crash. Settings are still there though.
thank you!
Looks like the WDT reset is just because the Serial debug logs for the discovery client take too long to print. Just changed the baudrate from the default (9600) to 115200 and that went away. Could you do that as well? Look for Serial.begin(9600)
in main.cpp
. You'll need to start serial logs with pio device monitor -b 115200
as well.
Was initially suspecting that the leak was in the discovery client, but doesn't seem to be the case. Set this up:
``` $ for i in $(seq 1 100); do docker start mosquitto>/dev/null; curl -s 10.133.8.194/about | jq '.free_heap'; sleep 10; docker stop mosquitto>/dev/null; done 16248 16168 16432 16592 16424 15504 16328 16328 16248 15920 16328 16424 16424 16424 16096 16248 16248 15328 16328 16328 16480 16096 15176 16248 15936 16264 16328 16424 16424 16480 15768 15408 16328 16328 16384 16328 14736 16096 16248 16304 16248 15408 16328 16000 16368 16328 16264 16320 16248 15408 16328 16328 16384 16328 15408 16328 16248 16248 16344 16328 16096 15768 16328 15504 16424 16424 15824 16328 14736 16328 16264 16304 16328 15504 16368 16368 16424 16328 15176 16248 16120 16488 16328 15504 16424 16096 16032 16328 15352 16272 16272 ```
There's no downwards trend. There are a few times when it dips into the 14Ks, but it pops back up. So it's at least not this alone.
You might try tossing tee
on your command to log to a file if your scrollback buffer is eating logs. I'm using:
pio device monitor -b 115200 | tee -a pio_log.txt
something interesting from your logs -- it looks like you've got some kind of scheduled command being sent roughly every minute to 0x1,rgb_cct,1. is this expected?
Added a heap monitor to HomeAssistant:
- name: "ESP8266 Milight Hub - Free Heap"
platform: rest
resource: http://10.133.8.158/about
value_template: '{{ value_json.free_heap }}'
unit_of_measurement: 'Bytes'
I was using a python script to read the serial output, after I changed to baud 115200 I started getting the folowing error:
SerialException: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
I noticed the same error when trying to read the device with
pio device monitor -b 115200 | tee -a pio_log.txt
Worked my way around it with a try / catch and it's working, but the output is unreadable at times.
09/29/2019 23:22:35 : {0
09/29/2019 23:22:36 w d09/29/2019 23:23:06 MqttClient - Got message on topic: milight/commands/0x3/rgb_cct/1
09/29/2019 23:23:06 MqttClient - device 0003, group 1
09/29/2019 23:23:06 Enqueuing packet
09/29/2019 23:23:06 Enqueuing packet
09/29/2019 23:23:06 Switching to next packet, 2 (10 repeats):
09/29/2019 23:23:06 00 DB E1 22 63 F6repeats):
09/29/2019 23:23:06 00 DB66 BF
09/29/2019 23:23:06 repeats):
09/29/2019 23:23:06
09/29/2019 23:23:06 re66Elre66Elre66Elre66Elre66Eltaou{}Pa{}
09/29/2019 23:23:06 : {0
09/29/2019 23:23:07 w 09/29/2019 23:23:37 c: 1
09/29/2019 23:23:37 moa:is":80ti"c00. 00"bawt:00e
09/29/2019 23:23:37 Pa{}
09/29/2019 23:23:37 : {0
09/29/2019 23:24:08 1
09/29/2019 23:24:08 00e
09/29/2019 23:24:08 66Elre66Elre66Elre66Elre66Eltate wiou{}Pa{}
09/29/2019 23:24:08 0
09/29/2019 23:24:09 at0000w 09/29/2019 23:24:39 MqttCc:1
09/29/2019 23:24:39 66Elrep66Elare6Elapsed: 50
09/29/2019 23:24:39 Sending packet (10 repeats):
09/29/2019 23:24:39 is":80ti"c00. 000"bawt:00e
09/29/2019 23:24:39 Elre66Elre66Elare66Eltaou{}Pat{}
09/29/2019 23:24:39 : {0
09/29/2019 23:24:40 at0000w d09/29/2019 23:25:10 1
09/29/2019 23:25:10 awt:00e
09/29/2019 23:25:10 Elataou{}Pa{}
09/29/2019 23:25:10 0
09/29/2019 23:25:11 w
Probably some settings in my script, but I didn't find a magic combination yet
ser = serial.Serial(
port='/dev/ttyUSB0',
baudrate = 115200,
parity=serial.PARITY_NONE,
stopbits=serial.STOPBITS_ONE,
bytesize=serial.EIGHTBITS,
timeout=1
)
counter=0
The heap monitor is great idea, I'll keep an eye on the graph and hope I can read the log around those times.
You might try passing -p /dev/ttyUSB0
. I'm guessing that PlatformIO is using the same Serial library under the hood considering it's Python, so it should, in theory, work :)
Here's the full command I'm using:
$ pio device monitor -b 115200 | ts '[%Y-%m-%d %H:%M:%S]' | tee -a /tmp/pio_logs.txt
--- Miniterm on /dev/cu.usbserial-1443230 115200,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
[2019-09-29 16:45:54] MqttClient - connecting
[2019-09-29 16:45:54] ERROR: Failed to connect to MQTT server
[2019-09-29 16:45:59] MqttClient - connecting
[2019-09-29 16:45:59] ERROR: Failed to connect to MQTT server
ts
is from moreutils, in case it's helpful.
thanks again for your help!
This is what I mean, notice the missing characters and finally just crashing altogether.
dietpi@DietPi:~$ pio device monitor -b 115200 | ts '[%Y-%m-%d %H:%M:%S]' | tee -a /tmp/pio_logs.txt
--- Miniterm on /dev/ttyUSB0 115200,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
[2019-09-30 08:07:39] qc1
[2019-09-30 08:07:39] MqttClient - device 0003, gup
[2019-09-30 08:07:39] Fhing dirty state for 0x0003 ExeDEreats):
[2019-09-30 08:07:39] 00 DB E1 22 63 E1 5D 6lrts):
--- exit ---
Exception in thread rx:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/serial/tools/miniterm.py", line 445, in reader
data = self.serial.read(self.serial.in_waiting or 1)
File "/usr/local/lib/python2.7/dist-packages/serial/serialposix.py", line 501, in read
'device reports readiness to read but returned no data '
SerialException: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
Update: I'm pretty sure I tried this before, but I killed the other logging process (as the error informs me to) and now it seems to be working. Still having the occasional unreadable line, but it looks a lot better. I'll keep you posted.
something interesting from your logs -- it looks like you've got some kind of scheduled command being sent roughly every minute to 0x1,rgb_cct,1. is this expected?
That would probably be flux https://www.home-assistant.io/components/flux/
Ah, got it. Yeah, Serial only works with one connection.
Ok, cool. Thanks again.
It crashed again shortly after, seems like the connection isn't really stable. Trying to make something in bash with your command to keep retrying.
Think I have something working by just putting it in a while loop, still strange it's disconnecting now. Maybe the power from the raspberry pi isn't sufficient?
Here's how the heap size graph is looking currently. Stable when nothing is happening, but a a huge dip when I started playing around with the lights / serial log.
After updating to 1.10.2 with web UI all settings are gone. And uploading the backuped settings give a succes message, but there are still no settings
Steps to reproduce
Started with version 1.10.1 In web UI did "check for updates" Clicked download firmware. Updated through webUI Settings are gone. Through web UI tried Restore settings without result.
Expected behavior
After an update setrtings should automatically be restored. After a restore the stored settings should be rerstored.
Firmware version
Stared with 1.10.1 now 1.10.2