Open jmozmoz opened 6 years ago
Please see #458
I don't believe TLS has been pushed to here yet, it's only in my fork branch for v2.1
I changed platformio.ini to use your v2.1 branch:
build_flags = -DASYNC_TCP_SSL_ENABLED=1 -Wl,-Teagle.flash.1m64.ld
lib_deps =
https://github.com/timpur/homie-esp8266.git#v2.1
simpleTimer
jled
But now my config.json seems not to work anymore. Instead, the ESP going into setup mode:
Triggering Wi-Fi scan...
β Wi-Fi scan completed
Triggering Wi-Fi scan...
β Wi-Fi scan completed
Do I have to change something in the config.json?
{
"name": "YAGDS",
"device_id": "door_sensor_test",
"device_stats_interval": 60,
"wifi": {
"ssid": "xxx",
"password": "yyy"
},
"mqtt": {
"host": "aaa",
"port": 8883,
"base_topic": "homie/",
"auth": true,
"username": "bbb",
"password": "ccc",
"ssl": true,
"ssl_fingerprint": "d62718dd09ce3d801f597a2b297893e80b828a5d"
},
"ota": {
"enabled": true
},
"settings": {
"sensor_update_interval": 2,
"max_resend": 5,
"connection_timeout": 30,
"sleep_timeout": 10
}
}
Try 128kb spifis
Also what does the esp tell you the error is with the config at boot ? It should spit something out about the config error ?
I change the SPIFFS size to 128 kB, but it still does not work.
build_flags = -DASYNC_TCP_SSL_ENABLED=1 -Wl,-Teagle.flash.1m128.ld
lib_deps =
https://github.com/timpur/homie-esp8266.git#v2.1
Here is the output (of the bare minimum example):
β» Config file loading
β Config file faild to load
β Config file too big
Configuration invalid. Using CONFIG MODE
π‘ Firmware bare-minimum (1.0.5)
π Booting into config mode π
Device ID is dc4f2211e3dc
AP started as Homie-dc4f2211e3dc with IP 192.168.123.1
Homie button is enabled
Triggering Wi-Fi scan...
β Wi-Fi scan completed
Ok, now things get strange: I went back to the default SPIFFS size. Now, the output is:
β Config file faild to load
β Config file too big
Configuration invalid. Using CONFIG MODE
π‘ Firmware bare-minimum (1.0.5)
π Booting into config mode π
Device ID is dc4f2211e3dc
AP started as Homie-dc4f2211e3dc with IP 192.168.123.1
Homie button is enabled
Triggering Wi-Fi scan...
β Wi-Fi scan completed
If I remove the line with "ssl" from the config.json, the configuration seems to be ok.
But both
"ssl": true,
and
"ssl": false,
result in the error message above.
I found the problem with the config: In this branch, the config limit is still 500 characters, which was just passed by adding the line above: https://github.com/timpur/homie-esp8266/blob/v2.1/src/Homie/Limits.hpp#L6
const uint16_t MAX_JSON_CONFIG_FILE_SIZE = 500;
In the original respository, it is 1000.
This is the commit, when it was changed: https://github.com/timpur/homie-esp8266/commit/37f5b62851a90d356a60eef7887410bf7dd5d9fe#diff-5f8411c937cdd2d3e98c0cde5993ab8bL6
Ok, now that SSL is working using the config (alone), the orignal problem still persists: The initiating of an OTA just results in dis- and re-connect of the MQTT broker, but no output about OTA.
Setting "ssl": false in config.json results in successful OTA.
I tried to look into this further and added some debug output:
void BootNormal::_onMqttMessage(char* topic, char* payload, AsyncMqttClientMessageProperties properties, size_t len, size_t index, size_t total) {
if (total == 0) return; // no empty message possible
Interface::get().getLogger() << F("topic: ") << topic << endl;
bool BootNormal::__handleOTAUpdates(char* topic, char* payload, const AsyncMqttClientMessageProperties& properties, size_t len, size_t index, size_t total) {
Interface::get().getLogger() << _mqttTopicLevelsCount << endl;
Interface::get().getLogger() << topic << endl;
Interface::get().getLogger() << _mqttTopicLevels.get()[0] << endl;
Interface::get().getLogger() << _mqttTopicLevels.get()[1] << endl;
Working OTA over non-SSL:
topic: homie/bare/$implementation/ota/firmware/a915eabf0757c1ca6d526bb5c2b01641
5
homie/bare
bare
$implementation
test2
Receiving OTA payload
β OTA started
Triggering OTA_STARTED event...
test6
Firmware is binary
Receiving OTA firmware (1382/476112)...
topic: homie/bare
5
homie/bare
bare
$implementation
test2
Receiving OTA firmware (2842/476112)...
topic: homie/bare
Non working update over SSL:
Triggering WIFI_CONNECTED event...
β Attempting to connect to MQTT...
Sending initial information...
topic: homie/bare/$implementation/reset
3
homie/bare
bare
$implementation
topic: homie/bare/$implementation/restart
3
homie/bare
bare
$implementation
β MQTT ready
Triggering MQTT_READY event...
Calling setup function...
γ½ Sending statistics...
β’ Wi-Fi signal quality: 90%
β’ Uptime: 6s
β MQTT disconnected
Triggering MQTT_DISCONNECTED event...
β Attempting to connect to MQTT...
Sending initial information...
topic: homie/bare/$implementation/reset
3
homie/bare
bare
$implementation
topic: homie/bare/$implementation/restart
3
homie/bare
bare
$implementation
β MQTT ready
Triggering MQTT_READY event...
γ½ Sending statistics...
β’ Wi-Fi signal quality: 86%
β’ Uptime: 45s
So the OTA message is lost before _onMqttMessage()
Okay seems I'll need to do some more testing ssl before releasing :P thanks for debugging
Is it possible, that the OTA message is lost by the MQTTClient library, if it is tranfered via SSL, e.g. due to some size limits?
I can see, that the message is sent out by the broker (using mosquitto_sub). But it looks like, it is not received by the MQTT client on the ESP-01.
I put a debug output in AsyncMqttClient::_onData
and it looks like that there is also no message received there.
But if I send a shorter message via mosquitto_pub, it is received by the ESP-01:
mosquitto_pub -h aaa -p 1883 -u yyy -P 'xxxx' -t 'homie/bare/$implementation/ota/firmware/test' -m test
testB
topic: homie/bare/$implementation/ota/firmware/test
5
homie/bare
bare
$implementation
test2
Receiving OTA payload
β OTA failed (400 BAD_CHECKSUM)
Triggering OTA_FAILED event...
β Aborting, invalid MD5
I can even see this, when using mosquitto_pub.
Without SSL, this works (the ESP-01 receives messages):
dd if=/dev/zero of=test_load.txt count=70000 bs=1
mosquitto_pub -h aaa -p 1883 -u xxx -P zzz -t 'homie/bare/$implementation/ota/firmware/test' -f test_load.txt
With SSL, the MQTT connection is lost and re-established as above.
The message limit seems to bee somewhere between 14000 and 15000 bytes.
The processing method is different for config JSON and OTA, one needs to be entirely loaded into memory before processed and the other, OTA can be processed packet at a time.
Thus there is a limit on for config and other stuff but shouldn't be one for OTA.
The processing method is different for config JSON and OTA, one needs to be entirely loaded into memory before processed and the other, OTA can be processed packet at a time.
Thus there is a limit on for config and other stuff but shouldn't be one for OTA.
Sorry, but I do not understand this: At the moment it looks like that message sent by https://github.com/marvinroger/homie-esp8266/blob/develop/scripts/ota_updater/ota_updater.py is too big to be received by the ESP-01, if sent over SSL/TLS. So does this mean, the script has to be changed to sent the OTA in smaller chunks? As written above, the message is lost somewhere in the MQTT library (before it can be processed by the OTA handler).
If i get it right, you have to publish your update to
homie/bare/$implementation/ota/firmware/md5sum file.bin
on an sonoff basic I get the following error
Receiving OTA payload β OTA started Triggering OTA_STARTED event... Firmware is binary β OTA failed (400 NOT_ENOUGH_SPACE) Triggering OTA_FAILED event...
What I remember I get ride of this issue by tweaking the device partition. Will provide a update next week.
It should be possible to do ota on 1 MB how big is your sketch when built ?
Please also see http://marvinroger.github.io/homie-esp8266/docs/2.0.0/others/ota-configuration-updates/
470400B .pioenvs/sonoff_basic/firmware.bin
but in the default settings it uses the eagle.flash.1m256.ld
so its a bit to big. changed to eagle.flash.1m64.ld
Then the space is enought.
Then it crashes for a random reason.
Receiving OTA payload
β OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16296/470400)...
Exception (3):
epc1=0x401003e9 epc2=0x00000000 epc3=0x00000000 excvaddr=0x40050669 depc=0x00000000
ctx: sys
sp: 3ffffc60 end: 3fffffb0 offset: 01a0
>>>stack>>>
3ffffe00: 00000000 3ffe8b9a 00000004 40100690
3ffffe10: 4024d435 00000003 000016d0 fffffffc
3ffffe20: 00000046 00000014 3fff40c4 401004d8
3ffffe30: 3fff4074 3fff40a4 3fff3ebc 40106dc8
3ffffe40: 00000000 600011f0 3fff40c4 402550dc
3ffffe50: 00000000 3fff40a4 3fff3dd4 4020732d
3ffffe60: 3fff1c6c 00000370 00000370 3fff2258
3ffffe70: 00000000 8e3e0000 3fff40c4 4025623d
3ffffe80: 3fff0574 0000418a 3fff40c4 4025674b
3ffffe90: 3fff40a4 3fff40c4 3fff0574 3fff2258
3ffffea0: 00000000 0000418a 3fff40c4 4025683a
3ffffeb0: 0000418a 3fff2418 00000006 3fff2258
3ffffec0: 3fff40c4 3fff2260 3fff225c 3fff2258
3ffffed0: 3fff40c4 3fff2260 3fff225c 4025b929
3ffffee0: 000005b4 ad6b1fac 00000010 000005b4
3ffffef0: 00000000 00000010 3fff0a20 40106dc8
3fffff00: 40250000 00000000 0000007d 3fff2410
3fffff10: 3ffeb69a 3fff2418 3fff40a4 40258b81
3fffff20: 3fff2178 3fff29c4 3fff29c4 3ffef3c0
3fffff30: 00000000 3fff40a4 0000001c 3fff29c4
3fffff40: 3ffeb68c 00000000 3fff40a4 40257f81
3fffff50: 2a6b1fac 00000612 00000000 0000002e
3fffff60: 00000002 0000001a 402468df 3ffee1e0
3fffff70: 3ffeb664 3fffdcc0 3ffea7e0 3ffea7e0
3fffff80: 40246852 3ffee1e0 00000000 3fff2a2c
3fffff90: 3fffdc80 00000000 3fff40a4 4024cdcf
3fffffa0: 40000f49 3fffdab0 3fffdab0 40000f49
<<<stack<<<
Receiving OTA payload
β OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16296/470400)...
Receiving OTA firmware (32680/470400)...
Receiving OTA firmware (49064/470400)...
Receiving OTA firmware (65448/470400)...
Receiving OTA firmware (81832/470400)...
Receiving OTA firmware (98216/470400)...
Receiving OTA firmware (114600/470400)...
Receiving OTA firmware (130984/470400)...
Receiving OTA firmware (147368/470400)...
Receiving OTA firmware (163752/470400)...
Receiving OTA firmware (180136/470400)...
Receiving OTA firmware (196520/470400)...
Receiving OTA firmware (212904/470400)...
Receiving OTA firmware (229288/470400)...
β MQTT disconnected
Triggering MQTT_DISCONNECTED event...
β Attempting to connect to MQTT...
Sending initial information...
β MQTT ready
Can you decode one of the stack traces ? https://github.com/me-no-dev/EspExceptionDecoder if using pio use the cmd version (at bottom of page)
In my tests, there is no stack trace. The firmware continuous to run (see comment 3 above).
It looks like the message is drop somewhere AsynMqttClient.
thanks for the links!
Was a bit busy here ist the stack trace from the exception above:
0x40100690: free at ??:?
0x4024d435: system_pp_recycle_rx_pkt at ??:?
0x401004d8: malloc at ??:?
0x40106dc8: pvPortMalloc at ??:?
0x402550dc: pbuf_alloc at core/pbuf.c:366
0x4020732d: tcp_ssl_read at ??:?
0x4025623d: tcp_output_alloc_header at core/tcp_out.c:97
0x4025674b: tcp_send_empty_ack at core/tcp_out.c:842
0x4025683a: tcp_output at core/tcp_out.c:925
0x4025b929: tcp_input at core/tcp_in.c:423
0x40106dc8: pvPortMalloc at ??:?
0x40250000: wifi_softap_set_default_ssid at ??:?
0x40258b81: ip_input at core/ipv4/ip.c:558
0x40257f81: ethernet_input at netif/etharp.c:1377
0x402468df: pp_tx_idle_timeout at ??:?
0x40246852: pp_tx_idle_timeout at ??:?
0x4024cdcf: ets_snprintf at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
Will post more stack trace soon, because I don't think it is always the same error.
Firmware is binary
Receiving OTA firmware (16300/470400)...
Receiving OTA firmware (32684/470400)...
Receiving OTA firmware (49068/470400)...
Receiving OTA firmware (65452/470400)...
Receiving OTA firmware (81836/470400)...
Receiving OTA firmware (98220/470400)...
Receiving OTA firmware (114604/470400)...
Receiving OTA firmware (130988/470400)...
Receiving OTA firmware (147372/470400)...
Receiving OTA firmware (163756/470400)...
Exception (3):
epc1=0x401003e9 epc2=0x00000000 epc3=0x00000000 excvaddr=0x40045a09 depc=0x00000000
ctx: sys
sp: 3ffffc60 end: 3fffffb0 offset: 01a0
>>>stack>>>
0x40100690: free at ??:?
0x4024d435: system_pp_recycle_rx_pkt at ??:?
0x401004d8: malloc at ??:?
0x40106dc8: pvPortMalloc at ??:?
0x402550dc: pbuf_alloc at core/pbuf.c:366
0x4020732d: tcp_ssl_read at ??:?
0x4025623d: tcp_output_alloc_header at core/tcp_out.c:97
0x4025674b: tcp_send_empty_ack at core/tcp_out.c:842
0x4025683a: tcp_output at core/tcp_out.c:925
0x4025b929: tcp_input at core/tcp_in.c:423
0x40106dc8: pvPortMalloc at ??:?
0x40250000: wifi_softap_set_default_ssid at ??:?
0x40258b81: ip_input at core/ipv4/ip.c:558
0x40257f81: ethernet_input at netif/etharp.c:1377
0x402468df: pp_tx_idle_timeout at ??:?
0x40246852: pp_tx_idle_timeout at ??:?
0x4024cdcf: ets_snprintf at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
<<<stack<<<
Receiving OTA payload
β OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16300/470400)...
β MQTT disconnected
Triggering MQTT_DISCONNECTED event...
β Attempting to connect to MQTT...
Exception (0):
epc1=0x401f8216 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000
ctx: sys
sp: 3ffffdd0 end: 3fffffb0 offset: 01a0
>>>stack>>>
Exception Cause: 0 [Illegal instruction]
0x401f8216: ?? ??:0
0x4024d191: ets_timer_handler_isr at ??:?
0x4024d1d6: ets_timer_handler_isr at ??:?
0x4024cdcf: ets_snprintf at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
<<<stack<<<
This is the source/binary from all above stacktraces from me.
source: https://git.o-g.at/esp8266/homie enviroment: sonoff_basic
(old self patched tls. but very simple and basic patch ) Homie Source: https://github.com/TuxCoder/homie-esp8266
@jmozmoz in your last try, you have used the false topic, see first post from me. That is the reason for false md5sum.
@TuxCoder do you mean this comment: https://github.com/marvinroger/homie-esp8266/issues/532#issuecomment-402569306
This was just a test to check if the size of a message was a problem. For the firmware updates, I use the ota_updater.py from https://github.com/marvinroger/homie-esp8266/tree/develop/scripts/ota_updater
c:\Users\joachim\workspace\tor\ota>python ota_updater_github.py -l alpha -p 8883 -u aaa -d "bbb" -i bare c:\Users\joachim\Documents\PlatformIO\Projects\test_esp01\.pioenvs\esp01_1m\firmware.bin --broker-tls-cacert ca.crt
Connecting to mqtt broker alpha on port 8883
Connected with result code 0
Waiting for device to come online...
Waiting for device info...
Publishing new firmware with checksum 9b9ae9c94712d5281d7ab640c5511226
Expecting checksum 9b9ae9c94712d5281d7ab640c5511226, got 5f857d1629bf85e8a15650d294bca80b, update failed!
hmm, from the uart debug messages, It looks like your devices haven't received any update message.
I testest my stuff with
mosquitto_pub --capath /etc/ssl/certs -h .... -p 8883 -u switch6 -P 123456 -t "/devices/switch6/\$implementation/ota/firmware/b8a64dbf82e16d1e0cfb0754a4ce4fce" -f .pioenvs/sonoff_basic/firmware.bin
tried the branch from timpur with ssl support. (v2.1) result:
Sometimes same error as jmozmoz, so no log entry about OTA, but a mqtt disconnect.
And this stacktrace, after a few tries.
β» Config file loading
β Config file loaded and validated
π‘ Firmware sonoff (1.0.0)
π Booting into normal mode π
π SSL is: enabled
Homie button is enabled
{} Stored configuration
β’ Hardware Device ID: 6001943ec3c1
β’ Device ID: switch5
β’ Name: switch5
β’ Device Stats Interval: 60 sec
β’ Wi-Fi:
β¦ SSID: xxxxxx
β¦ Password not shown
β’ MQTT:
β¦ Host: xxxxxx
β¦ Port: 8883
β¦ SSL Enabled?: yes
β¦ Base Topic: /devices/
β¦ Auth? yes
β¦ Username: switch5
β¦ Password not shown
β’ OTA:
β¦ Enabled? yes
β Attempting to connect to Wi-Fi...
β Wi-Fi connected, IP: xxxxxxxxx
Triggering WIFI_CONNECTED event...
β Attempting to connect to MQTT...
Sending initial information...
β MQTT ready
Triggering MQTT_READY event...
Calling setup function...
γ½ Sending statistics...
β’ Wi-Fi signal quality: 80%
β’ Uptime: 6s
β MQTT disconnected
Triggering MQTT_DISCONNECTED event...
β Attempting to connect to MQTT...
Sending initial information...
β MQTT ready
Triggering MQTT_READY event...
γ½ Sending statistics...
β’ Wi-Fi signal quality: 74%
β’ Uptime: 7s
Receiving OTA payload
β OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16300/477088)...
Receiving OTA firmware (32684/477088)...
Receiving OTA firmware (49068/477088)...
Receiving OTA firmware (65452/477088)...
Receiving OTA firmware (81836/477088)...
Receiving OTA firmware (98220/477088)...
Receiving OTA firmware (114604/477088)...
Receiving OTA firmware (130988/477088)...
Receiving OTA firmware (147372/477088)...
Receiving OTA firmware (163756/477088)...
Receiving OTA firmware (180140/477088)...
Receiving OTA firmware (196524/477088)...
Receiving OTA firmware (212908/477088)...
Receiving OTA firmware (229292/477088)...
Receiving OTA firmware (245676/477088)...
Receiving OTA firmware (262060/477088)...
Exception (3):
epc1=0x401000b7 epc2=0x00000000 epc3=0x00000000 excvaddr=0x4005b26c depc=0x00000000
ctx: sys
sp: 3ffffbd0 end: 3fffffb0 offset: 01a0
Exception Cause: 3 [LoadStoreError: Processor internal physical address or data error during load or store]
0x401000b7: umm_disconnect_from_free_list at umm_malloc.c:?
0x4005b26c: ?? ??:0
0x40256a06: tcp_output at core/tcp_out.c:990
0x4010020c: _umm_free at umm_malloc.c:?
0x40100690: free at ??:?
0x402074d6: ax_port_read at ??:?
0x4025956c: espconn_sent at app/espconn.c:387
0x4022dac3: x509_new at ssl/x509.c:121 (discriminator 1)
0x40103205: lmacProcessTXStartData at ??:?
0x40100eb6: pp_post at ??:?
0x401042bb: lmacRxDone at ??:?
0x40100eb6: pp_post at ??:?
0x4022df98: pgm_read_byte at ssl/os_port.h:107
(inlined by) memcpy_P at ssl/os_port.h:134
(inlined by) x509_print at ssl/x509.c:647
0x401021e6: wDev_ProcessFiq at ??:?
0x402072b5: tcp_ssl_read at ??:?
0x4010020c: _umm_free at umm_malloc.c:?
0x40100690: free at ??:?
0x40101f80: wDev_ProcessFiq at ??:?
0x40221fdb: AsyncWebServerRequest::_handleUploadByte(unsigned char, bool) at ??:?
0x40210000: HomieInternals::ExponentialBackoffTimer::check() at ??:?
0x4025c330: ram_ana_inf_gating_en at ??:?
0x40222004: AsyncWebServerRequest::_handleUploadByte(unsigned char, bool) at ??:?
0x4025cfac: test_tout at ??:?
0x40106e28: xPortWantedSizeAlign at ??:?
0x40220000: StreamString::read() at ??:?
0x4025a235: espconn_udp_sent at app/espconn_udp.c:141
0x40259635: espconn_recv at app/espconn.c:447
0x40247f8b: wDev_Enable_Beacon_Tsf at ??:?
0x40247efe: wDev_Option_Init at ??:?
0x4024e483: system_relative_time at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
One time a freeze, followed by a reboot
β MQTT ready
Triggering MQTT_READY event...
γ½ Sending statistics...
β’ Wi-Fi signal quality: 78%
β’ Uptime: 12s
Receiving OTA payload
β OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16300/477088)...
Receiving OTA firmware (32684/477088)...
ets Jan 8 2013,rst cause:4, boot mode:(3,7)
I tried using mosquitto_pub but the result is the same.
β MQTT disconnected
Triggering MQTT_DISCONNECTED event...
β Attempting to connect to MQTT...
Sending initial information...
β MQTT ready
Triggering MQTT_READY event...
γ½ Sending statistics...
β’ Wi-Fi signal quality: 90%
β’ Uptime: 617s
If I split the firmware to 10000 byte sized junks, the OTA process starts, but of course stops due to a wrong MD5SUM.
split -b 10000 firmware.bin
mosquitto_pub -h alpha -p 8883 --cafile /etc/mosquitto/ca_certificates/ca.crt -u aaa -P bbb -t 'homie/bare/$implementation/ota/firmware/9b9ae9c94712d5281d7ab640c5511226' -f xaa
Receiving OTA payload
β OTA started
Triggering OTA_STARTED event...
β OTA failed (400 BAD_FIRMWARE)
Triggering OTA_FAILED event...
So I still think, the a too big message is droped somewhere in AsyncMqttClient.
It is very interesting, that if I try it multiple times, sometimes the OTA starts but gets hang up later with a freez or stacktrace.
Will try to setup gdb, but need a bit time for that. Going to write if I have some results.
Perhaps this is a similar issue: https://github.com/esp8266/Arduino/pull/4980
In https://github.com/marvinroger/homie-esp8266/issues/530#issuecomment-424521765 the LWIP options are mention. So I tried if one of them helps:
@jmozmoz for PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY
that sad :(
Hopefully soon ill find time to have a look
If TLS/SSL is enabled by the following entries in main.cpp, config.json and platformio.ini, OTA is not working. If SSL is disable, OTA is working.
main.cpp
config.json
platformio.ini
Disabling SSL is done by changing the port to 1883 and removing the -DASYNC_TCP_SSL_ENABLED from build_flags
The output in the serial console, if SSL is enabled and an OTA is send to the device is just:
and the ota_updater.py just gets back the original checksum