homieiot / homie-esp8266

πŸ’‘ ESP8266 framework for Homie, a lightweight MQTT convention for the IoT
http://homieiot.github.io/homie-esp8266
MIT License
1.36k stars 308 forks source link

OTA does not work if TLS/SSL is enable #532

Open jmozmoz opened 6 years ago

jmozmoz commented 6 years ago

If TLS/SSL is enabled by the following entries in main.cpp, config.json and platformio.ini, OTA is not working. If SSL is disable, OTA is working.

main.cpp

#if ASYNC_TCP_SSL_ENABLED
  Homie.getLogger() << endl << endl << F("ESP Homie Using TLS") << endl;
  Homie.getMqttClient().setSecure(true);
  Homie.getMqttClient().addServerFingerprint((const uint8_t[])MQTT_SERVER_FINGERPRINT);
#endif

config.json

  "mqtt": {
    "host": "xxx", 
    "port": 8883,
    "base_topic": "homie/",
    "auth": true,
    "username": "aaa",
    "password": "bbb",
    "ssl": true,
    "ssl_fingerprint":"abcdef12345"
  },

platformio.ini

[env:esp01_1m]
platform = espressif8266
board = esp01_1m
framework = arduino
;build_flags = -Wl,-Teagle.flash.1m64.ld
build_flags = -DASYNC_TCP_SSL_ENABLED=1 -Wl,-Teagle.flash.1m64.ld
lib_deps =
  https://github.com/marvinroger/homie-esp8266.git#v2.1
  simpleTimer
  jled

; Serial Monitor options
monitor_speed = 115200

Disabling SSL is done by changing the port to 1883 and removing the -DASYNC_TCP_SSL_ENABLED from build_flags

The output in the serial console, if SSL is enabled and an OTA is send to the device is just:

βœ– MQTT disconnected
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ” MQTT ready
Triggering MQTT_READY event...
γ€½ Sending statistics...
  β€’ Wi-Fi signal quality: 96%
  β€’ Uptime: 114s

and the ota_updater.py just gets back the original checksum

Connecting to mqtt broker alpha on port 1883
Connected with result code 0
Waiting for device info...
Publishing new firmware with checksum 6a0c1c4c31e15b2c7cd6970ffdea8cb3
Expecting checksum 6a0c1c4c31e15b2c7cd6970ffdea8cb3, got 5a073a977e0536f7651faa0c8d77ca39, update failed!
timpur commented 6 years ago

Please see #458

I don't believe TLS has been pushed to here yet, it's only in my fork branch for v2.1

jmozmoz commented 6 years ago

I changed platformio.ini to use your v2.1 branch:

build_flags = -DASYNC_TCP_SSL_ENABLED=1 -Wl,-Teagle.flash.1m64.ld
lib_deps =
  https://github.com/timpur/homie-esp8266.git#v2.1
  simpleTimer
  jled

But now my config.json seems not to work anymore. Instead, the ESP going into setup mode:

Triggering Wi-Fi scan...
βœ” Wi-Fi scan completed
Triggering Wi-Fi scan...
βœ” Wi-Fi scan completed

Do I have to change something in the config.json?

{
  "name": "YAGDS",
  "device_id": "door_sensor_test",
  "device_stats_interval": 60,
  "wifi": {
    "ssid": "xxx",
    "password": "yyy"
  },
  "mqtt": {
    "host": "aaa", 
    "port": 8883,
    "base_topic": "homie/",
    "auth": true,
    "username": "bbb",
    "password": "ccc",
    "ssl": true,
    "ssl_fingerprint": "d62718dd09ce3d801f597a2b297893e80b828a5d"
  },
  "ota": {
    "enabled": true
  },
  "settings": {
    "sensor_update_interval": 2,
    "max_resend": 5,
    "connection_timeout": 30,
    "sleep_timeout": 10
  }
}
timpur commented 6 years ago

Try 128kb spifis

Also what does the esp tell you the error is with the config at boot ? It should spit something out about the config error ?

jmozmoz commented 6 years ago

I change the SPIFFS size to 128 kB, but it still does not work.

build_flags = -DASYNC_TCP_SSL_ENABLED=1 -Wl,-Teagle.flash.1m128.ld
lib_deps =
  https://github.com/timpur/homie-esp8266.git#v2.1

Here is the output (of the bare minimum example):

↻ Config file loading
βœ– Config file faild to load
βœ– Config file too big
Configuration invalid. Using CONFIG MODE
πŸ’‘ Firmware bare-minimum (1.0.5)
πŸ”Œ Booting into config mode πŸ”Œ
Device ID is dc4f2211e3dc
AP started as Homie-dc4f2211e3dc with IP 192.168.123.1
Homie button is enabled
Triggering Wi-Fi scan...
βœ” Wi-Fi scan completed
jmozmoz commented 6 years ago

Ok, now things get strange: I went back to the default SPIFFS size. Now, the output is:

βœ– Config file faild to load
βœ– Config file too big
Configuration invalid. Using CONFIG MODE
πŸ’‘ Firmware bare-minimum (1.0.5)
πŸ”Œ Booting into config mode πŸ”Œ
Device ID is dc4f2211e3dc
AP started as Homie-dc4f2211e3dc with IP 192.168.123.1
Homie button is enabled
Triggering Wi-Fi scan...
βœ” Wi-Fi scan completed

If I remove the line with "ssl" from the config.json, the configuration seems to be ok.

But both

 "ssl": true,

and

 "ssl": false,

result in the error message above.

jmozmoz commented 6 years ago

I found the problem with the config: In this branch, the config limit is still 500 characters, which was just passed by adding the line above: https://github.com/timpur/homie-esp8266/blob/v2.1/src/Homie/Limits.hpp#L6

const uint16_t MAX_JSON_CONFIG_FILE_SIZE = 500;

In the original respository, it is 1000.

This is the commit, when it was changed: https://github.com/timpur/homie-esp8266/commit/37f5b62851a90d356a60eef7887410bf7dd5d9fe#diff-5f8411c937cdd2d3e98c0cde5993ab8bL6

jmozmoz commented 6 years ago

Ok, now that SSL is working using the config (alone), the orignal problem still persists: The initiating of an OTA just results in dis- and re-connect of the MQTT broker, but no output about OTA.

Setting "ssl": false in config.json results in successful OTA.

jmozmoz commented 6 years ago

I tried to look into this further and added some debug output:

void BootNormal::_onMqttMessage(char* topic, char* payload, AsyncMqttClientMessageProperties properties, size_t len, size_t index, size_t total) {
  if (total == 0) return;  // no empty message possible

  Interface::get().getLogger() << F("topic: ") << topic << endl;
bool BootNormal::__handleOTAUpdates(char* topic, char* payload, const AsyncMqttClientMessageProperties& properties, size_t len, size_t index, size_t total) {
  Interface::get().getLogger() << _mqttTopicLevelsCount << endl;
  Interface::get().getLogger() << topic << endl;
  Interface::get().getLogger() << _mqttTopicLevels.get()[0] << endl;
  Interface::get().getLogger() << _mqttTopicLevels.get()[1] << endl;

Working OTA over non-SSL:

topic: homie/bare/$implementation/ota/firmware/a915eabf0757c1ca6d526bb5c2b01641
5
homie/bare
bare
$implementation
test2
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
test6
Firmware is binary
Receiving OTA firmware (1382/476112)...
topic: homie/bare
5
homie/bare
bare
$implementation
test2
Receiving OTA firmware (2842/476112)...
topic: homie/bare

Non working update over SSL:

Triggering WIFI_CONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
topic: homie/bare/$implementation/reset
3
homie/bare
bare
$implementation
topic: homie/bare/$implementation/restart
3
homie/bare
bare
$implementation
βœ” MQTT ready
Triggering MQTT_READY event...
Calling setup function...
γ€½ Sending statistics...
  β€’ Wi-Fi signal quality: 90%
  β€’ Uptime: 6s
βœ– MQTT disconnected
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
topic: homie/bare/$implementation/reset
3
homie/bare
bare
$implementation
topic: homie/bare/$implementation/restart
3
homie/bare
bare
$implementation
βœ” MQTT ready
Triggering MQTT_READY event...
γ€½ Sending statistics...
  β€’ Wi-Fi signal quality: 86%
  β€’ Uptime: 45s

So the OTA message is lost before _onMqttMessage()

timpur commented 6 years ago

Okay seems I'll need to do some more testing ssl before releasing :P thanks for debugging

jmozmoz commented 6 years ago

Is it possible, that the OTA message is lost by the MQTTClient library, if it is tranfered via SSL, e.g. due to some size limits? I can see, that the message is sent out by the broker (using mosquitto_sub). But it looks like, it is not received by the MQTT client on the ESP-01. I put a debug output in AsyncMqttClient::_onData and it looks like that there is also no message received there. But if I send a shorter message via mosquitto_pub, it is received by the ESP-01:

mosquitto_pub -h aaa -p 1883  -u yyy -P 'xxxx' -t 'homie/bare/$implementation/ota/firmware/test' -m test
testB
topic: homie/bare/$implementation/ota/firmware/test
5
homie/bare
bare
$implementation
test2
Receiving OTA payload
βœ– OTA failed (400 BAD_CHECKSUM)
Triggering OTA_FAILED event...
βœ– Aborting, invalid MD5

I can even see this, when using mosquitto_pub.

Without SSL, this works (the ESP-01 receives messages):

dd if=/dev/zero of=test_load.txt count=70000 bs=1
mosquitto_pub -h aaa  -p 1883  -u xxx -P zzz -t 'homie/bare/$implementation/ota/firmware/test' -f test_load.txt

With SSL, the MQTT connection is lost and re-established as above.

The message limit seems to bee somewhere between 14000 and 15000 bytes.

timpur commented 6 years ago

The processing method is different for config JSON and OTA, one needs to be entirely loaded into memory before processed and the other, OTA can be processed packet at a time.

Thus there is a limit on for config and other stuff but shouldn't be one for OTA.

jmozmoz commented 6 years ago

The processing method is different for config JSON and OTA, one needs to be entirely loaded into memory before processed and the other, OTA can be processed packet at a time.

Thus there is a limit on for config and other stuff but shouldn't be one for OTA.

Sorry, but I do not understand this: At the moment it looks like that message sent by https://github.com/marvinroger/homie-esp8266/blob/develop/scripts/ota_updater/ota_updater.py is too big to be received by the ESP-01, if sent over SSL/TLS. So does this mean, the script has to be changed to sent the OTA in smaller chunks? As written above, the message is lost somewhere in the MQTT library (before it can be processed by the OTA handler).

TuxCoder commented 6 years ago

If i get it right, you have to publish your update to homie/bare/$implementation/ota/firmware/md5sum file.bin

on an sonoff basic I get the following error

Receiving OTA payload ↕ OTA started Triggering OTA_STARTED event... Firmware is binary βœ– OTA failed (400 NOT_ENOUGH_SPACE) Triggering OTA_FAILED event...

What I remember I get ride of this issue by tweaking the device partition. Will provide a update next week.

timpur commented 6 years ago

It should be possible to do ota on 1 MB how big is your sketch when built ?

Please also see http://marvinroger.github.io/homie-esp8266/docs/2.0.0/others/ota-configuration-updates/

TuxCoder commented 6 years ago

470400B .pioenvs/sonoff_basic/firmware.bin but in the default settings it uses the eagle.flash.1m256.ld so its a bit to big. changed to eagle.flash.1m64.ld Then the space is enought.

Then it crashes for a random reason.

Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16296/470400)...

Exception (3):
epc1=0x401003e9 epc2=0x00000000 epc3=0x00000000 excvaddr=0x40050669 depc=0x00000000

ctx: sys 
sp: 3ffffc60 end: 3fffffb0 offset: 01a0

>>>stack>>>
3ffffe00:  00000000 3ffe8b9a 00000004 40100690  
3ffffe10:  4024d435 00000003 000016d0 fffffffc  
3ffffe20:  00000046 00000014 3fff40c4 401004d8  
3ffffe30:  3fff4074 3fff40a4 3fff3ebc 40106dc8  
3ffffe40:  00000000 600011f0 3fff40c4 402550dc  
3ffffe50:  00000000 3fff40a4 3fff3dd4 4020732d  
3ffffe60:  3fff1c6c 00000370 00000370 3fff2258  
3ffffe70:  00000000 8e3e0000 3fff40c4 4025623d  
3ffffe80:  3fff0574 0000418a 3fff40c4 4025674b  
3ffffe90:  3fff40a4 3fff40c4 3fff0574 3fff2258  
3ffffea0:  00000000 0000418a 3fff40c4 4025683a  
3ffffeb0:  0000418a 3fff2418 00000006 3fff2258  
3ffffec0:  3fff40c4 3fff2260 3fff225c 3fff2258  
3ffffed0:  3fff40c4 3fff2260 3fff225c 4025b929  
3ffffee0:  000005b4 ad6b1fac 00000010 000005b4  
3ffffef0:  00000000 00000010 3fff0a20 40106dc8  
3fffff00:  40250000 00000000 0000007d 3fff2410  
3fffff10:  3ffeb69a 3fff2418 3fff40a4 40258b81  
3fffff20:  3fff2178 3fff29c4 3fff29c4 3ffef3c0  
3fffff30:  00000000 3fff40a4 0000001c 3fff29c4  
3fffff40:  3ffeb68c 00000000 3fff40a4 40257f81  
3fffff50:  2a6b1fac 00000612 00000000 0000002e  
3fffff60:  00000002 0000001a 402468df 3ffee1e0  
3fffff70:  3ffeb664 3fffdcc0 3ffea7e0 3ffea7e0  
3fffff80:  40246852 3ffee1e0 00000000 3fff2a2c  
3fffff90:  3fffdc80 00000000 3fff40a4 4024cdcf  
3fffffa0:  40000f49 3fffdab0 3fffdab0 40000f49  
<<<stack<<<
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16296/470400)...
Receiving OTA firmware (32680/470400)...
Receiving OTA firmware (49064/470400)...
Receiving OTA firmware (65448/470400)...
Receiving OTA firmware (81832/470400)...
Receiving OTA firmware (98216/470400)...
Receiving OTA firmware (114600/470400)...
Receiving OTA firmware (130984/470400)...
Receiving OTA firmware (147368/470400)...
Receiving OTA firmware (163752/470400)...
Receiving OTA firmware (180136/470400)...
Receiving OTA firmware (196520/470400)...
Receiving OTA firmware (212904/470400)...
Receiving OTA firmware (229288/470400)...
βœ– MQTT disconnected
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ” MQTT ready
timpur commented 6 years ago

Can you decode one of the stack traces ? https://github.com/me-no-dev/EspExceptionDecoder if using pio use the cmd version (at bottom of page)

jmozmoz commented 6 years ago

In my tests, there is no stack trace. The firmware continuous to run (see comment 3 above).

It looks like the message is drop somewhere AsynMqttClient.

TuxCoder commented 6 years ago

thanks for the links!

Was a bit busy here ist the stack trace from the exception above:

0x40100690: free at ??:?
0x4024d435: system_pp_recycle_rx_pkt at ??:?
0x401004d8: malloc at ??:?
0x40106dc8: pvPortMalloc at ??:?
0x402550dc: pbuf_alloc at core/pbuf.c:366
0x4020732d: tcp_ssl_read at ??:?
0x4025623d: tcp_output_alloc_header at core/tcp_out.c:97
0x4025674b: tcp_send_empty_ack at core/tcp_out.c:842
0x4025683a: tcp_output at core/tcp_out.c:925
0x4025b929: tcp_input at core/tcp_in.c:423
0x40106dc8: pvPortMalloc at ??:?
0x40250000: wifi_softap_set_default_ssid at ??:?
0x40258b81: ip_input at core/ipv4/ip.c:558
0x40257f81: ethernet_input at netif/etharp.c:1377
0x402468df: pp_tx_idle_timeout at ??:?
0x40246852: pp_tx_idle_timeout at ??:?
0x4024cdcf: ets_snprintf at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0

Will post more stack trace soon, because I don't think it is always the same error.

TuxCoder commented 6 years ago
Firmware is binary
Receiving OTA firmware (16300/470400)...
Receiving OTA firmware (32684/470400)...
Receiving OTA firmware (49068/470400)...
Receiving OTA firmware (65452/470400)...
Receiving OTA firmware (81836/470400)...
Receiving OTA firmware (98220/470400)...
Receiving OTA firmware (114604/470400)...
Receiving OTA firmware (130988/470400)...
Receiving OTA firmware (147372/470400)...
Receiving OTA firmware (163756/470400)...

Exception (3):
epc1=0x401003e9 epc2=0x00000000 epc3=0x00000000 excvaddr=0x40045a09 depc=0x00000000

ctx: sys 
sp: 3ffffc60 end: 3fffffb0 offset: 01a0

>>>stack>>>

0x40100690: free at ??:?
0x4024d435: system_pp_recycle_rx_pkt at ??:?
0x401004d8: malloc at ??:?
0x40106dc8: pvPortMalloc at ??:?
0x402550dc: pbuf_alloc at core/pbuf.c:366
0x4020732d: tcp_ssl_read at ??:?
0x4025623d: tcp_output_alloc_header at core/tcp_out.c:97
0x4025674b: tcp_send_empty_ack at core/tcp_out.c:842
0x4025683a: tcp_output at core/tcp_out.c:925
0x4025b929: tcp_input at core/tcp_in.c:423
0x40106dc8: pvPortMalloc at ??:?
0x40250000: wifi_softap_set_default_ssid at ??:?
0x40258b81: ip_input at core/ipv4/ip.c:558
0x40257f81: ethernet_input at netif/etharp.c:1377
0x402468df: pp_tx_idle_timeout at ??:?
0x40246852: pp_tx_idle_timeout at ??:?
0x4024cdcf: ets_snprintf at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
<<<stack<<<
TuxCoder commented 6 years ago
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16300/470400)...
βœ– MQTT disconnected
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...

Exception (0):
epc1=0x401f8216 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys 
sp: 3ffffdd0 end: 3fffffb0 offset: 01a0

>>>stack>>>
Exception Cause: 0  [Illegal instruction]

0x401f8216: ?? ??:0
0x4024d191: ets_timer_handler_isr at ??:?
0x4024d1d6: ets_timer_handler_isr at ??:?
0x4024cdcf: ets_snprintf at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0

<<<stack<<<

This is the source/binary from all above stacktraces from me.

source: https://git.o-g.at/esp8266/homie enviroment: sonoff_basic

(old self patched tls. but very simple and basic patch ) Homie Source: https://github.com/TuxCoder/homie-esp8266

firmware.elf.zip

TuxCoder commented 6 years ago

@jmozmoz in your last try, you have used the false topic, see first post from me. That is the reason for false md5sum.

jmozmoz commented 6 years ago

@TuxCoder do you mean this comment: https://github.com/marvinroger/homie-esp8266/issues/532#issuecomment-402569306

This was just a test to check if the size of a message was a problem. For the firmware updates, I use the ota_updater.py from https://github.com/marvinroger/homie-esp8266/tree/develop/scripts/ota_updater

c:\Users\joachim\workspace\tor\ota>python ota_updater_github.py  -l alpha -p 8883 -u aaa -d "bbb" -i bare c:\Users\joachim\Documents\PlatformIO\Projects\test_esp01\.pioenvs\esp01_1m\firmware.bin --broker-tls-cacert ca.crt
Connecting to mqtt broker alpha on port 8883
Connected with result code 0
Waiting for device to come online...
Waiting for device info...
Publishing new firmware with checksum 9b9ae9c94712d5281d7ab640c5511226
Expecting checksum 9b9ae9c94712d5281d7ab640c5511226, got 5f857d1629bf85e8a15650d294bca80b, update failed!
TuxCoder commented 6 years ago

hmm, from the uart debug messages, It looks like your devices haven't received any update message.

I testest my stuff with mosquitto_pub --capath /etc/ssl/certs -h .... -p 8883 -u switch6 -P 123456 -t "/devices/switch6/\$implementation/ota/firmware/b8a64dbf82e16d1e0cfb0754a4ce4fce" -f .pioenvs/sonoff_basic/firmware.bin

TuxCoder commented 6 years ago

tried the branch from timpur with ssl support. (v2.1) result:

Sometimes same error as jmozmoz, so no log entry about OTA, but a mqtt disconnect.

And this stacktrace, after a few tries.

↻ Config file loading
βœ” Config file loaded and validated
πŸ’‘ Firmware sonoff (1.0.0)
πŸ”Œ Booting into normal mode πŸ”Œ
πŸ”’ SSL is: enabled
Homie button is enabled
{} Stored configuration
  β€’ Hardware Device ID: 6001943ec3c1
  β€’ Device ID: switch5
  β€’ Name: switch5
  β€’ Device Stats Interval: 60 sec
  β€’ Wi-Fi: 
    β—¦ SSID: xxxxxx
    β—¦ Password not shown
  β€’ MQTT: 
    β—¦ Host: xxxxxx
    β—¦ Port: 8883
    β—¦ SSL Enabled?: yes
    β—¦ Base Topic: /devices/
    β—¦ Auth? yes
    β—¦ Username: switch5
    β—¦ Password not shown
  β€’ OTA: 
    β—¦ Enabled? yes
↕ Attempting to connect to Wi-Fi...
βœ” Wi-Fi connected, IP: xxxxxxxxx
Triggering WIFI_CONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ” MQTT ready
Triggering MQTT_READY event...
Calling setup function...
γ€½ Sending statistics...
  β€’ Wi-Fi signal quality: 80%
  β€’ Uptime: 6s
βœ– MQTT disconnected
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ” MQTT ready
Triggering MQTT_READY event...
γ€½ Sending statistics...
  β€’ Wi-Fi signal quality: 74%
  β€’ Uptime: 7s
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16300/477088)...
Receiving OTA firmware (32684/477088)...
Receiving OTA firmware (49068/477088)...
Receiving OTA firmware (65452/477088)...
Receiving OTA firmware (81836/477088)...
Receiving OTA firmware (98220/477088)...
Receiving OTA firmware (114604/477088)...
Receiving OTA firmware (130988/477088)...
Receiving OTA firmware (147372/477088)...
Receiving OTA firmware (163756/477088)...
Receiving OTA firmware (180140/477088)...
Receiving OTA firmware (196524/477088)...
Receiving OTA firmware (212908/477088)...
Receiving OTA firmware (229292/477088)...
Receiving OTA firmware (245676/477088)...
Receiving OTA firmware (262060/477088)...

Exception (3):
epc1=0x401000b7 epc2=0x00000000 epc3=0x00000000 excvaddr=0x4005b26c depc=0x00000000

ctx: sys 
sp: 3ffffbd0 end: 3fffffb0 offset: 01a0

Exception Cause: 3  [LoadStoreError: Processor internal physical address or data error during load or store]

0x401000b7: umm_disconnect_from_free_list at umm_malloc.c:?
0x4005b26c: ?? ??:0
0x40256a06: tcp_output at core/tcp_out.c:990
0x4010020c: _umm_free at umm_malloc.c:?
0x40100690: free at ??:?
0x402074d6: ax_port_read at ??:?
0x4025956c: espconn_sent at app/espconn.c:387
0x4022dac3: x509_new at ssl/x509.c:121 (discriminator 1)
0x40103205: lmacProcessTXStartData at ??:?
0x40100eb6: pp_post at ??:?
0x401042bb: lmacRxDone at ??:?
0x40100eb6: pp_post at ??:?
0x4022df98: pgm_read_byte at ssl/os_port.h:107
 (inlined by) memcpy_P at ssl/os_port.h:134
 (inlined by) x509_print at ssl/x509.c:647
0x401021e6: wDev_ProcessFiq at ??:?
0x402072b5: tcp_ssl_read at ??:?
0x4010020c: _umm_free at umm_malloc.c:?
0x40100690: free at ??:?
0x40101f80: wDev_ProcessFiq at ??:?
0x40221fdb: AsyncWebServerRequest::_handleUploadByte(unsigned char, bool) at ??:?
0x40210000: HomieInternals::ExponentialBackoffTimer::check() at ??:?
0x4025c330: ram_ana_inf_gating_en at ??:?
0x40222004: AsyncWebServerRequest::_handleUploadByte(unsigned char, bool) at ??:?
0x4025cfac: test_tout at ??:?
0x40106e28: xPortWantedSizeAlign at ??:?
0x40220000: StreamString::read() at ??:?
0x4025a235: espconn_udp_sent at app/espconn_udp.c:141
0x40259635: espconn_recv at app/espconn.c:447
0x40247f8b: wDev_Enable_Beacon_Tsf at ??:?
0x40247efe: wDev_Option_Init at ??:?
0x4024e483: system_relative_time at ??:?
0x40000f49: ?? ??:0
0x40000f49: ?? ??:0
TuxCoder commented 6 years ago

One time a freeze, followed by a reboot

βœ” MQTT ready
Triggering MQTT_READY event...
γ€½ Sending statistics...
  β€’ Wi-Fi signal quality: 78%
  β€’ Uptime: 12s
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (16300/477088)...
Receiving OTA firmware (32684/477088)...

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)
jmozmoz commented 6 years ago

I tried using mosquitto_pub but the result is the same.

βœ– MQTT disconnected
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ” MQTT ready
Triggering MQTT_READY event...
γ€½ Sending statistics...
  β€’ Wi-Fi signal quality: 90%
  β€’ Uptime: 617s

If I split the firmware to 10000 byte sized junks, the OTA process starts, but of course stops due to a wrong MD5SUM.

split -b 10000 firmware.bin
mosquitto_pub -h alpha  -p 8883 --cafile /etc/mosquitto/ca_certificates/ca.crt -u aaa -P bbb  -t 'homie/bare/$implementation/ota/firmware/9b9ae9c94712d5281d7ab640c5511226' -f xaa
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
βœ– OTA failed (400 BAD_FIRMWARE)
Triggering OTA_FAILED event...

So I still think, the a too big message is droped somewhere in AsyncMqttClient.

TuxCoder commented 6 years ago

It is very interesting, that if I try it multiple times, sometimes the OTA starts but gets hang up later with a freez or stacktrace.

Will try to setup gdb, but need a bit time for that. Going to write if I have some results.

jmozmoz commented 6 years ago

Perhaps this is a similar issue: https://github.com/esp8266/Arduino/pull/4980

jmozmoz commented 6 years ago

In https://github.com/marvinroger/homie-esp8266/issues/530#issuecomment-424521765 the LWIP options are mention. So I tried if one of them helps:

timpur commented 6 years ago

@jmozmoz for PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY that sad :(

Hopefully soon ill find time to have a look