xoseperez / espurna

Home automation firmware for ESP8266-based devices
http://tinkerman.cat
GNU General Public License v3.0
3k stars 638 forks source link

Telnet OTA command is broken #2562

Closed Sag06 closed 1 year ago

Sag06 commented 1 year ago

Device

Sonoff Mini

Version

ESPURNA_MINIMAL_ARDUINO_OTA dev 220212

Bug description

ESPURNA_MINIMAL_ARDUINO_OTA today dev compilation. Telnet connection is OK. But OTA command doesn't connect to http server. Tested on Sonoff mini three times. Compiled and esptool flashed. After that, I flashed my last compilation, from may 19th and it works flawlessly.

Steps to reproduce

No response

Build tools used

Arduino ide 1.8.19 Putty 0.74 HFS 2.3M

Any relevant log output (when available)

No response

Decoded stack trace (when available)

No response

mcspr commented 1 year ago

What's the server it fails to connect to? Does it work after changing http client type? In general .h, /#define OTA_CLIENT/ and try changing it to httupdate variant

Sag06 commented 1 year ago

I did all tasks as always. Same server. HFS program on Windows 10. Running on same Notebook. In fact, old compilations work fine here. Last one is from may 2022. Did anything change in general.h from may 2022 to now? I never changed this file.

mcspr commented 1 year ago

I have no idea what version is may 2022 though, git commit is the only way to determine build tree contents with certainty (and why I always ask about it in the issue template, but so far it have not proved to be very effective :) Updating with ota form latest version to latest version minimal .bin works for me w/ https://github.com/caddyserver/caddy

mcspr commented 1 year ago

Does it immediately disconnect after the command? Perhaps there is a crash output / info telling us that there was a crash? And by 'does not connect', what is ota command output (if any)? Is it trying to open a domain name or an IP address? (i.e. ota http://blah.lan/my-firmware.bin.gz)

Sag06 commented 1 year ago

Now, I tried from same Sonoff Mini flashed with compillation "full". I mean, Web UI, home assistant, mqtt, etc. Same results, ota command doesn't work.

[026344] [TELNET] Connected 10.12.16.188:51759 [060079] [MAIN] Resetting stability counter [300163] [MAIN] Uptime: 00y 00d 00h 05m 00s [300164] [MAIN] Heap: initial 41200 available 22984 contiguous 21080 [300165] [MAIN] VCC: 3499 mV [300166] [MAIN] Datetime: 2022-12-03 17:28:10 [471941] [WEBSOCKET] #1 disconnected [600177] [MAIN] Uptime: 00y 00d 00h 10m 00s [600178] [MAIN] Heap: initial 41200 available 24480 contiguous 22856 [600179] [MAIN] VCC: 3499 mV [600180] [MAIN] Datetime: 2022-12-03 17:33:10 ota http://10.12.16.188:8080/espurna.lightcore.jemmwifi.190522.bin [900191] [MAIN] Uptime: 00y 00d 00h 15m 00s [900192] [MAIN] Heap: initial 41200 available 24480 contiguous 22856 [900193] [MAIN] VCC: 3499 mV [900193] [MAIN] Datetime: 2022-12-03 17:38:10

Sag06 commented 1 year ago

I tried with another Sonoff Mini devices, compilations from january and may, and ota command works like a charm. I didn't change anything in hardware. Always

define OTA_CLIENT OTA_CLIENT_ASYNCTCP // Terminal / MQTT OTA support

I think something is broken now. You are right with git commits. I downloaded zip and compiled. I don't know how to identify commit it was then with zip file. I should learn more about github. Sorry.

mcspr commented 1 year ago

So, zero [OTA] ERROR: ...something wrong here... some time (15-30sec) after ota command? Is SDK version the same in both old and new builds? (in WebUI or terminal info ... starting with sdk: ...)

Sag06 commented 1 year ago

Not working, yesterday compilation: SDK version 2.2.2-dev(38a443e) Core version 3.0.2

Working: SDK version 2.2.2-dev(38a443e) Core version 2.7.4

Sag06 commented 1 year ago

So, zero [OTA] ERROR: ...something wrong here... some time (15-30sec) after ota command? Is SDK version the same in both old and new builds? (in WebUI or terminal info ... starting with sdk: ...)

No error. It remains without output. After >3 min, extra info again: ota http://10.12.16.188:8080/espurna.lightcore.jemmwifi.190522.bin [100371] [MAIN] Uptime: 00y 00d 01h 25m 00s [100373] [MAIN] Heap: initial 41200 available 23712 contiguous 21632 [100374] [MAIN] VCC: 3499 mV [100374] [MAIN] Datetime: 2022-12-03 18:48:10

mcspr commented 1 year ago

Going back to 2.7.4 might help

Other issue I have in mind is Windows... But, you'll need to fetch https://github.com/earlephilhower/esp-quick-toolchain/releases/download/3.1.0-gcc10.3/x86_64-w64-mingw32.xtensa-lx106-elf-e5f9fec.220621.zip and c/p replace C:\Users\YOU\AppData\Local\Arduino15\packages\esp8266\tools\xtensa-lx106-elf-gcc\3.0.4-gcc10.3-1757bed contents with the archive ones (just verify that calling bin/xtensa-lx106-elf-gcc -v in that dir shows the same hash as the archive - e5f9fec), then rebuild the .bin

Sag06 commented 1 year ago

Should give any debug info if I run ota command on Espurna debug windows?

Sag06 commented 1 year ago

I don't think is Windows related. I brought bin files compiled from another house/PC. They failed and I compiled again on my notebook to not avail. As OTA with old bin files from may and january work, router, server, etc shouldn't be cause too. I'm lost.

mcspr commented 1 year ago

Have you tried to change toolchain version though? Or downloading releasesnapshot .bin binaries?

Sag06 commented 1 year ago

Last snapshot is from june, isn't it?

mcspr commented 1 year ago

https://github.com/xoseperez/espurna/releases/tag/github221116

Sag06 commented 1 year ago

I,ll try just now

Sag06 commented 1 year ago

espurna-1.15.0-dev.git70d25aea+github221116-espurna-minimal-arduino-ota-1m.bin Same results. It doesn't work. [096530] [TELNET] Connected 10.12.16.188:63023 ota http://10.12.16.188:8080/espurna.mini.190522.bin Time to recover via Esptool :-(

mcspr commented 1 year ago

Have you tried caddy?

caddy file-server --browse  --listen 10.12.16.188:8080
Sag06 commented 1 year ago

Have you tried caddy?

caddy file-server --browse  --listen 10.12.16.188:8080

I'll try. But HFS works with all my old compilations. I don't think it should be problem.

Sag06 commented 1 year ago

Caddy, same results. [556942] [TELNET] Connected 10.12.16.188:52243 ota http://10.12.16.188:8080/espurna.mini.221129.bin

mcspr commented 1 year ago

Our error reporting might be to blame here, OTA needs debug enabled which minimal builds disable. But you should've seen at least this while using normal ones

[788466] [OTA] Connecting to 10.12.16.188:8080
+OK
...
[807057] [OTA] ERROR: Connection aborted
[807057]
[807058] [EEPROM] Enabling EEPROM rotation
[807066] [OTA] Disconnected

(don't really see the reason to not at least generate 'Connecting' message in https://github.com/xoseperez/espurna/issues/2562#issuecomment-1336210890)


Still (strongly) suggest to use the toolchain update regardless of whether it helps OTA or not. This is not some hearsay - just a known bug that will crash our code at random when using Windows builds https://github.com/esp8266/Arduino/pull/8393 (will be synced with boards as soon as Core 3.1.x is released) PIO use it by default already, Arduino IDE installs roll the dice...

Sag06 commented 1 year ago

Not sure if understood. Does windows build mean compiled in a Windows OS machine? Last snapshot failed too. Is it windows build too?

Sag06 commented 1 year ago

Do you suggest to update Espurna using an old minimal bin that works? People should be warned. Two step update is almost mandatory for 1 mega devices.

Sag06 commented 1 year ago

Thinking twice. Snapshots are PIO build, aren't them?

mcspr commented 1 year ago

Two separate issues; toolchain package shipped with ESP8266 Core 3.0.2 is partially broken on Windows for our builds. You are building on Windows, hence my warning.

^ is my log printout on a sonoff-basic .bin downloaded from the releases page, which works for me just fine. You should be able to see OTA logs when using non-minimal builds. -minimal .bin does not include them because of disabled DEBUG_SUPPORT

Sag06 commented 1 year ago

Two separate issues; toolchain package shipped with ESP8266 Core 3.0.2 is partially broken on Windows for our builds. You are building on Windows, hence my warning. I don´t think this is the problem. Snapshot build (github221116) is not Windows build, and it fails too. ^ is my log printout on a sonoff-basic .bin downloaded from the releases page, which works for me just fine. You should be able to see OTA logs when using non-minimal builds. -minimal .bin does not include them because of disabled DEBUG_SUPPORT Ok. I flashed espurna-1.15.0-dev.git70d25aea+github221116-itead-sonoff-mini.bin and tried OTA once more. It doesn't work and NO OTA logs either: [022690] [TELNET] Connected 10.12.16.188:51322 [038258] [NTP] Server pool.ntp.org [038259] [NTP] Last Sync 2022-12-05 19:35:28 (UTC) [038259] [NTP] UTC Time 2022-12-05 19:35:28 ota http://10.12.16.188:8080/espurna.lightcore.jemmwifi.190522.bin [060077] [MAIN] Resetting stability counter [300169] [MAIN] Uptime: 00y 00d 00h 05m 00s [300170] [MAIN] Heap: initial 41840 available 24696 contiguous 22448 [300171] [MAIN] VCC: 3499 mV [300172] [MAIN] Datetime: 2022-12-05 19:39:49 ^C[600176] [MAIN] Uptime: 00y 00d 00h 10m 00s [600178] [MAIN] Heap: initial 41840 available 24496 contiguous 22936 [600178] [MAIN] VCC: 3499 mV [600179] [MAIN] Datetime: 2022-12-05 19:44:49 ota http://10.12.16.188:8080/espurna.lightcore.jemmwifi.190522.bin You can see I tried OTA command twice and no OTA response.

One interesting issue to add. No way to Telnet with password. Connection is always refused in spite password is right. I tried "fibonacci" too to no avail. More detailed:

My full working compilation from last may, sorry to don't know github commit: [418746] [TELNET] Client #0 connected Password: myPassword Password correct, welcome!

And now, Snapshot build (github221116): `Password (disconnects after 1 failed attempt): myPassword

And Putty disconnection error!`

I don't know if this new issue is somehow connected to OTA one. Hope it helps.

mcspr commented 1 year ago

Are you using telnet mode or something like that? Have you tried different clients (netcat, nc)? Do other commands work at all btw? heap, get, set, etc.

Sag06 commented 1 year ago

I always use Putty. Never had any problem. I can Putty Telnet now to all my Espurna devices 2.7.4 Core version.

mcspr commented 1 year ago

I mean, do any other commands work? help, get hostname, or reset edit: on espurna-1.15.0-dev.git70d25aea+github221116-itead-sonoff-mini.bin

Sag06 commented 1 year ago

As I told, I can Telnet (Putty) to 3.0.2 Core on minimal bin or full Espurna bin in case Telnet is set without authentication. Odd.

Sag06 commented 1 year ago

I mean, do any other commands work? help, get hostname, or reset edit: on espurna-1.15.0-dev.git70d25aea+github221116-itead-sonoff-mini.bin

I´ll flash again. Let me one minute

Sag06 commented 1 year ago

No response at all from these commands. `[077751] [TELNET] Connected 10.12.16.188:55845 get hostname [099252] [NTP] Server pool.ntp.org [099252] [NTP] Last Sync 2022-12-05 20:46:07 (UTC) [099253] [NTP] UTC Time 2022-12-05 20:46:07 get hostname

`

mcspr commented 1 year ago

...in favour of a 2nd issue. This is a telnet server problem, not ota

Sag06 commented 1 year ago

It makes sense

mcspr commented 1 year ago

See #2563 Gist of it is we dont handle terminal clients that send data in batches, and expect a single network data packet as cmd\r\n (aka what netcat does) PuTTY sends it as cmd, then \r\n. Not intended to happen, just a badly applied optimization on our side

richtoy commented 1 year ago

OTA still not working for me.

 12/12/2022   16:40.50   /home/mobaxterm  telnet 192.168.1.50 Trying 192.168.1.50... Connected to 192.168.1.50. Escape character is '^]'. [780853] [TELNET] Connected 192.168.1.20:13990 info -ERROR: TERMINAL: UnterminatedQuote info ESPURNA 1.15.0-dev.git312c3ef2+github221212 built 2022-12-12 12:35:56 device: ESPURNA_MINIMAL_ARDUINO_OTA mcu: esp8266 chipid: CC50E3C8C3C3 freq: 80mhz sdk: 2.2.2-dev(38a443e) core: 2.7.4 md5: b205220c4d753117b468415091aad7e0 support: DEBUG_TELNET MDNS ARDUINO_OTA OTA_CLIENT TELNET TERMINAL system: OK boot counter: 1 last reset reason: External System extra info: External System +OK [800146] [MAIN] Uptime: 00y 00d 00h 30m 00s [800147] [MAIN] Heap: initial 50232 available 39688 contiguous 39408 [800148] [MAIN] VCC: 3521 mV ota http://github.com/xoseperez/espurna/releases/download/github221212/espurna-1.15.0-dev.git312c3ef2+github221212-neo-coolcam-power-plug-wifi.bin [806225] [OTA] Connecting to github.com:80 +OK [806261] [EEPROM] Disabling EEPROM rotation [806261] [OTA] Downloading /xoseperez/espurna/releases/download/github221212/espurna-1.15.0-dev.git312c3ef2+github221212-neo-coolcam-power-plug-wifi.bin [806511] [806512] [EEPROM] Enabling EEPROM rotation [806516] [OTA] Disconnected

I always get an error for the first command and OTA silently fails.

Is there a simple way to recover to a working image from this MINIMAL_OTA build?

mcspr commented 1 year ago

^ ota <url> is not the only way to upgrade https://github.com/xoseperez/espurna/wiki/OTA#using-espotapy - -minimal-arduino-ota https://github.com/xoseperez/espurna/wiki/OTA#updating-via-http - minimal-webui (or, just open the :80 port in browser)

for PuTTY, switch to raw protocol instead of telnet (#2563)

richtoy commented 1 year ago

Isn't the PuTTY bug fixed in the latest snapshot? I have flashed a 1.14.1 with ESPURNA_MINIMAL_ARDUINO_OTA from today (2022-12-12) as a first step because there was insufficient memory. Next step is to flash the latest 1.15.0 dev snapshot or revert to 1.14.1. Does the espota.py assume a reset device with the default ip of 192.168.4.1?

Tried with raw too but OTA still silently failed both from MobaXterm and PuTTY

mcspr commented 1 year ago

Isn't the PuTTY bug fixed in the latest snapshot

Only the 'unable to use any command' part of the issue. TELNET in PuTTY is silently sending some extra data before you even start typing anything, causing the 'first command' error. We can't guarantee our workarounds for telnet always work though, so I would suggest to switch to RAW mode instead to avoid any more issues (and testing mostly happens with raw text clients like ncat)

Does the espota.py assume a reset device with the default ip of 192.168.4.1?

espota.py works in both AP and STA modes and it should have the same IP & hostname on the network you used with a normal build

Tried with raw too but OTA still silently failed both from MobaXterm and PuTTY

Command wouldn't have worked anyway. We don't handle redirects and HTTPS support is missing.

> curl -I 'http://github.com/xoseperez/espurna/releases/download/github221212/espurna-1.15.0-dev.git312c3ef2+github221212-neo-coolcam-power-plug-wifi.bin'
HTTP/1.1 301 Moved Permanently
Content-Length: 0
Location: https://github.com/xoseperez/espurna/releases/download/github221212/espurna-1.15.0-dev.git312c3ef2+github221212-neo-coolcam-power-plug-wifi.bin

Not logging that upgrade did not happen is an issue, though

richtoy commented 1 year ago

Not having much success, letting the device boot ESPURNA_MINIMAL_ARDUINO_OTA and then trying espota.py gives the following: -  12/12/2022   20:35.42   /home/mobaxterm  python espota.py --progress --ip 192.168.1.50 --auth XXXXXX --file espurna-1.14.1-neo-coolcam-power-plug-wifi.bin Authenticating...OK 20:36:01 [ERROR]: No response from device

Is there something else that needs to be done to make it accept a new firmware?

 12/12/2022   20:36.01   /home/mobaxterm  telnet 192.168.1.50 Trying 192.168.1.50... Connected to 192.168.1.50. Escape character is '^]'. [556319] [TELNET] Connected 192.168.1.20:60522 info -ERROR: TERMINAL: UnterminatedQuote info ESPURNA 1.15.0-dev.git312c3ef2+github221212 built 2022-12-12 12:35:56 device: ESPURNA_MINIMAL_ARDUINO_OTA mcu: esp8266 chipid: CC50E3C8C3C3 freq: 80mhz sdk: 2.2.2-dev(38a443e) core: 2.7.4 md5: b205220c4d753117b468415091aad7e0 support: DEBUG_TELNET MDNS ARDUINO_OTA OTA_CLIENT TELNET TERMINAL system: OK boot counter: 1 last reset reason: Power On extra info: Power On +OK

With debug: -

 12/12/2022   20:39.49   /home/mobaxterm  python espota.py --progress --debug --ip 192.168.1.50 --auth XXXXXXXX --file espurna-1.14.1-neo-coolcam-power-plug-wifi.bin 20:40:34 [DEBUG]: Options: {'esp_ip': '192.168.1.50', 'host_ip': '0.0.0.0', 'esp_port': 8266, 'host_port': 37575, 'auth': 'XXXXXXXX', 'image': 'espurna-1.14.1-neo-coolcam-power-plug-wifi.bin', 'spiffs': False, 'debug': True, 'progress': True} 20:40:34 [INFO]: Starting on 0.0.0.0:37575 20:40:34 [INFO]: Upload size: 484800 20:40:34 [INFO]: Sending invitation to: 192.168.1.50 Authenticating...OK 20:40:34 [INFO]: Waiting for device... 20:40:44 [ERROR]: No response from device

mcspr commented 1 year ago

Make sure espota.py isn't blocked by firewall, it would listen on :RANDOM_PORT_10000_TO_60000 and device would open a connection there (see 37575 above; --host_port=PORT to override)

I would also caution about 1.15 -> 1.14.1. Anything could happen to our config, it does not expect version downgrades

richtoy commented 1 year ago

Thanks, it was Windows defender. Downgrade worked fine.