opendata-stuttgart / sensors-software

sourcecode for reading sensor data

NRZ-2019-127 broke my sensor #580

Closed: skibbipl closed this issue 4 years ago

skibbipl commented 4 years ago

Since the latest version my sensor has stopped working. I can see on the router (Mikrotik) that it connects to my wifi network, but I cannot open the setup page, nor does the sensor send any data out. What is strange is that since the upgrade the sensor no longer gets an ARP entry on the router. I downgraded to NRZ-2018-123B and everything is working again. Any tips on how to debug this issue?

dirkmueller commented 4 years ago

Do you see an open wifi access point named airrohr-xxxxxx (where xxxxxx is some number)?

dirkmueller commented 4 years ago

I would really appreciate the serial output of the sensor with the new firmware. You can capture that with screen under macOS or Linux, or with the Arduino IDE on any platform.

dirkmueller commented 4 years ago

Did you have the same issue with 2019-125-B1 as well?

dirkmueller commented 4 years ago

You can download and flash the 125 (previous stable release, online between October 31st and December 2nd) from here: https://www.madavi.de/sensor/update/data/previous/NRZ-2019-125-B1/

skibbipl commented 4 years ago

This is what I get from the Arduino serial port monitor:

22:25:55.487 -> (garbage)Airrohr: NRZ-2019-127-1
22:25:55.659 -> mounting FS...
22:25:55.762 -> opened config file...
22:25:55.762 -> parsed json...
22:25:55.797 -> output debug text to displays...
22:25:56.003 -> Connecting to <redacted_my_wifi_network>
22:25:56.481 -> ...................
22:26:06.243 -> WiFi connected, IP is: 192.168.1.27
22:26:06.243 -> Starting Webserver... 192.168.1.27
22:26:06.276 -> 
22:26:06.276 -> ChipId: 9925909
22:26:06.276 -> Start reading SDS011 version date
22:26:07.632 -> End reading SDS011 version date
22:26:07.667 -> Read SDS...: 18-11-16(ee74)
22:26:07.767 -> Stopping SDS011...
22:26:07.767 -> Read BMP280/BME280...
22:26:07.802 -> Trying BMP280/BME280 sensor on 76 ... found
22:26:07.836 -> Send to :
22:26:07.870 -> sensor.community
22:26:07.870 -> Madavi.de
22:26:07.904 -> custom API
22:26:07.904 -> ----
22:26:07.904 -> Auto-Update active...
22:26:07.939 -> validate request auth...
22:26:07.939 -> ws: root ...

And then it dies :( Update: downgraded to NRZ-2019-125-B1 and everything is fine again.

dirkmueller commented 4 years ago

Since you seem to have a setup to manually flash: can you check whether manually flashing this firmware works? Please make sure to temporarily turn off auto-update in the config before doing that, otherwise it will OTA on boot to the previous version.

https://firmware.sensor.community/airrohr/beta/latest_en.bin

There are also other language versions, just pick one that you want.

There is a bug report against the Arduino core that wifi is not working after OTA while it works after manually flashing via serial.

dirkmueller commented 4 years ago

Also, I see the message "validate request auth". That means you have set a password for the webui. Are you sure there isn't some "authentication" dialog hidden in some other browser tab, waiting for input?

Can you share some details about the user/password you have set? Length, or maybe other special things? I tried this with the user "admin" and the password "rfWSYs82pzVZrHKrfWSYs82pzVZrHKrfWSYs82pzVZrHK" (random string, just long), and that seems to work.

skibbipl commented 4 years ago

Regarding web auth: yes, I set a simple 8-character password with one special character from the range: !@#$%^&*()-=_+. For just a second after rebooting I'm able to enter the webui, but then it stops working. I'll try to manually flash the latest version in the evening and report back.

dirkmueller commented 4 years ago

So, just to be clear: the web authentication succeeds and the webpage is loading successfully?

dirkmueller commented 4 years ago

https://github.com/opendata-stuttgart/sensors-software/blob/master/airrohr-firmware/Readme.md#ben%C3%B6tigte-software-in-klammern-getestete-version-und-die-art-der-lizenz

When you configure Arduino with the esp8266 integration (you need to add the URL to the board manager), you'll have a tool called esptool.py in your %arduinoinstalldir%/Arduino15/./packages/esp8266/hardware/esp8266/2.6.2/tools/esptool/esptool.py location. This can be used to flash firmware with:

esptool.py --chip auto --port $port --baud 460800 write_flash -fm dio 0x00000 $firmware.bin
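
For repeated flashing, the command above could be wrapped in a small script. This is only a sketch: `build_flash_command` and `flash` are hypothetical helper names, the flags are copied unchanged from the command line above, and it assumes `esptool.py` can be found on the PATH.

```python
import subprocess


def build_flash_command(port, firmware_path, baud=460800):
    """Return the esptool.py argument list used in the command above."""
    return [
        "esptool.py",
        "--chip", "auto",
        "--port", port,
        "--baud", str(baud),
        "write_flash",
        "-fm", "dio",
        "0x00000",        # flash offset, as in the command above
        firmware_path,
    ]


def flash(port, firmware_path):
    """Run esptool.py; raises CalledProcessError if flashing fails."""
    subprocess.run(build_flash_command(port, firmware_path), check=True)
```

A call would look like `flash("/dev/ttyUSB0", "latest_en.bin")`, where both the port and the file name are placeholders for your own setup.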

dirkmueller commented 4 years ago

Also, can you please disable the SDS011 sensor in the configuration menu from a firmware version that works and then update to the newer version (for example by enabling auto-update and "use beta channel" both at the same time)? Also, please set the debug level to 5 and capture the debug output from serial again; maybe this gives a clue on where it gets stuck.

When it is stable without the SDS011, we have the first hint of where to look.

skibbipl commented 4 years ago

> the web authentication succeeds and the webpage is loading successfully?

Only for a second, and then everything dies; serial debugging stops as well. This evening I will try this upgrade with the SDS011 disabled.

skibbipl commented 4 years ago

Disabled SDS011 and enabled update. Still dies:

12:53:38.867 -> ⸮⸮Found firmware MD5: 40cec42ccbac05f147e486d1e4822704
12:53:39.072 -> 
12:53:44.886 -> Moving Firmware image to old.
12:53:47.090 -> Finished successfully.. Rebooting!
12:53:47.634 -> ?)⸮Lr⸮D⸮(⸮⸮Airrohr: NRZ-2019-128-B2
12:53:53.633 -> mounting FS...
12:53:53.736 -> opened config file...
12:53:53.736 -> parsed json...
12:53:53.771 -> Rewriting old config from: NRZ-2019-125-B1
12:53:53.804 -> Saving config...
12:53:53.976 -> Config written successfully.
12:53:53.976 -> output debug text to displays...
12:53:54.216 -> Connecting to <REDACTED>
12:53:54.664 -> ...................
12:54:04.430 -> WiFi connected, IP is: 192.168.1.27
12:54:04.465 -> Starting Webserver... 192.168.1.27
12:54:04.500 -> 
12:54:04.500 -> ChipId: 9925909
12:54:04.500 -> Read BMP280/BME280...
12:54:04.534 -> Trying BMP280/BME280 sensor on 76 ... found
12:54:04.604 -> Send to :
12:54:04.604 -> sensor.community
12:54:04.604 -> Madavi.de
12:54:04.638 -> custom API
12:54:04.638 -> ----
12:54:04.638 -> Auto-Update active...

Same behavior with disabled BME280.

ricki-z commented 4 years ago

You are sending to a "custom API". Is this API using HTTPS? If yes, is the certificate 2048 bits or lower?

skibbipl commented 4 years ago

Yes, I use https - standard Let's Encrypt RSA certificate with 2048 bits according to Firefox.

dirkmueller commented 4 years ago

Well, that should not kick in until there's an actual measurement.

Is it actually trying to do a measurement cycle after the 2 minutes or so of uptime if you don't try to access the webui?

dirkmueller commented 4 years ago

see https://github.com/esp8266/Arduino/issues/6886

mika commented 4 years ago

I'm also affected (I also have a Mikrotik router in my setup and send data to a local InfluxDB). Downgrading to NRZ-2018-123B from https://www.madavi.de/sensor/update/data/previous/NRZ-2018-123B/ worked for me (I also tried 2019-125-B1, but AFAICS I have trouble there as well).

dirkmueller commented 4 years ago

The interesting aspect here is that NRZ-2018-123B and 2019-125-B1 use the same Arduino core version (2.4.2) with the same wifi stack. Can you please try the ARP nping with those versions and tell us which one works and which one doesn't (and how well)?

dirkmueller commented 4 years ago

also, please compare with a firmware from

https://static.dmllr.de/airrohr/beta/builds-SDK221/

and

https://static.dmllr.de/airrohr/beta/builds-SDK22x_191024/

skibbipl commented 4 years ago

I tried both firmwares with no luck. After booting, it dies the same way as the 127 version.

dirkmueller commented 4 years ago

Thanks. can you try https://static.dmllr.de/airrohr/beta/builds-2019-126-B4/ ?

dirkmueller commented 4 years ago

Also, I am not really sure I understand "it dies" correctly. Is it merely not responding to HTTP requests, or also not sending data?

Can you load the build from https://static.dmllr.de/airrohr/beta/builds-128-B1-debug-alive/ and paste the last few dozen lines of text from the serial console when it "dies"? It has wifi debug enabled and also prints a message multiple times a second.

skibbipl commented 4 years ago

NRZ-2019-126-B4 works fine. By "dies" I mean that after the initial boot the COM debugging output returns nothing. A working version keeps pushing the following messages:

18:58:13.166 -> Start reading SDS011
18:58:13.166 -> End reading SDS011
18:58:14.156 -> Start reading SDS011
18:58:14.156 -> End reading SDS011
18:58:15.178 -> Start reading SDS011
18:58:15.178 -> End reading SDS011

Version 128 produces the log below (repeated all the time). It seems that after 4 minutes (19:08 - 19:12) it started working OK!

19:08:07.737 -> pm open,type:0 0
19:08:07.944 -> ⸮⸮⸮b⸮L⸮⸮D⸮(?
19:08:08.115 -> SDK:2.2.2-dev(38a443e)/Core:2.6.2=20602000/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-16-ge23a07e/BearSSL:89454af
19:08:08.217 -> Airrohr: NRZ-2019-127-1
19:08:08.251 -> mounting FS...
19:08:08.251 -> scandone
19:08:08.251 -> state: 0 -> 2 (b0)
19:08:08.286 -> state: 2 -> 3 (0)
19:08:08.319 -> state: 3 -> 5 (10)
19:08:08.319 -> add 0
19:08:08.319 -> aid 1
19:08:08.353 -> cnt 
19:08:08.353 -> opened config file...
19:08:08.353 -> 
19:08:08.387 -> connected with <redacted>, channel 9
19:08:08.421 -> dhcp client start...
19:08:08.421 -> wifi evt: 0
19:08:08.455 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:08.490 -> wifi evt: 3
19:08:08.525 -> parsed json...
19:08:08.525 -> output debug text to displays...
19:08:08.560 -> state: 5 -> 0 (0)
19:08:08.595 -> rm 0
19:08:08.595 -> wifi evt: 1
19:08:08.595 -> STA disconnect: 8
19:08:08.630 -> del if0
19:08:08.630 -> usl
19:08:08.630 -> mode : null
19:08:08.664 -> wifi evt: 8
19:08:08.664 -> sleep disable
19:08:08.732 -> mode : sta(12:34:56:78:90:ab)
19:08:08.732 -> add if0
19:08:08.767 -> Connecting to <redacted>
19:08:08.800 -> wifi evt: 8
19:08:09.207 -> .....scandone
19:08:12.475 -> state: 0 -> 2 (b0)
19:08:12.475 -> .state: 2 -> 3 (0)
19:08:12.510 -> state: 3 -> 5 (10)
19:08:12.510 -> add 0
19:08:12.510 -> aid 1
19:08:12.545 -> cnt 
19:08:12.545 -> 
19:08:12.545 -> connected with <redacted>, channel 9
19:08:12.578 -> dhcp client start...
19:08:12.611 -> wifi evt: 0
19:08:12.611 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:12.680 -> wifi evt: 3
19:08:12.680 -> .
19:08:12.680 -> WiFi connected, IP is: 192.168.1.27
19:08:12.680 -> Starting Webserver... 192.168.1.27
19:08:12.714 -> 
19:08:12.749 -> ChipId: 9925909
19:08:12.749 -> Start reading SDS011 version date
19:08:14.085 -> End reading SDS011 version date
19:08:14.119 -> Read SDS...: 18-11-16(ee74)
19:08:14.221 -> Stopping SDS011...
19:08:14.221 -> Read BMP280/BME280...
19:08:14.254 -> Trying BMP280/BME280 sensor on 76 ... found
19:08:14.288 -> Send to :
19:08:14.321 -> sensor.community
19:08:14.321 -> Madavi.de
19:08:14.321 -> custom API
19:08:14.321 -> ----
19:08:22.465 -> pm open,type:0 0
19:08:22.670 -> ?⸮F⸮()⸮DHf⸮
19:08:22.876 -> SDK:2.2.2-dev(38a443e)/Core:2.6.2=20602000/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-16-ge23a07e/BearSSL:89454af
19:08:22.946 -> Airrohr: NRZ-2019-127-1
19:08:22.981 -> mounting FS...
19:08:22.981 -> scandone
19:08:22.981 -> state: 0 -> 2 (b0)
19:08:23.016 -> state: 2 -> 3 (0)
19:08:23.051 -> state: 3 -> 5 (10)
19:08:23.051 -> add 0
19:08:23.051 -> aid 1
19:08:23.086 -> cnt 
19:08:23.086 -> opened config file...
19:08:23.086 -> 
19:08:23.120 -> connected with <redacted>, channel 9
19:08:23.155 -> dhcp client start...
19:08:23.155 -> wifi evt: 0
19:08:23.190 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:23.225 -> wifi evt: 3
19:08:23.260 -> parsed json...
19:08:23.260 -> output debug text to displays...
19:08:23.294 -> state: 5 -> 0 (0)
19:08:23.329 -> rm 0
19:08:23.329 -> wifi evt: 1
19:08:23.329 -> STA disconnect: 8
19:08:23.362 -> del if0
19:08:23.362 -> usl
19:08:23.362 -> mode : null
19:08:23.397 -> wifi evt: 8
19:08:23.397 -> sleep disable
19:08:23.466 -> mode : sta(12:34:56:78:90:ab)
19:08:23.466 -> add if0
19:08:23.500 -> Connecting to <redacted>
19:08:23.533 -> wifi evt: 8
19:08:23.941 -> .....scandone
19:08:27.206 -> state: 0 -> 2 (b0)
19:08:27.206 -> .state: 2 -> 3 (0)
19:08:27.240 -> state: 3 -> 5 (10)
19:08:27.240 -> add 0
19:08:27.240 -> aid 1
19:08:27.274 -> cnt 
19:08:27.274 -> 
19:08:27.274 -> connected with <redacted>, channel 9
19:08:27.309 -> dhcp client start...
19:08:27.343 -> wifi evt: 0
19:08:27.343 -> ip:192.168.1.27,mask:255.255.255.0,gw:192.168.1.1
19:08:27.377 -> wifi evt: 3
19:08:27.377 -> .
19:08:27.377 -> WiFi connected, IP is: 192.168.1.27
19:08:27.410 -> Starting Webserver... 192.168.1.27
19:08:27.444 -> 
19:08:27.479 -> ChipId: 9925909
19:08:27.479 -> Start reading SDS011 version date
19:08:28.812 -> End reading SDS011 version date
19:08:28.847 -> Read SDS...: 18-11-16(ee74)
19:08:28.947 -> Stopping SDS011...
19:08:28.947 -> Read BMP280/BME280...
19:08:28.981 -> Trying BMP280/BME280 sensor on 76 ... found
19:08:29.015 -> Send to :
19:08:29.049 -> sensor.community
19:08:29.049 -> Madavi.de
19:08:29.083 -> custom API
19:08:29.083 -> ----
<cut>
19:12:24.551 -> SNTP sync finished: Tue Dec 10 18:12:12 2019
19:12:24.586 -> 
19:12:24.620 -> Alive! at 6413
19:12:24.620 -> 33240
19:12:24.655 -> Alive! at 6481
19:12:24.655 -> 33240
19:12:24.725 -> Alive! at 6551
19:12:24.725 -> 33240
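
A reboot loop like the one in this log can be spotted without reading every line, by looking for the firmware banner and measuring the time between its occurrences. The sketch below assumes the Arduino serial monitor format shown above (`HH:MM:SS.mmm -> text`); the helper names `boot_times` and `boot_intervals` are made up for illustration.

```python
from datetime import datetime

BOOT_MARKER = "Airrohr: NRZ-"  # firmware banner printed once per boot


def boot_times(log_lines):
    """Return the timestamps of all lines that contain the boot banner."""
    times = []
    for line in log_lines:
        stamp, sep, text = line.partition(" -> ")
        # The banner may be preceded by boot garbage, so search inside the text.
        if sep and BOOT_MARKER in text:
            times.append(datetime.strptime(stamp.strip(), "%H:%M:%S.%f"))
    return times


def boot_intervals(log_lines):
    """Seconds between consecutive boots (empty list if fewer than two).

    The log carries no date, so a capture that crosses midnight is not handled.
    """
    times = boot_times(log_lines)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


sample = [
    "19:08:08.217 -> Airrohr: NRZ-2019-127-1",
    "19:08:12.680 -> WiFi connected, IP is: 192.168.1.27",
    "19:08:22.946 -> Airrohr: NRZ-2019-127-1",
]
print(boot_intervals(sample))
```

For the three sample lines taken from the log above, the interval is a bit under 15 seconds, which matches the restart pattern visible between 19:08:08 and 19:08:22.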

dirkmueller commented 4 years ago

Ok, this is great news. this means NTP is simply not possible in your wifi setup. We can fix that.

dirkmueller commented 4 years ago

@skibbipl please test if https://static.dmllr.de/airrohr/beta/builds-NRZ-2019-128-B3/ resolves that issue.

skibbipl commented 4 years ago

Looks good:

23:28:49.284 -> output debug text to displays...
23:28:49.515 -> Connecting to <redacted>
23:28:49.957 -> .......
23:28:53.711 -> WiFi connected, IP is: 192.168.1.27
23:28:53.746 -> Starting Webserver... 192.168.1.27
23:28:53.781 -> 
23:28:53.781 -> ChipId: 9925909
23:28:53.781 -> Start reading SDS011 version date
23:28:55.105 -> End reading SDS011 version date
23:28:55.139 -> Read SDS...: 18-11-16(ee74)
23:28:55.273 -> Stopping SDS011...
23:28:55.273 -> Read BMP280/BME280...
23:28:55.306 -> Trying BMP280/BME280 sensor on 76 ... found
23:28:55.340 -> Send to :
23:28:55.374 -> sensor.community
23:28:55.374 -> Madavi.de
23:28:55.374 -> custom API
23:28:55.374 -> ----
23:28:56.366 -> Start reading SDS011
23:28:56.366 -> End reading SDS011

Also, as you mentioned NTP, I found the following info in the Mikrotik SNTP Client:

Last Bad Packet From | 192.168.1.27
Last Bad Packet | 04:32:55 ago
Last Bad Packet Reason | server-ip-mismatch

dirkmueller commented 4 years ago

Thanks. Googling that error message leads to: https://de.scribd.com/document/78877210/NTP-Server-Local-Mikrotik

which could be the reason why you're having issues with NTP?

The problem is that without NTP we cannot really validate SSL certificates (needed for secure data sending as well as secure OTA), as we have no valid system time. So it is sort of important to get a solution for this :/
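
To illustrate why the clock matters: a certificate is only valid inside a time window, so a device whose clock was never set cannot tell a current certificate from an expired one. The sketch below is not the firmware's code. The validity dates are made up, and it assumes that a device which has never synced reports a time near the Unix epoch.

```python
from datetime import datetime, timezone


def clock_allows_validation(now, not_before, not_after):
    """A certificate is only acceptable while not_before <= now <= not_after."""
    return not_before <= now <= not_after


# Illustrative validity window (made-up dates, not from a real certificate).
not_before = datetime(2019, 11, 1, tzinfo=timezone.utc)
not_after = datetime(2020, 1, 30, tzinfo=timezone.utc)

# Assumed clock value before any NTP sync.
unsynced = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Time reported by "SNTP sync finished" in the log above.
synced = datetime(2019, 12, 10, 18, 12, 12, tzinfo=timezone.utc)

print(clock_allows_validation(unsynced, not_before, not_after))  # False
print(clock_allows_validation(synced, not_before, not_after))    # True
```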

skibbipl commented 4 years ago

I have two NTP servers (both on Raspberry Pi) in my local LAN; however, for IoT devices I use a dedicated wifi hotspot with blocked access to the LAN. Perhaps my Mikrotik broadcasts info about the NTP servers in the LAN but the sensor cannot access them and therefore gets confused?

dirkmueller commented 4 years ago

No, we use hardcoded NTP servers on the internet. It seems that by default these routers come with firewall / NAT rules that are intended to do NTP on the router but accidentally also apply to LAN packets reaching for NTP, which causes them to be dropped. That's how I read the description above.
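
Whether NTP traffic actually gets through can be checked from any machine on the same wifi network as the sensor. Below is a minimal SNTP client sketch using only the Python standard library. The function names are hypothetical, and the server you pass in is your choice, since the servers hardcoded in the firmware are not named in this thread.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request():
    """48-byte client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_sntp_reply(data):
    """Return the server's transmit timestamp as Unix seconds."""
    if len(data) < 48:
        raise ValueError("short NTP reply")
    # The transmit timestamp starts at byte 40; its first 4 bytes are seconds.
    seconds_since_1900 = struct.unpack("!I", data[40:44])[0]
    return seconds_since_1900 - NTP_EPOCH_OFFSET


def query_sntp(server, timeout=5.0):
    """Send one request over UDP port 123; times out if the reply is dropped."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        data, _ = sock.recvfrom(512)
    return parse_sntp_reply(data)
```

Calling, for example, `query_sntp("pool.ntp.org")` (the server name is an assumption) either returns a Unix timestamp or raises a timeout error. A timeout from the IoT network but not from the main LAN would point at the router rules described above.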

Anyway, thanks a ton for your help in chasing this down!

mika commented 4 years ago

@dirkmueller sorry for the delay on my side but I wasn't in front of the device any longer back then and busy with other stuff. Great debugging and impressive turnaround time for the fix, thanks! :+1:

BTW, the issue is closed but I see the related fix neither in https://github.com/opendata-stuttgart/sensors-software nor as a pending PR here. What's the suggested procedure for us to get this fix (other than using https://static.dmllr.de/airrohr/beta/builds-NRZ-2019-128-B3/)?

dirkmueller commented 4 years ago

The best solution is to work on getting NTP packet routing working in your setup. Without a valid time, anything the sensor does, including updating itself, is insecure because it cannot validate certificates.

We need to wait for vacation season to end to get a new beta published. This will not go out as a stable release this year because we have another issue that needs to be fixed before we can do a new rollout.

mika commented 4 years ago

Hm that's interesting, I'm not aware of any problems related to NTP with any other clients in my network. hmmm

Ah ok, thanks for clarification, was just wondering whether anything was forgotten or so. :)