opendata-stuttgart / sensors-software

sourcecode for reading sensor data

Wifi Problems since NRZ-2020-131 #814

Closed (Steffeng5 closed this issue 3 years ago)

Steffeng5 commented 4 years ago

I wanted to set up some new sensors yesterday and saw that, after configuring my SSID/password via AP mode, the Airrohr always rebooted into AP mode instead of connecting to my wifi.

Looking at another "older" sensor, I saw that a new firmware has been published, and I was no longer able to connect to 1/2 of my Airrohr-ESPs. The one I cannot connect to is still sending data, but since the update less frequently than before, and it is not accessible via http. [image]

wertziop commented 4 years ago

Same here for my ESP8266 with SDS011 and DHT22: after the firmware update yesterday, there are approx. 10 readings per hour. The web interface is mostly unreachable. Stats for the last 15 hours: measurement count: 182, wifi error count: 78, sensor.community error count: 192 (!), sds011 error count: 78.

The firmware from Jan 2020 was quite stable and ran for months without any problems.

Phaze-III commented 4 years ago

Although I don't have any problems with my sensor I've asked a few colleagues running a similar sensor about their experiences. 2 confirmed OK (no Wifi errors, responsive UI, a few (normal) upload errors). 2 reported 'no problem/no measurement gaps'. But I got one report that looks similar to this one, almost impossible to get to the WebUI and many measurement gaps. He was able to get a snapshot of the /status page and the /values page: Screenshot 2020-10-19 at 21 57 40

Note that almost all measurements failed to upload and that in a period of 42 minutes 15 NTP syncs were done (normally once per hour). The number of errors for Sensor.Community is twice as high, but that is normal since data is uploaded separately per sensor.

@holgerbohni : do you also see a high number of NTP syncs, comparable to the number of Wifi errors?

The person with these stats also had this on the /values page:

WiFi    Signal strength    31 dBm
WiFi    Signal quality    0 %

I've seen this only once during my tests and that most likely was just after a single Wifi-error (the first since the upgrade). I guess 31 is reported if no Wifi-signal was registered.
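The dBm/percent pairs quoted in this thread (-87 dBm shown as 26%, -75 dBm as 50%, -70 dBm as 60%, and the odd 31 dBm as 0%) are all consistent with a simple linear mapping. The following is a minimal sketch of such a mapping, not the firmware's actual code; the function name is hypothetical, and treating zero or positive readings as "no signal" is an assumption based on the guess above.

```cpp
#include <cstdint>

// Hypothetical helper, not copied from the firmware: maps an RSSI reading
// in dBm to a 0..100 % quality figure.
// Assumption: zero or positive readings (such as the 31 above) mean that
// no signal was registered.
int32_t wifi_quality_percent(int32_t rssi_dbm) {
    if (rssi_dbm >= 0 || rssi_dbm < -100) {
        return 0;    // no usable signal
    }
    if (rssi_dbm > -50) {
        return 100;  // anything stronger than -50 dBm counts as full quality
    }
    return (rssi_dbm + 100) * 2;  // linear between -100 dBm and -50 dBm
}
```

With this mapping, a consumer of the uploaded JSON could filter out the bogus positive values simply by discarding any reading whose quality comes out as 0.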

The positive dBm value of 31 is also stored in the json-data sent to Madavi.de and although most uploads fail some of them get through with a signal:31 value since the upgrade. @ricki-z: this might be something to look for on Madavi. I somehow think that in this case before almost every measurement the wifi connection is initialised again including a NTP sync.

The person having these problems has one thing that might be worth mentioning: the sensor is connected to a Fritz!box Mesh network with a Fritz!box 7583 and a Fritz!WLAN repeater 1750E . The other reporters don't have a Mesh network.

dirkmueller commented 4 years ago

@Steffeng5 @Phaze-III can you please check with the firmware from https://static.dmllr.de/airrohr/beta/builds-NRZ-2020-132-B1-wifirevert/ ?

(remember to turn off OTA update in config, otherwise it will be immediately reverted)

wertziop commented 4 years ago

Sorry, no NTP sync count, because my device is completely offline now and it takes some time to get it back (it's a little bit tricky to reach it :). I will check Dirk's firmware later today. Thank you for your support! @Steffeng5 @Phaze-III @dirkmueller

wertziop commented 4 years ago

Here we go: Dirk's firmware NRZ-2020-132-B1/DE is running.

Thanks again.

@Steffeng5 @Phaze-III @dirkmueller

Steffeng5 commented 4 years ago

Currently I have the same status as @holgerbohni

Wifi works well. Indoors, no problems. For the last 40 minutes it has been outdoors again, with quite a bad wifi signal (26%, -87dBm), but it's okay: the UI is smooth and there are no errors right now.

Datapoints are arriving at a stable interval at the moment. I will monitor it over the next hours and come back with more results. [image]

Phaze-III commented 4 years ago

@dirkmueller first impression after 45 minutes is good, very responsive UI, no errors, no gaps.

FYI: overnight I have been running a build with some of the B10/11-patches reverted (removed the __noinline additions and re-inserted the yield()). That was also very stable (no errors, no gaps) and had a higher sample rate (31K/s instead of 25K/s).

Steffeng5 commented 4 years ago

Not so amazing at the moment :-/ image

There are also "holes" in the grafana data again.

wertziop commented 4 years ago

Results after 3 hours: The slightly changed position gives noticeably better reception (+3 dBm), which seems essential: no errors at all (4 NTP syncs, BTW).

Looks like signal strength above -80 dBm is important.

Steffeng5 commented 4 years ago

Results after 6 hours: Looks good right now. No errors since the last screenshot and since moving the wifi router 5 cm nearer to the ESP :-D Also, my 2 "new" ESPs are working well with the beta version.

Which changes were made in the current stable version regarding the wifi connection? @dirkmueller

wertziop commented 4 years ago

I've had the first 6 errors in a row after 6 hours. The last signal reading was -80dBm, then a couple of wifi and sensor errors occurred. When it starts to rain, attenuation increases, which normally doesn't lead to any problems. At the moment it still looks a little bit wonky.

@Steffeng5 when "gain" means centimeter... :-)

dirkmueller commented 4 years ago

Which changes were made in the current stable version regarding the wifi connection?

It has https://github.com/esp8266/Arduino/pull/7486/files reverted, aka the "forcefully disconnect" on authmode change.

@Steffeng5 am I correct in assuming that you have multiple wifi access points in this essid setup?

dirkmueller commented 4 years ago

@holgerbohni do you have more than one AP (e.g. a wifi range extender/mesh node etc)?

dirkmueller commented 4 years ago

@Phaze-III so reverting the patches increases the sample rate? previously you said those patches helped increasing the sample rate.. This is weird.

so I'm still stunned how we have wifi issues given that the sdk version (which carries the wifi code) did not change between previous and current stable.

wertziop commented 4 years ago

@dirkmueller nope, only a single 2,4GHz AP with a good antenna. Current status: IMG_0011

Phaze-III commented 4 years ago

@dirkmueller

so reverting the patches increases the sample rate? previously you said those patches helped increasing the sample rate.. This is weird.

The earlier reported increase in sample rate was comparing a build of NRZ-2020-130-B9 with only -DFP_IN_IROM with a build of NRZ-2020-130-B11 (at 03fee73302566210dfba7869073aaba8d42c5d99).

After upgrading to NRZ-2020-131 I noticed a decrease in sample rate, which I thought was weird; backtracking showed it was caused by 8aa2f465bb574d1e8d5a4f2a54a84d6681967eb8, where one __noinline was added:

-static void fetchSensorPPD(String& s) {
+static __noinline void fetchSensorPPD(String& s) {

So I tried a build with all 4 __noinline's removed which increased the sample rate again.

Yesterday I looked at other changes between B9 and B11 and two (not GPS-related) came up:

@@ -4312,9 +4300,6 @@ void loop(void) {
        }

        sample_count++;
-#if defined(ESP8266)
-       ESP.wdtFeed();
-#endif
        if (last_micro != 0) {
                unsigned long diff_micro = act_micro - last_micro;
                UPDATE_MIN_MAX(min_micro, max_micro, diff_micro);
@@ -4531,7 +4517,6 @@ void loop(void) {
                starttime = millis();                                                           // store the start time
                count_sends++;
        }
-       yield();
 #if defined(ESP8266)
        MDNS.update();
        serialSDS.perform_work();

Putting the yield(); line back slightly decreased the sample rate again, but it was still higher than with NRZ-2020-131. I haven't tested the effect of ESP.wdtFeed();.

yield() might be a candidate given a comment I found in https://github.com/opendata-stuttgart/sensors-software/pull/28#issuecomment-270099925 .
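For readers unfamiliar with the bookkeeping in the diff above, here is a minimal host-compilable sketch of what the sample_count and min/max microsecond tracking amounts to. It is a stand-in written for illustration, not the firmware's code: the struct and method names are hypothetical, and UPDATE_MIN_MAX is assumed to be a plain min/max update. For scale, a sample rate of 35K/s corresponds to roughly 29 µs per loop pass and 25K/s to 40 µs.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the loop bookkeeping visible in the diff above.
struct LoopStats {
    uint32_t sample_count = 0;
    uint32_t last_micro = 0;
    uint32_t min_micro = std::numeric_limits<uint32_t>::max();
    uint32_t max_micro = 0;

    // Call once per loop() pass with the current microsecond timestamp
    // (micros() on the device).
    void record(uint32_t act_micro) {
        sample_count++;
        if (last_micro != 0) {
            // Unsigned subtraction stays correct across counter wrap-around.
            uint32_t diff_micro = act_micro - last_micro;
            min_micro = std::min(min_micro, diff_micro);
            max_micro = std::max(max_micro, diff_micro);
        }
        last_micro = act_micro;
    }

    // Loop passes per second over a measurement window of window_ms.
    double samples_per_second(uint32_t window_ms) const {
        return window_ms == 0 ? 0.0 : sample_count * 1000.0 / window_ms;
    }
};
```

A large max_micro value means at least one pass took a long time, which is exactly the situation in which the wifi stack does not get control back soon enough.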

dirkmueller commented 4 years ago

So I tried a build with all 4 __noinline's removed which increased the sample rate again.

This does not match my experience. There is a 50% sample count improvement for me with the __noinline's.

I haven't tested the effect of ESP.wdtFeed();

wdtFeed() feeds the watchdog timer, which resets the node if it isn't called regularly (at least every 2s). The watchdog is reset when loop() exits, so that is very unlikely to be a problem.

Putting the yield(); line back slightly decreased the sample rate again

Right, yield() is a way to pass control to the wifi stack when loop() doesn't exit soon enough. If you check, this call is very near the end of loop(), so I think we can give it up.

From my measurements, there are a dozen other places where we spend 10-50 times as long before exiting loop() (for example in the webserver stack when it returns larger web pages). So those would have to be solved first.

dirkmueller commented 4 years ago

@Steffeng5 please ensure that serial debug is turned off (debug level 0 or 1 in /config).

Does that help with the wifi instability?

dirkmueller commented 4 years ago

okay, everyone, we need to move forward in isolating the issues instead of having confusing side discussions. so I made a couple of firmwares to try. I would like to invite everyone to try out each one of them and report which one works best.

Here are the options:

In addition, other than trying out custom firmwares, these options can be tried:

If any of those drastically improves the situation, we know where to start looking. I have no ability to reproduce this. I run a sensor with SDS011, DHT11 and BME280 connected and it works fine. I'm trying with two wifi access points. I am currently not able to simulate low-reception situations, but the current weather season (dry/humid air) is certainly not in our favor for outdoor sensors. Reception quality is known to be worse in humid, foggy air.

dirkmueller commented 4 years ago

Okay, I think I found something. Let me know if https://static.dmllr.de/airrohr/beta/builds-NRZ-2020-132-B1-sds011rework/ works.

The issue is that the newer arduinocore has a different espsoftwareserial blocking read behavior that we didn't expect.
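The thread does not show what the sds011rework build actually changed, so the following is only a generic illustration of the idea of not blocking on serial reads: bytes are fed to a small state machine one at a time, as they arrive, and loop() never waits for a complete frame. The class name is hypothetical. The frame layout (head 0xAA, command 0xC0, six data bytes, checksum over the data bytes, tail 0xAB) follows the SDS011 datasheet.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical incremental parser for the 10-byte SDS011 measurement frame:
// 0xAA 0xC0 PM2.5lo PM2.5hi PM10lo PM10hi ID1 ID2 checksum 0xAB.
// This is an illustration, not the firmware's implementation.
class Sds011FrameParser {
public:
    // Feed one received byte; returns true when a valid frame was completed.
    bool feed(uint8_t b) {
        if (pos_ == 0 && b != 0xAA) {
            return false;              // still waiting for the head byte
        }
        if (pos_ == 1 && b != 0xC0) {
            pos_ = 0;                  // not a measurement frame, resync
            return false;
        }
        buf_[pos_++] = b;
        if (pos_ < kFrameLen) {
            return false;              // frame not complete yet
        }
        pos_ = 0;                      // ready for the next frame
        uint8_t sum = 0;
        for (size_t i = 2; i <= 7; ++i) {
            sum = static_cast<uint8_t>(sum + buf_[i]);
        }
        if (buf_[8] != sum || buf_[9] != 0xAB) {
            return false;              // checksum or tail mismatch
        }
        pm25_tenths_ = static_cast<uint16_t>(buf_[2] | (buf_[3] << 8));
        pm10_tenths_ = static_cast<uint16_t>(buf_[4] | (buf_[5] << 8));
        return true;
    }

    // Readings in tenths of a microgram per cubic metre.
    uint16_t pm25_tenths() const { return pm25_tenths_; }
    uint16_t pm10_tenths() const { return pm10_tenths_; }

private:
    static constexpr size_t kFrameLen = 10;
    uint8_t buf_[kFrameLen] = {};
    size_t pos_ = 0;
    uint16_t pm25_tenths_ = 0;
    uint16_t pm10_tenths_ = 0;
};
```

On the device, such a parser would be driven from loop() by reading only the bytes that are already available on the serial port, so a missing or late frame costs no waiting time.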

wertziop commented 4 years ago

Ok, you're faster than me and my tests. I've been running different versions on two devices since yesterday. I will continue with your latest firmware today. Results so far: multiwifi and wifireverted are both stable without wifi errors, but with a couple of sds011 errors over 12 hours.

wertziop commented 4 years ago

@dirkmueller no errors after 4 hours with „132-B1-sds011rework“ on any of my devices. Looks very good. Update: 9h without errors! 👍 Update: 13h, 2 devices, 0 errors :-)

Phaze-III commented 4 years ago

@dirkmueller : NRZ-2020-132-B1-sds011rework has been running very stable for more than 6 hours on the sensor of the person with the Fritz!Box Mesh network, only 1 WiFi-error. In his setup NRZ-2020-131 and the other 4 trial-builds didn't work. So you definitely found something 👍

Screenshot 2020-10-22 at 15 22 16

dirkmueller commented 4 years ago

of the person with the Fritz!Box Mesh network, only 1 WiFi-error.

So the wifi disconnect was WIFI_DISCONNECT_REASON_ASSOC_LEAVE, aka the wifi mesh reconnected the client to a different endpoint. So that's not really an "error", just normal behavior.
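To make that distinction concrete, here is a small sketch of counting benign re-associations separately from real errors. Everything in it is hypothetical: the enum is a local stand-in whose values are assumed to mirror WiFiDisconnectReason in the ESP8266 Arduino core (check the core headers before relying on them), and the firmware's status page does not necessarily count this way.

```cpp
#include <cstdint>

// Local stand-in enum; the values are assumed to mirror WiFiDisconnectReason
// in the ESP8266 Arduino core and should be verified against its headers.
enum class DisconnectReason : uint8_t {
    AssocLeave = 8,       // AP or mesh moved the client to another endpoint
    BeaconTimeout = 200,  // AP no longer heard
    NoApFound = 201,
    AuthFail = 202,
};

// Hypothetical counters, kept separately for roams and real errors.
struct WifiCounters {
    uint32_t errors = 0;      // disconnects that look like real problems
    uint32_t roams = 0;       // benign re-associations
    uint8_t last_reason = 0;  // raw code of the most recent disconnect
};

// Classify one disconnect event.
void on_disconnect(WifiCounters& c, DisconnectReason reason) {
    c.last_reason = static_cast<uint8_t>(reason);
    if (reason == DisconnectReason::AssocLeave) {
        c.roams++;
    } else {
        c.errors++;
    }
}
```

Counting this way would keep a mesh network's normal steering from inflating the wifi error figure that testers report.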

There is however an issue visible in this screenshot: the "SDS011" version column is empty, so it failed to read the information from the node on boot. Also there are two SDS011 errors, which isn't a lot, but it is more than I'd like.

None of the changes that I did should however affect that.

still interested in more feedback from others potentially.

Phaze-III commented 4 years ago

There is however an issue visible in this screenshot, the "SDS011" version column is empty.

A restart fixed that: Screenshot 2020-10-22 at 21 52 10

Other screenshots I received from him also showed the version string so this was most likely a glitch.

Steffeng5 commented 4 years ago

I will try the Version tomorrow and give you feedback!

dirkmueller commented 4 years ago

The changes landed in B2, which is now online in the beta channel. You can also do a 'use beta' OTA update instead.

Please report if there are issues remaining, otherwise I assume it's fixed.

wertziop commented 4 years ago

@dirkmueller thank you so much! BTW: no errors over 48h.

Steffeng5 commented 4 years ago

@dirkmueller Thanks! That makes it much easier for me, that I do not have to uninstall the airrohr again to flash it :-) Just updated to beta channel.

Steffeng5 commented 4 years ago

Just visualized the sample rate calculated down per second (group by 15m / 15 / 60) and labeled it by version: image

Now waiting some hours for the current beta to get results

Phaze-III commented 4 years ago

@dirkmueller

also there are two SDS011 errors. which isn't a lot, but it is more than I'd like.

For the record: after the reboot the sensor attached to the Fritz!box Mesh has been running fine for two days. A snapshot of the status-page after 21 hours: Screenshot 2020-10-24 at 13 23 40

Only one WiFi error and 1 SDS011 error.

wertziop commented 4 years ago

@dirkmueller on my devices your „sds011rework“ firmware performed 600 measurements without an error.

Phaze-III commented 4 years ago

@holgerbohni : I can confirm the lower sample rate of 132-B2 (OTA) compared to the sds011rework build. Same numbers on my own sensor. However I didn't see any Wifi-errors after the first time I did an OTA upgrade to 132-B2 (test duration ~ 10 hours).

This evening I did a second OTA upgrade and after that one I got a relatively unresponsive UI and indeed a few Wifi-errors (4 in 1 hour). After a soft reset the UI was responsive again and up until now no Wifi-errors. You might want to try a reset.

wertziop commented 4 years ago

@Phaze-III after soft reset the error rate increases. After 30min

Wifi: 1/-85/200 Sds011: 1

After power cycling almost same situation: wifi and sds011 errors are back.

Update after 3 h (screenshot) screenshot_airrohr

Phaze-III commented 4 years ago

The results of a comparison of Dirk's sds011rework and the OTA build of NRZ-2020-132-B2 on the sensor attached to the Fritz!Box Mesh also show a big difference. The OTA build performs as badly as the NRZ-2020-131 release build, while the sds011rework build has been running happily for at least three days:

builds-NRZ-2020-132-B1-sds011rework/latest_nl.bin Sample rate 35K/s

Screenshot 2020-10-26 at 14 10 47

OTA NRZ-2020-132-B2/NL Sample rate 25 K/s

Screenshot 2020-10-26 at 14 14 49

So the question is what was different in Dirk's sds011rework build?

Phaze-III commented 4 years ago

I compiled two tables with the test results and parameters that appear to have some influence for my own sensor and the Fritz!Box Mesh sensor.

Included in the table are the results of tests to see whether the sds011rework patch would have helped in the bad performance of NRZ-2020-130-B9/B10 for my sensor (see #789). It appears that that wouldn't have made a difference :-( In that case the addition of -DFP_IN_IROM in B11 made the difference for my sensor.

The next set of tests was to build NRZ-2020-131 and NRZ-2020-132-B2 without -DFP_IN_IROM and have them tested on the problematic Fritz!Box Mesh sensor. That resulted in good performance for NRZ-2020-132-B2 (no errors and responsive UI) and a slight improvement for NRZ-2020-131 (no Wifi errors but unresponsive UI). So for NRZ-2020-131 and NRZ-2020-132-B2 moving the FP-routines back to IRAM seems to help.

Another set of tests was to see how firmware built with the Arduino IDE would perform (with settings as close to the PlatformIO settings as possible, see https://github.com/opendata-stuttgart/sensors-software/issues/789#issuecomment-703146144). On both my sensor and the Fritz!Box Mesh sensor that appears to give good/better performance for versions that perform badly when built with PlatformIO. Now this is just an observation, I really don't want to start a discussion about which IDE to use. But a cursory look at the build-logs indicates that there are differences in how the various object (.o) and archive (.a) files are built and linked together. It might be an idea to do some research on what those differences are and to check if there are options to tune the PlatformIO builds.

As said, I just want to report my observations hoping that they may help in finding either the cause or a workaround for the instability problems.

Test results

Columns used:

Build: Name/Version of the build
Source: online (fetched from firmware.sensor.community), OTA (installed via OTA), dmllr.de (test build from Dirk) or local (built by me)
IDE: IDE used, PlatformIO or Arduino IDE
FP_IN_IROM: whether defined or not in platformio.ini
Duration: duration of the test period reported in the table
UI: responsiveness of the UI, subjective indication where normal means occasional delays and good means almost always immediate response
Sample rate: given in K/s
Git Hash: ref to the 'HEAD' of the checkout the build was made with

Phaze-III sensor

| Build | Source | IDE | FP_IN_IROM | Duration | UI | Sample rate (K/s) | Wifi Errors | SDS011 Errors | Remarks | Git Hash |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NRZ-2020-129 | online | pio | no | various | normal | 30 | 0 | a few during server problem periods | | 8a936e1 |
| NRZ-2020-130-B9/DE | OTA | pio | no | 1 hour | unusable | 15 | 5 | 0 | | f62c962 |
| NRZ-2020-130-B10/DE | OTA | pio | no | 6 hours | bad | 19 | 10 | 0 | | 04d54e3 |
| builds/NRZ-2020-130-B10-fp-in-rom/latest_de.bin | dmllr.de | pio | yes | 10 hours | good | 35 | 0 | 0 | | |
| NRZ-2020-130-B11/DE (14 Oct 2020) | OTA | pio | yes | 12 hours | good | 35 | 0 | 0 | 14 Oct build | 418c866 |
| NRZ-2020-131/DE | OTA | pio | yes | 12 hours | normal | 25 | 0 | 0 | One additional no_inline | b78aa0a |
| builds-NRZ-2020-132-B1-sds011rework/latest_nl.bin | dmllr.de | pio | ??? | 14 hours | good | 35 | 1 | a few during server problem periods | | |
| NRZ-2020-132-B2/NL | OTA | pio | yes | 10 hours | normal | 25 | 0 | a few during server problem periods | | 5adb530 |
| FP_IN_IROM test | | | | | | | | | | |
| NRZ-2020-132-B2-no-fp in irom-8c9e540/NL | local | pio | no | 22 hours | good | 35 | 0 | 2 during server problem periods | | Phaze-III/sensors-software@8c9e540 |
| NRZ-2020-131-no-fp_in_irom-8c28e90/NL | local | pio | no | 6+ hours | normal | 35 | 0 | 0 | | Phaze-III/sensors-software@8c28e90 |
| sds011rework backport | | | | | | | | | | |
| NRZ-2020-130-B9-sds011rework_nl.bin | local | pio | no | 1 hour | unusable | 12.5 | ? | ? | | Phaze-III/sensors-software@dc8f6a3 |
| NRZ-2020-130-B10-sds011rework_nl.bin | local | pio | no | 5 hours | normal | 25 | 0 | | | Phaze-III/sensors-software@dfc2310 |
| Arduino IDE | | | | | | | | | | |
| NRZ-2020-130-B9_de.bin | local | Arduino | - | 18 hours | normal | 18 | 0 | | | f62c962 |
| NRZ-2020-131_nl.bin | local | Arduino | - | 8 hours | good | 35 | 0 | | | b78aa0a |
| NRZ-2020-132-B2-5adb530.bin | local | Arduino | - | 3 hours | good | 35 | 0 | | | 5adb530 |

Fritz!Box Mesh sensor

| Build | Source | IDE | FP_IN_IROM | Duration | UI | Sample rate | Wifi Errors | SDS011 Errors | Remarks | Git Hash |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NRZ-2020-131/NL | OTA | pio | yes | 42 minutes | unusable | ? | 16 | 8 | | b78aa0a |
| builds-NRZ-2020-132-B1-sds011rework/latest_nl.bin | dmllr.de | pio | ??? | 3 days | good | 35 | 4 | 6 | | |
| NRZ-2020-132-B2/NL | OTA | pio | yes | ~9 hours | unusable | 25 | 215 | 0 | Including power cycle | 5adb530 |
| NRZ-2020-132-B2-no-fp in irom-8c9e540/NL | local | pio | no | 4 hours | good | 35 | 0 | 0 | | Phaze-III/sensors-software@8c9e540 |
| NRZ-2020-131/NL | local | Arduino | - | 11 hours | good | ? | 1 | 0 | | b78aa0a |
| NRZ-2020-131-no-fp_in_irom-8c28e90/NL | local | pio | no | 5+ hours | timeouts | 35 | | 1 | | Phaze-III/sensors-software@8c28e90 |

dirkmueller commented 4 years ago

@Phaze-III thanks for the exhaustive testing. I can spend some time on making sure that the builds become more reproducible, however that will take some time to upstream.

There is a different measurement that might be more telling: the max_micros one.

Now, with the B2 build, is there anything in the sensors list that, when disabled, avoids the issue? For example, disabling sds011 in config?

Also, could you please give my 4 test build firmwares a try in your mesh environment? I think that would also be very helpful.

dirkmueller commented 4 years ago

@holgerbohni just to be sure I understand this correctly, the sds011rework firmware works after you reinstall it but the -B2 build does not?

wertziop commented 4 years ago

@dirkmueller yes, that's correct. B2 produces a lot of sds011 errors, fewer wifi errors, and after approx. 20 hours it reboots with reason "hardware watchdog". I've never seen this before. "sds011rework" had no errors at all.

Phaze-III commented 4 years ago

@dirkmueller : I've updated the results for the problematic Fritz!Box Mesh sensor in the table below. The 4 firmwares were already tested but not included in the table (see https://github.com/opendata-stuttgart/sensors-software/issues/814#issuecomment-714561555), fixed that.

I also asked the owner to do a test of the OTA version of B2 with all sensors disabled, debug level set and saved to 0 and only one API (Madavi.de) enabled. That didn't improve things, still a non-responsive sensor with only Wifi-errors. At the end of the test period the owner could only get to the UI after a lot of F5 tries.

There's also another additional test the owner did overnight: flashing a saved copy of the Oct 14 2020 online build of NRZ-2020-130-B11 . That one ran for at least 12 hours without errors!

My conclusion at the moment, without trying to pinpoint a root cause, is that there is some weird interaction between even the slightest code change (except for perhaps string constants like version and language strings) and the build process. So the minimal code difference (just one line and whitespace) between a stable B11 (Oct 14 2020 build) and NRZ-2020-131 resulted in a binary that doesn't work on 'problematic' sensors.

My suggestion would then be that a point patch on NRZ-2020-131 to get the same code as 130-B11 would give you working firmware in the stable channel for the problematic sensors. The owner of the problematic Fritz!box Mesh sensor is now running a build with the patch below, until now very stable.

That doesn't help in finding the cause but might help in getting 'problematic' sensors back online.

diff --git a/airrohr-firmware/airrohr-firmware.ino b/airrohr-firmware/airrohr-firmware.ino
index f3061ff..07f9fa5 100644
--- a/airrohr-firmware/airrohr-firmware.ino
+++ b/airrohr-firmware/airrohr-firmware.ino
@@ -60,7 +60,7 @@
 #include <pgmspace.h>

 // increment on change
-#define SOFTWARE_VERSION_STR "NRZ-2020-131"
+#define SOFTWARE_VERSION_STR "NRZ-2020-131-P1"
 String SOFTWARE_VERSION(SOFTWARE_VERSION_STR);

 /*****************************************************************
@@ -3097,7 +3097,7 @@ static void fetchSensorNPM(String& s) {
 /*****************************************************************
  * read PPD42NS sensor values                                    *
  *****************************************************************/
-static __noinline void fetchSensorPPD(String& s) {
+static void fetchSensorPPD(String& s) {
        debug_outln_verbose(FPSTR(DBG_TXT_START_READING), FPSTR(SENSORS_PPD42NS));

        if (msSince(starttime) <= SAMPLETIME_MS) {
@@ -3250,7 +3250,6 @@ static void fetchSensorDNMS(String& s) {
        debug_outln_info(FPSTR(DBG_TXT_SEP));
        debug_outln_verbose(FPSTR(DBG_TXT_END_READING), FPSTR(SENSORS_DNMS));
 }
-
 /*****************************************************************
  * read GPS sensor values                                        *
  *****************************************************************/

Fritz!Box Mesh sensor results

| firmware.sensor.community builds | Source | IDE | FP_IN_IROM | Duration | UI | Sample rate | Wifi Errors | SDS011 Errors | Remarks | Git Hash |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NRZ-2020-130-B11/DE (14 Oct 2020) | online | pio | yes | 12 hours | good | 35 | 0 | 0 | | 418c866 |
| NRZ-2020-131/NL | OTA | pio | yes | 42 minutes | unusable | ? | 16 | 8 | | b78aa0a |
| NRZ-2020-132-B2/NL | OTA | pio | yes | ~9 hours | unusable | 25 | 215 | 0 | Including power cycle | 5adb530 |
| NRZ-2020-132-B2/NL (no sensors, 1 API, lvl=0) | OTA | pio | yes | 2 hours | unusable | 25 | 49/-75/8 | - | Including power cycle | 5adb530 |
| dmllr builds | | | | | | | | | | |
| builds-NRZ-2020-132-B1-wifirevert/latest_nl.bin | dmllr.de | pio | ??? | 35 minutes | unusable | ? | 13 | 5 | | |
| builds-NRZ-2020-132-B1-multiwifi/latest_nl.bin | dmllr.de | pio | ??? | 3 hours | unusable | ? | - | - | | |
| builds-NRZ-2020-132-B1-new-SDK/latest_nl.bin | dmllr.de | pio | ??? | 1 hour | unusable | ? | - | - | | |
| builds-NRZ-2020-132-B1-oldarduinolatest_nl.bin | dmllr.de | pio | ??? | 1 hour | unusable | ? | - | - | | |
| builds-NRZ-2020-132-B1-sds011rework/latest_nl.bin | dmllr.de | pio | ??? | 3 days | normal/good | 35 | 4 | 6 | | |
| local builds | | | | | | | | | | |
| NRZ-2020-131/NL | local | Arduino | - | 11 hours | good | ? | 1 | 0 | | b78aa0a |
| NRZ-2020-131-no-fp_in_irom-8c28e90/NL | local | pio | no | 5+ hours | timeouts | 35 | 0 | 1 | | Phaze-III/sensors-software@8c28e90 |
| NRZ-2020-132-B2-no-fp in irom-8c9e540/NL | local | pio | no | 4 hours | good | 35 | 0 | 0 | | Phaze-III/sensors-software@8c9e540 |

ricki-z commented 4 years ago

@Phaze-III Can you check the firmware versions installed on your Fritz!Box mesh and which WPA encryption is active? There were updates for most AVM devices in October which activated WPA3 support.

Phaze-III commented 4 years ago

@ricki-z All tests were done with:
Fritz!Box 7583 running FritzOS 7.15
Fritz!WLAN 1750E repeater running FritzOS 7.20

On the Fritz!box only WPA2(CCMP) is enabled (no WPA, no WPA3 available in the settings)

As of 17:45 today the Fritz!Box 7583 went from 7.15 to 7.21, WPA3 available in the settings but disabled.

ricki-z commented 4 years ago

Is WPA3 also disabled on the repeater? And which AP should the sensors connect to?

Phaze-III commented 4 years ago

In the mesh, the repeater is just a clone of the 'mesh master'; all settings from the master (SSID, channel, encryption etc.) are identical and not configurable on the repeater. The sensor is located in a shed outside, approx. 8 meters from the mesh master AP. The repeater is located in the attic with a few concrete walls and floors between repeater and sensor, so it is practically out of range. The sensor is therefore always connecting to the mesh master. Signal strength is usually stable between -75 and -70 dBm (50% to 60%).

Phaze-III commented 4 years ago

The owner of the 'problematic' Fritz!box Mesh sensor has been running the 'point patched' version of NRZ-2020-131 now for more than 3 days without any problems. Only 4 wifi-errors in three days, two of them while the Fritz!Box AP was being upgraded. WebUI always very responsive. Screenshot below.

Note that that build was made with a clean checkout of the master branch with only the point patch applied and using an unmodified platformio 5.0.1 build environment (same results with 5.0.2 BTW).

Using that environment I get exactly the same binaries as those on firmware.sensor.community. Size and MD5 checksum are identical when I set the system date to the date of the binary on firmware.sensor.community before building.

Screenshot 2020-10-31 at 15 19 18

Phaze-III commented 4 years ago

@dirkmueller

There is a different measurement that might be more telling: the max_micros one.

I see the occasional large spikes there on my sensor but I can't yet correlate them to specific problems.

Would it be possible to put a graph of max_micros on the Madavi/Sensor.community api-rrd grafana dashboard?

I can check the max_micros on my own sensor in my local influxdb, but the owner of the Fritz!box-sensor has no local API, only the Madavi/Sensor.Community ones.

ricki-z commented 4 years ago

The max_micro/min_micro values are now shown on the page with the wifi signal quality at api-rrd.madavi.de.

Phaze-III commented 4 years ago

Update: the OTA version of NRZ-2020-132-B3 has been running very stable for a few days now on my sensor with a sample rate of ~35K, no wifi-errors and no max_micro spikes.

I've also asked the owner of the 'problematic' Fritz!box Mesh sensor to do the OTA upgrade to NRZ-2020-132-B3 and he also reports a stable sensor, no gaps in the measurements, responsive UI.

So this particular codebase and build appears to produce a stable firmware-binary that might give good results in other difficult environments.

On a side note, it looks like the sensor works when the Fritz!box Mesh has WPA3 enabled. We did a short test with a stable locally patched version of NRZ-2020-132 and WPA2+WPA3 enabled and the sensor still worked, also after a reset (connecting with WPA2). So given a stable version of the firmware enabling both WPA2 and WPA3 on the network should work.

Phaze-III commented 4 years ago

Status after 2 days on the Fritz!box Mesh sensor is still looking good. A few Wifi errors which is to be expected given the rather low signal quality but overall very stable. Screenshot 2020-11-23 at 16 34 55

H-e-ro commented 4 years ago

Same issue here. I just tried 25 times to set up a new device. The device does not connect to the router. After a reboot the device needs to be configured again and again and again... really frustrating. It is my first air sensor. Is there any chance to get an older firmware? And is it possible to set up a static IP?