letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com
Other
3.3k stars 2.22k forks source link

Interrupts #2463

Open s0170071 opened 5 years ago

s0170071 commented 5 years ago

I came across the recent pull request #2451 for the pulse counter which made me have a closer look at ISR routines.

This gives an overview on how to use ISRs properly. Unless there are ESP specific differences,

The pulse.ino uses non volatile global variables and makes a lot of function calls, e.g.: ISR->Plugin_003_pulsecheck(0)->timePassedSince()->timeDiff() This may lead to erratic undefined behaviour, such as the - maybe not so network related - crashes that we experience.

The ESPEasy code should be scanned for any further vilolation of ISR guidelines.

Domosapiens commented 5 years ago

such as the - maybe not so network related - crashes that we experience.

I can confirm https://github.com/letscontrolit/ESPEasy/issues/1774#issuecomment-482884095

For the watch dog reboot's, I see (also?) a relation with external interrupts: A unit with a switched-Off 6Hz driven pulse counter and 2#DS18B20's runs better than a unit with a switched-On 6Hz driven pulse counter and 2#DS18B20's.

Is the interrupt handling part of the problem?

TD-er commented 5 years ago

That's some very good and useful info here.

giig1967g commented 5 years ago

Part of this is a known issue: see #1874

s0170071 commented 5 years ago

I have the pulse counter device and a switch input device connected to the same input (water meter). Both values are sent to OpenHab via mqtt. The pulse counter counts approx. 2% more pulses than the switch input. (debounce set to 1200) The number of switch events seems to be correct.

Note: my device does not use the latest PR #2451 yet. @Domosapiens: Is your pulse counter counting correctly ?

Domosapiens commented 5 years ago

@s0170071 , for test I use a simple but effective signal generator: https://nl.aliexpress.com/item/Signal-Generator-PWM-Pulse-Frequency-Duty-Cycle-Adjustable-Module-LCD-Display-1Hz-150Khz-3-3V-30V/32835541668.html?spm=a2g0z.10010108.1000016.1.3ffe7835OVUxGK&isOrigTitle=true

gives a clear signal, no debounce needed. So debounce=0 Tried from 6hz - 60 Hz Dev 13 version has already problems with 6 Hz. Without the pulse counter Dev 13 runs problem free (49 days)

mega-20190406 can handle 60Hz with around 3 reboots per day. This is without reporting the counter to Domoticz, so no extra WiFi load! Lower frequency ... less reboots. With reporting via WiFi every minute ... more reboots So there seems some coincidence between interrupts and WiFi

Statistics of P_1_Switch input - Switch show an average of 0.080 ms with a number of times a max of around 220 ms (=3000 times larger!)

B.t.w. 60Hz needed for this type of flow sensor https://nl.aliexpress.com/item/3-4-water-Hall-magnetic-Brass-flow-sensor-flow-rate-measurement-meter-tool-part/32788046376.html?spm=a2g0z.10010108.1000016.1.3d54795f2i0NL2&isOrigTitle=true

TD-er commented 5 years ago

About this part:

don't use delay() or mircos() in your ISR

I get it why you don't want to use delay, but why not micros ? It can be used to get some timestamp stored in a global variable to be processed later on. Or should we store some very low level value like the clock cycle count and then later convert that to some time code?

s0170071 commented 5 years ago

@TD-er according to to the info here micros() works initially. I would assume that it also relies on interrupts and you cannot use it in your own interrupt routine as a timeout indicator. A interrupt routine cannot be interrupted by another. It will be executed if the initial interrupt routine exits.

I think calling millis() once is okay, but I would not do nested calls from within an ISR. And only access volatile vars. Otherwise compiler optimization works against you ;-)

@Domosapiens The wifi does use interrupts too, so I see the connection here. I just cannot point at the line in the code yet. Can you apply your 60Hz signal to eight pins and configure four pulse couters ? I would love to see how long the unit lasts. If it fails quickly, that would finally be an appropriate stability test.

s0170071 commented 5 years ago

Thinking of it... can't the ESPEasy module 60Hzify itself to death ?

TD-er commented 5 years ago

Well not "to death", but for sure it can lead to lost wifi connections or missing packets. Maybe for continuous "high" frequency signals, an external counter would be the best option I guess.

s0170071 commented 5 years ago

Well, outputting a pwm signal and placing a counter(s) on it at the same time requires no hardware at all. Not even a wire. Just saying...

TD-er commented 5 years ago

Is the PWM done in software (using ticker for example) or is it a hardware feature?

Domosapiens commented 5 years ago

This is the 60Hz unit, a LOLIN D1 mini V3.1.0: image I see no pattern, looks like coincidence.

LOLIN D1 mini V3.1.0 looks (?) to run more stable then Wemos D1 mini V?

TD-er commented 5 years ago

The chart does show the reboots?

s0170071 commented 5 years ago

Its software PWM https://github.com/esp8266/Arduino/issues/216 Does not make a difference for our experiment.

TD-er commented 5 years ago

@Domosapiens

LOLIN D1 mini V3.1.0 looks (?) to run more stable then Wemos D1 mini V?

You may want to add a little capacitor to the GND/3V3 pins. A lot of the recently sold Wemos boards only have a 150 mA voltage regulator

@s0170071 Well, it does make a difference. If it is software generated PWM, then the frequency will shift, but it will not block.

Domosapiens commented 5 years ago

The chart does show the reboots? Yep, 30 sec after reboot an On/Off is send

It's not a power supply thing, not a regulator thing, not a capacitor thing. Capacitors all around and everywhere.

I measured with scope with statistical function. LOLIN D1 mini V3.1.0: Absolute minimum peak was 3.24V, average minimum 3.26V, noise AC peak-peak 108mV (so above the 3.24V) Wemos D1 mini V?: Absolute minimum peak was 3.22V, average minimum 3.28V, noise AC peak-peak 124mV

And ... no single trigger on values below the mentioned minimum peak

s0170071 commented 5 years ago

Just tried a newly flashed unit with multiple counters- all have D4 (LED) as input;

grafik

I started with one, but when I add more, only the last one seems to increment. grafik

TD-er commented 5 years ago

Hmm if all listen to the same pin, I am not sure if that was ever intended to work. Also the way the interrupts are processed in the pulse count plugin leaves room for improvement.

s0170071 commented 5 years ago

grafik Thats the count of the counter with D4(GPIO2) as input and the LED enabled. No other Hardware. Thats a pretty nice test setup ;-) Data was sent to mqtt every second.

Domosapiens commented 5 years ago

@s0170071

Thinking of it... can't the ESPEasy module 60Hzify itself to death ?

@TD-er

Maybe for continuous "high" frequency signals, an external counter would be the best option I guess.

When the counter reports every minute to Domoticz, the load @60Hz is less then 25%, So load does not seem the problem.

But this is strange! image and (effect or cause?): image Something is locking the machine

TD-er commented 5 years ago

Hmm, the TEN_PER_SECOND call of the switch input should not take over a second to process. You may want to set the timeout in the controller to a lower value. It is probably at its default of 1000 msec.

Domosapiens commented 5 years ago

So difficult to understand, all those buttons and knobs ... Indeed the Client Timeout was 1000 ms, ... does that mean that sometimes Domoticz does not react within 1000ms ???

Check Reply is on Ignore Acknowledgement .... how does that match with the Client Timeout ? Isn't that just send and forget (like UDP) ?

Ping from Domoticz PC to Node is max 9ms. Ping from Node to Domoticz PC .....how to do that? What is a good value for Client Timeout ? Set to 100ms for now.

s0170071 commented 5 years ago

Just witnessed a pulse related crash after changing these to volatile:

volatile unsigned long Plugin_003_pulseCounter[TASKS_MAX]; volatile unsigned long Plugin_003_pulseTotalCounter[TASKS_MAX]; volatile unsigned long Plugin_003_pulseTime[TASKS_MAX]; volatile unsigned long Plugin_003_pulseTimePrevious[TASKS_MAX];

grafik

s0170071 commented 5 years ago

looking at https://github.com/esp8266/Arduino/issues/615 and the crash dump above, it seems that the ESP does support nested interrupts. Information on the internet is not consistent here... Right before the crash there were obviously three interrupt calls at the same time. Plugin_003_pulsecheck(unsigned char) even twice and ieee80211_parse_beacon. I will now check if temporarily disabling interrupts within the ISR using noInterrupts(); (and re-enabling them) does any good. Will report back.

s0170071 commented 5 years ago

@TD-er : check this out: https://github.com/esp8266/Arduino/issues/3579#issuecomment-328252421 comment from igrr:

All interrupt handlers which can be called while flash access is in progress must be in IRAM (and all code paths called from these handlers).

The pulsecounter ISR makes a lot of calls and I doubt that all of that code is in IRAM. Btw, do you know if these two lines are equal ?

void Plugin_003_pulse_interrupt1() ICACHE_RAM_ATTR;
void ICACHE_RAM_ATTR Plugin_003_pulse_interrupt1() ;
TD-er commented 5 years ago

I honestly don't know if those lines are giving equal result. Intuitively I would say the ICACHE_RAM_ATTR should be at the end of the line, since it is a compiler hint about the function and not the result. (like override and const for example)

And these ISR functions should only do a very simple thing, like increasing a counter or storing a value and nothing more. Disabling interrupts does feel a bit like not "doing only simple stuff". I would think disabling interrupts is only needed for really time critical stuff like bit-banging pins at a time critical speed.

s0170071 commented 5 years ago

Well, I will try the following then:

  1. dis/enable interrupts to avoid them being nested (test currently running)
  2. make sure only IRAM code is executed (modify ISR->Plugin_003_pulsecheck(0)->timePassedSince()->timeDiff() call chain)
  3. try void Plugin_003_pulse_interrupt1() ICACHE_RAM_ATTR; versus void ICACHE_RAM_ATTR Plugin_003_pulse_interrupt1() ;
  4. range check of index variable in
    void Plugin_003_pulsecheck(byte Index)
    {
    noInterrupts();
    const unsigned long PulseTime=timePassedSince(Plugin_003_pulseTimePrevious[Index]);
    if(PulseTime > (unsigned long)Settings.TaskDevicePluginConfig[Index][0]) // check with debounce time for this task
    {
      Plugin_003_pulseCounter[Index]++;
      Plugin_003_pulseTotalCounter[Index]++;
      Plugin_003_pulseTime[Index] = PulseTime;
      Plugin_003_pulseTimePrevious[Index]=millis();
    }
    interrupts();   
    }

    should be ok, but sometimes I recognize race conditions only after I fixed them :-) Since it takes up to 10 hours until a crash, this might take a while.

s0170071 commented 5 years ago

https://github.com/esp8266/Arduino/pull/3950/files/ee827566ddacecdf8d684fed458b2f9b29fbc064#diff-e0672c44993f08d8004aae23dff5dc44

s0170071 commented 5 years ago

Issue seems to be fixed. No reboots for 24 hours. I haven't used my gitkraken for half a year and can't get it up and running now so I post the code here: Maybe @TD-er can make a PR ?


#ifdef USES_P003
//#######################################################################################################
//#################################### Plugin 003: Pulse  ###############################################
//#######################################################################################################

#define PLUGIN_003
#define PLUGIN_ID_003         3
#define PLUGIN_NAME_003       "Generic - Pulse counter"
#define PLUGIN_VALUENAME1_003 "Count"
#define PLUGIN_VALUENAME2_003 "Total"
#define PLUGIN_VALUENAME3_003 "Time"

void Plugin_003_pulse_interrupt1() ICACHE_RAM_ATTR;
void Plugin_003_pulse_interrupt2() ICACHE_RAM_ATTR;
void Plugin_003_pulse_interrupt3() ICACHE_RAM_ATTR;
void Plugin_003_pulse_interrupt4() ICACHE_RAM_ATTR;
void Plugin_003_pulsecheck(byte Index) ICACHE_RAM_ATTR;

//this takes 20 bytes of IRAM per handler
// void Plugin_003_pulse_interrupt5() ICACHE_RAM_ATTR;
// void Plugin_003_pulse_interrupt6() ICACHE_RAM_ATTR;
// void Plugin_003_pulse_interrupt7() ICACHE_RAM_ATTR;
// void Plugin_003_pulse_interrupt8() ICACHE_RAM_ATTR;

volatile unsigned long Plugin_003_pulseCounter[TASKS_MAX];
volatile unsigned long Plugin_003_pulseTotalCounter[TASKS_MAX];
volatile unsigned long Plugin_003_pulseTime[TASKS_MAX];
volatile unsigned long Plugin_003_pulseTimePrevious[TASKS_MAX];

boolean Plugin_003(byte function, struct EventStruct *event, String& string)
{
  boolean success = false;

  switch (function)
  {

    case PLUGIN_DEVICE_ADD:
      {
        Device[++deviceCount].Number = PLUGIN_ID_003;
        Device[deviceCount].Type = DEVICE_TYPE_SINGLE;
        Device[deviceCount].VType = SENSOR_TYPE_SINGLE;
        Device[deviceCount].Ports = 0;
        Device[deviceCount].PullUpOption = false;
        Device[deviceCount].InverseLogicOption = false;
        Device[deviceCount].FormulaOption = true;
        Device[deviceCount].ValueCount = 3;
        Device[deviceCount].SendDataOption = true;
        Device[deviceCount].TimerOption = true;
        Device[deviceCount].GlobalSyncOption = true;
        break;
      }

    case PLUGIN_GET_DEVICENAME:
      {
        string = F(PLUGIN_NAME_003);
        break;
      }

    case PLUGIN_GET_DEVICEVALUENAMES:
      {
        strcpy_P(ExtraTaskSettings.TaskDeviceValueNames[0], PSTR(PLUGIN_VALUENAME1_003));
        strcpy_P(ExtraTaskSettings.TaskDeviceValueNames[1], PSTR(PLUGIN_VALUENAME2_003));
        strcpy_P(ExtraTaskSettings.TaskDeviceValueNames[2], PSTR(PLUGIN_VALUENAME3_003));
        break;
      }

    case PLUGIN_GET_DEVICEGPIONAMES:
      {
        event->String1 = formatGpioName_input(F("Pulse"));
        break;
      }

    case PLUGIN_WEBFORM_LOAD:
      {
        addFormNumericBox(F("Debounce Time (mSec)"), F("p003")
                , Settings.TaskDevicePluginConfig[event->TaskIndex][0]);

        byte choice = Settings.TaskDevicePluginConfig[event->TaskIndex][1];
        byte choice2 = Settings.TaskDevicePluginConfig[event->TaskIndex][2];
        String options[4] = { F("Delta"), F("Delta/Total/Time"), F("Total"), F("Delta/Total") };
        addFormSelector(F("Counter Type"), F("p003_countertype"), 4, options, NULL, choice );

        if (choice !=0)
          addHtml(F("<span style=\"color:red\">Total count is not persistent!</span>"));

        String modeRaise[4];
        modeRaise[0] = F("LOW");
        modeRaise[1] = F("CHANGE");
        modeRaise[2] = F("RISING");
        modeRaise[3] = F("FALLING");
        int modeValues[4];
        modeValues[0] = LOW;
        modeValues[1] = CHANGE;
        modeValues[2] = RISING;
        modeValues[3] = FALLING;

        addFormSelector(F("Mode Type"), F("p003_raisetype"), 4, modeRaise, modeValues, choice2 );

        success = true;
        break;
      }

    case PLUGIN_WEBFORM_SAVE:
      {
        Settings.TaskDevicePluginConfig[event->TaskIndex][0] = getFormItemInt(F("p003"));
        Settings.TaskDevicePluginConfig[event->TaskIndex][1] = getFormItemInt(F("p003_countertype"));
        Settings.TaskDevicePluginConfig[event->TaskIndex][2] = getFormItemInt(F("p003_raisetype"));
        success = true;
        break;
      }

    case PLUGIN_WEBFORM_SHOW_VALUES:
      {
        string += F("<div class=\"div_l\">");
        string += ExtraTaskSettings.TaskDeviceValueNames[0];
        string += F(":</div><div class=\"div_r\">");
        string += Plugin_003_pulseCounter[event->TaskIndex];
        string += F("</div><div class=\"div_br\"></div><div class=\"div_l\">");
        string += ExtraTaskSettings.TaskDeviceValueNames[1];
        string += F(":</div><div class=\"div_r\">");
        string += Plugin_003_pulseTotalCounter[event->TaskIndex];
        string += F("</div><div class=\"div_br\"></div><div class=\"div_l\">");
        string += ExtraTaskSettings.TaskDeviceValueNames[2];
        string += F(":</div><div class=\"div_r\">");
        string += Plugin_003_pulseTime[event->TaskIndex];
        string += F("</div>");
        success = true;
        break;
      }

    case PLUGIN_INIT:
      {
        String log = F("INIT : Pulse ");
        log += Settings.TaskDevicePin1[event->TaskIndex];
        addLog(LOG_LEVEL_INFO,log);
        pinMode(Settings.TaskDevicePin1[event->TaskIndex], INPUT_PULLUP);
        success = Plugin_003_pulseinit(Settings.TaskDevicePin1[event->TaskIndex], event->TaskIndex,Settings.TaskDevicePluginConfig[event->TaskIndex][2]);
        break;
      }

    case PLUGIN_READ:
      {
        UserVar[event->BaseVarIndex] = Plugin_003_pulseCounter[event->TaskIndex];
        UserVar[event->BaseVarIndex+1] = Plugin_003_pulseTotalCounter[event->TaskIndex];
        UserVar[event->BaseVarIndex+2] = Plugin_003_pulseTime[event->TaskIndex];

        switch (Settings.TaskDevicePluginConfig[event->TaskIndex][1])
        {
          case 0:
          {
            event->sensorType = SENSOR_TYPE_SINGLE;
            UserVar[event->BaseVarIndex] = Plugin_003_pulseCounter[event->TaskIndex];
            break;
          }
          case 1:
          {
            event->sensorType = SENSOR_TYPE_TRIPLE;
            UserVar[event->BaseVarIndex] = Plugin_003_pulseCounter[event->TaskIndex];
            UserVar[event->BaseVarIndex+1] = Plugin_003_pulseTotalCounter[event->TaskIndex];
            UserVar[event->BaseVarIndex+2] = Plugin_003_pulseTime[event->TaskIndex];
            break;
          }
          case 2:
          {
            event->sensorType = SENSOR_TYPE_SINGLE;
            UserVar[event->BaseVarIndex] = Plugin_003_pulseTotalCounter[event->TaskIndex];
            break;
          }
          case 3:
          {
            event->sensorType = SENSOR_TYPE_DUAL;
            UserVar[event->BaseVarIndex] = Plugin_003_pulseCounter[event->TaskIndex];
            UserVar[event->BaseVarIndex+1] = Plugin_003_pulseTotalCounter[event->TaskIndex];
            break;
          }
        }
        Plugin_003_pulseCounter[event->TaskIndex] = 0;
        success = true;
        break;
      }
  }
  return success;
}

/*********************************************************************************************\
 * Check Pulse Counters (called from irq handler)
\*********************************************************************************************/
void Plugin_003_pulsecheck(byte Index)
{
  noInterrupts(); // s0170071: avoid nested interrups due to bouncing.

  //  s0170071: the following gives a glitch if millis() rolls over (every 50 days) and there is a bouncing to be avoided at the exact same time. Very rare.
  //  Alternatively there is timePassedSince(Plugin_003_pulseTimePrevious[Index]); but this is not in IRAM at this time, so do not use in a ISR!
  const unsigned long PulseTime=millis() - Plugin_003_pulseTimePrevious[Index]; 

  if(PulseTime > (unsigned long)Settings.TaskDevicePluginConfig[Index][0]) // check with debounce time for this task
    {
      Plugin_003_pulseCounter[Index]++;
      Plugin_003_pulseTotalCounter[Index]++;
      Plugin_003_pulseTime[Index] = PulseTime;
      Plugin_003_pulseTimePrevious[Index]=millis();
    }
  interrupts();   // enable interrupts again.
}

/*********************************************************************************************\
 * Pulse Counter IRQ handlers
\*********************************************************************************************/
void Plugin_003_pulse_interrupt1()
{
  Plugin_003_pulsecheck(0);
}
void Plugin_003_pulse_interrupt2()
{
  Plugin_003_pulsecheck(1);
}
void Plugin_003_pulse_interrupt3()
{
  Plugin_003_pulsecheck(2);
}
void Plugin_003_pulse_interrupt4()
{
  Plugin_003_pulsecheck(3);
}
void Plugin_003_pulse_interrupt5()
{
  Plugin_003_pulsecheck(4);
}
void Plugin_003_pulse_interrupt6()
{
  Plugin_003_pulsecheck(5);
}
void Plugin_003_pulse_interrupt7()
{
  Plugin_003_pulsecheck(6);
}
void Plugin_003_pulse_interrupt8()
{
  Plugin_003_pulsecheck(7);
}

/*********************************************************************************************\
 * Init Pulse Counters
\*********************************************************************************************/
bool Plugin_003_pulseinit(byte Par1, byte Index, byte Mode)
{

  switch (Index)
  {
    case 0:
      attachInterrupt(Par1, Plugin_003_pulse_interrupt1, Mode);
      break;
    case 1:
      attachInterrupt(Par1, Plugin_003_pulse_interrupt2, Mode);
      break;
    case 2:
      attachInterrupt(Par1, Plugin_003_pulse_interrupt3, Mode);
      break;
    case 3:
      attachInterrupt(Par1, Plugin_003_pulse_interrupt4, Mode);
      break;
    // case 4:
    //   attachInterrupt(Par1, Plugin_003_pulse_interrupt5, Mode);
    //   break;
    // case 5:
    //   attachInterrupt(Par1, Plugin_003_pulse_interrupt6, Mode);
    //   break;
    // case 6:
    //   attachInterrupt(Par1, Plugin_003_pulse_interrupt7, Mode);
    //   break;
    // case 7:
    //   attachInterrupt(Par1, Plugin_003_pulse_interrupt8, Mode);
    //   break;
    default:
      addLog(LOG_LEVEL_ERROR,F("PULSE: Error, only the first 4 tasks can be pulse counters."));
      return(false);
  }

  return(true);
}
#endif // USES_P003
s0170071 commented 5 years ago

grafik

uzi18 commented 5 years ago

nice, maybe we can mark also timePassedSince and maybe millis function, they are shown also in backtrace

TD-er commented 5 years ago

I made a PR for it, but it looks like you used an older version of the file? See the diff: https://github.com/letscontrolit/ESPEasy/pull/2469/files

It also seems to undo these changes made about a month ago: https://github.com/letscontrolit/ESPEasy/commit/07771fe7424fd4472cb86863fd69fda1a22d1160#diff-8846b6c63f20a53f51c243807095add8

s0170071 commented 5 years ago

You're right. I do not understand what the advantage of the recent code changes are. The debounce was perfectly fine (apart from the ISR fails).

The new version of the code now does check the slope direction manually, triggering on each slope. The old version configured the ISR to be triggered on the correct slope in the first place.

We should discuss this with @MrBenzim, but as I see it, these changes do make things just more complicated.

blb4github commented 5 years ago

Hi All, Thanks for this very good work! I just want to share my positive experience with this _P003_Pulse.ino. With recent (since months) versions I was experiencing very high number of reboots of 1 of my units on which I've configured 2 pulse counters to measure Energy generation of my solar panels. I did go back to DEV-13 version which was much more stable. Still experiencing 1 - 10 reboots per day.

Now I have compiled myself a version for my Wemos D1 Pro. I used the SCR from version Release mega-20190511 and replaced _P003_Pulse.ino with the code provided above by s0170071. I compiled 4M version ESP82xx Core 2.6.0-dev, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support.

I've installed this version last night and the unit didn't reboot since then, it's running for almost 21 hours now!

uzi18 commented 5 years ago

@TD-er what about timePassedSince -> ICACHE_RAM or maybe some optimisation needed there?

TD-er commented 5 years ago

The timediff function currently has some calls to keep track of its use and I think those can be removed. Or else the statistics variables must be put in iram too and that's too much. So that function can be a lot simpler.

blb4github commented 5 years ago

Update from my side after 4 days: since I loaded my self compiled firmware with the _P003_Pulse.ino from s0170071 above the module didn't reboot unplanned any more, very happy with this result! I'm rebooting the module once a day after midnight to reset the counters so I don't know how it will behave for period's > 24 hours.

I have loaded this firmware as well on other modules without pulse counters but others are not that stable.

TD-er commented 5 years ago

Well it isn't a fix for all issues, but good to know it does make it more stable for the pulse counter ones.

@uzi18 / @s0170071 Should I merge this one?

uzi18 commented 5 years ago

@TD-er sure, my idea was only to optimize rest of interupt handler, to be sure flash will not delay execution time, you can try inline some functions like timePassedSince etc. but this will. take precious RAM.

uzi18 commented 5 years ago

@TD-er but if you ask me about signal edges count, my opinion is to check all changes (rise+fall) and/or force users to add simple external hardware filter

s0170071 commented 5 years ago

Just merge it and put the global interrupt gpio handler on your normal Todo list.

s0170071 commented 5 years ago

@TD-er I keep this issue open as there are some other things to tackle:

@Domosapiens do you use any of these plugins ? Furthermore I noticed you use a RCWL-0516 radar sensor. Are you sure it is not interfering ? It runs at 3.2 GHz... Detaching it may be worth a try.

Edit: (applies to all plugins mentioned above)

A word on atomic instructions: Any variable- also volatiles- which may be changed by an interrupt should be surrounded by a semaphore to prohibit the ISR changing it during read. Alternatively this variable is only read while interrupts are disabled. Good practice is this: Let an ISR only change a dedicated variable, read it only in one place in the code and disable interrupts while reading it. When you're done reading and have re-enabled interrupts, you can copy this variable to whereever you please.

TD-er commented 5 years ago

P005_DHT.ino uses noInterrupts() which should be checked as this can affect the Wifi stability. There is an exit path from PLUGIN_READ which leaves interrupts disabled. Very likely source of trouble.

Hmm, that sounds familiar, like something I also changed not so long ago.

In the core libs, there is a ongoing fix for the calls to attachinterrupt calls, so I got the feeling they are also looking into that part of the code right now.

I totally agree with leaving this issue open.

About those radar units. It may not be a bad idea to somewhat shield those, as long as you don't render it useless (metal close to it may prevent the oscilator from starting and will have an significant effect on sensitivity) They run indeed on high frequencies, but what I've read about them, it is not really a single well defined frequency.

s0170071 commented 5 years ago

@TD-er please mind my edit in the post above above on atomic reads.

Domosapiens commented 5 years ago

@s0170071 @TD-er

do you use any of these plugins ?

Nope, only PulseCounter, InputSwitch, LCD2004, SystemInfo, DS18B20 and DummyDevice

The LCD1602 acts somewhat as shielding: image

The RCWL-0516 radar sensor is connected with enable to a GPIO. For test, I can switch, via a rule, it totally off. So I had to check the actual situation:

Conclusion:

s0170071 commented 5 years ago

@Domosapiens I used to have reboots with the display plugin, but that one doesnt use interrupts. I would deactivate one plugin (and rules) at a time and see if the reboot frequencys change.

TD-er commented 5 years ago

Could be the display plugin does take more time to update the display? Do you have timing statistics of that plugin?

Domosapiens commented 5 years ago

Sample of the bad one (27 reboots) image

Sample of the "good one" , with 3 reboots image

TD-er commented 5 years ago

200 msec on average is too much, if there's no call to yield() or delay() during that update, it is very likely to cause issues with the wifi connection.

blb4github commented 5 years ago

Hopeful update from my side: last night I loaded Release mega-20190523 on all 5 Wemos D1 mini (Pro) modules I've running and since then (10 hours ago now) non of them did reboot. I can't remember if I ever seen that before. Fingers crossed, it looks like this one is very stable!