Improve impulse detection accuracy

laberning / openrowingmonitor

A free and open source performance monitor for rowing machines

https://laberning.github.io/openrowingmonitor

GNU General Public License v3.0

98 stars 19 forks source link

Improve impulse detection accuracy #85

Open Abasz opened 1 year ago

Abasz commented 1 year ago

Out of the box, Raspbian and nodejs is configured to provide a decent experience in terms of accuracy for detecting GPIO changes. However, responding instantly to incoming measurements tends to be deviating. This is due to the fact that the Rpi does not implement a real hardware interrupt.

For this purpose it is important that all options are explored that can improve latency and accuracy of the rotation impulse detection available on the chosen platform.

Currently ORM uses the on/off library. This has been tested and works fairly ok, but still includes noticeable measurement errors. E.g. below is a 25min "Pyramid" workout, recorded with both a micro-controller (ESP32 with real hardware interrupt, also including a debounce filter, please see below more details on this) and the ORM on a Rpi 4 Model B, where orange color is the ORM.

Another sing that latency may not be sufficient is that the above workout was recorded trough a reed switch that should have a certain amount of debounce that is not captured by ORM even without a dedicated software debounce filter.

Measurement errors and bounce can be distinguished easily:

bounce tends to be biassed (i.e. always leads to too short times), while
material measurement errors are generally symmetric (lost time is made up on the next detection cycle).

Even though measurement errors can be handled via a good and robust filter algorithm, it is best approach to achieve the best and cleanest input data as possible on most machines.

Currently we are reviewing whether moving to a different GPIO library such as pigpio would improve measurement accuracy.

According to the library author the library should be able to "Handle up to 20000 interrupts per second". That more than sufficient number of interrupt even for air rowers.

Initial testing shows that in a few strokes recorded with the pigpio library bounce is present that could be a sign of improved timing accuracy: 2022-09-18_21-01-58_raw.csv

Note that once the Rpi is able to measure shorter impulses (i.e. potential bounces) a filter must be implemented. I recommend implementing a software doubunce filter on the GPIO level, i.e. separate from the rowing and flywheel engine. Actually pigpio has its own debounce settings that could be used, but if that is not sufficient a I have a working two stage debounce filter on air rower with reed switch and an MCU.

@JaapvanEkris Below is the code how I test the pigpio library

'use strict'
import process from 'process'
import { Gpio } from 'pigpio'
import os from 'os'
import config from '../tools/ConfigManager.js'
import log from 'loglevel'

log.setLevel(config.loglevel.default)
export function createGpioTimerService () {
    if (config.gpioPriority) {
      log.debug(`Setting priority for the Gpio-service to ${config.gpioPriority}`)
      try {
        os.setPriority(config.gpioPriority)
      } catch (err) {
        log.debug('need root permission to set priority of Gpio-Thread')
      }
    }

 // with pull-up we need to detect FALLING_EDGE as based on my experience this yields cleaner results - 
// I dont fully understand the technicalities but its down to how the LOW and HIGH state is detected, 
// namely the minimum voltage that is required to be read as LOW state
    const sensor = new Gpio(config.gpioPin, {mode: Gpio.INPUT, pullUpDown: Gpio.PUD_UP, edge: Gpio.FALLING_EDGE})

// I believe that process.hrTime([time]) is [legacy code](https://nodejs.org/api/process.html#processhrtimetime) and the following 
// is recommended to be used instead [process.hrtime.bigint()](https://nodejs.org/api/process.html#processhrtimebigint)
    let hrStartTime = process.hrtime()

    sensor.on("interrupt", (value) => {
      const hrDelta = process.hrtime(hrStartTime)
// here using the minimumTimeBetweenImpulses setting to discard any impulse that is too short to be a valid impulse
      if (hrDelta[0] + hrDelta[1] / 1e9 > config.rowerSettings.minimumTimeBetweenImpulses){
/*I would avoid rereading hrTime. I would rather read the hrTime at the beginning of the function, and save it to a variable and use that all the way.
(Rpi does not have a real interrupt so it could happen that while the code is being run in the "interrupt" it does something else and the second hrTime read gets executed with a short delay yielding potential inaccuracies.) 
*/
        hrStartTime = process.hrtime()
        const delta = hrDelta[0] + hrDelta[1] / 1e9
        process.send(delta)
      }
    })
  }
createGpioTimerService()

Here is the data from a few pulls: 2022-09-19_13-31-33_raw.csv

Its apparent that bounces on lower rotations (where the magnet needs more time to pass over the reed switch) can still make it trough this simple filter. To clean this up a simple second level filter can be implemented that observes the absolute difference of the current impulse's DT and the impulse's previous DT. I used the below code for the ESP32 in c++. However, I dont know if this works on rowers other than air rowers (I have never tested it on other rowers). It is based on an assumption that any delta time that is greater than the absolute difference of the current and the last delta times must be valid. I dont know if this assumption applies to any other types of rowers.

// currentCleanDeltaTime is the equivalent of "const delta"
     auto currentCleanDeltaTime = now - previousCleanRevTime;

// this can be simplifed into an abs diff function (its a c++ matter that made me implement it this way) 
    auto deltaTimeDiffPair = minmax<volatile unsigned long>(currentCleanDeltaTime, previousDeltaTime);
    auto deltaTimeDiff = deltaTimeDiffPair.second - deltaTimeDiffPair.first;

// we save the current value in state to use on the next impulse
    previousDeltaTime = currentCleanDeltaTime;

// we check if the deltaTimeDiff is greater than our current delta time and if yes it is a bounce and we discard the impulse
    if (deltaTimeDiff > currentCleanDeltaTime)
        return;

Abasz commented 1 year ago

One additional pro for the pigpio libarary:

On the interrupt event it provides the tick. According to the docs:

tick is the time stamp of the state change, an unsigned 32 bit integer tick is the number of microseconds since system boot and it should be accurate to a few microseconds. As tick is an unsigned 32 bit quantity it wraps around after 2^32 microseconds, which is approximately 1 hour 12 minutes.

The above should mean that it contains the exact timing info when the interrupt was triggered. Using this would have the advantage that it decouples the handling of the interrupt (in terms of timing) and the actual timing of the interrupt trigger (i.e. cases where the interrupt is triggered accurately but the execution of the process.hrTime code is delayed resulting in short delays we would still be able to retain the original timing info).

There is a potential con that since this is a 32 bit it overflows and such overflow needs to be handled. This should be doable as we need the delta and not the absolute amounts so the overflow should not be an issue (but specific code is needed to handle this case).

JaapvanEkris commented 1 year ago

@JaapvanEkris Below is the code how I test the pigpio library

Thanks! I'll put it on my Raspberry tomorrow and see what happens. Looks extremely promissing.

According to the library author the library should be able to "Handle up to 20000 interrupts per second". That more than sufficient number of interrupt even for air rowers.

I found this sentence:

Up to 3.5 million digital reads per second

So, a lot of reads, which sounds promissing. I also found this setting (see https://github.com/fivdi/pigpio/blob/master/doc/configuration.md#configureclockmicroseconds-peripheral), which:

configureClock(microseconds, peripheral)
* microseconds - an unsigned integer specifying the sample rate in microseconds (1, 2, 4, 5, 8, or 10)
* peripheral - an unsigned integer specifying the peripheral for timing (CLOCK_PWM or CLOCK_PCM)

If set to 1us, it will load the CPU (single CPU?) to 25%, but it will sample 1000000 per second. One of the additional avenue's I'm looking at is CPU isolation: appoint a dedicated CPU to the GPIO-thread and another CPU for the processing thread (see https://github.com/JaapvanEkris/openrowingmonitor/blob/Backend_Redesign/docs/Improving_Raspberry_Performance.md). This might provide the GPIO thread with the CPU resources this needs without switching and scheduling needs. These might prove to be interlocking improvements.

On the interrupt event it provides the tick. According to the docs: tick is the time stamp of the state change, an unsigned 32 bit integer tick is the number of microseconds since system boot and it should be accurate to a few microseconds. As tick is an unsigned 32 bit quantity it wraps around after 2^32 microseconds, which is approximately 1 hour 12 minutes.

That is extremely cool as it removes additional stuff that could add latency to the signal.

There is a potential con that since this is a 32 bit it overflows and such overflow needs to be handled. This should be doable as we need the delta and not the absolute amounts so the overflow should not be an issue (but specific code is needed to handle this case).

In the API documentation (see https://github.com/fivdi/pigpio/blob/master/doc/gpio.md), there is a standard approach that fixes it:

(endTick >> 0) - (startTick >> 0)

In other documentation (https://github.com/fivdi/pigpio#determine-the-width-of-a-pulse-with-alerts) I found this example that determines the time between impulses using "alerts" (which seem inferior interrupts), showing how to read the tick:

// Assumption: the LED is off when the program is started

const Gpio = require('pigpio').Gpio;

const led = new Gpio(17, {
  mode: Gpio.OUTPUT,
  alert: true
});

const watchLed = () => {
  let startTick;

  // Use alerts to determine how long the LED was turned on
  led.on('alert', (level, tick) => {
    if (level == 1) {
      startTick = tick;
    } else {
      const endTick = tick;
      const diff = (endTick >> 0) - (startTick >> 0); // Unsigned 32 bit arithmetic
      console.log(diff);
    }
  });
};

watchLed();

// Turn the LED on for 15 microseconds once per second
setInterval(() => {
  led.trigger(15, 1);
}, 1000);

Abasz commented 1 year ago

Thanks! I'll put it on my Raspberry tomorrow and see what happens. Looks extremely promissing.

Hold your horses just for a little longer :), I will post the code in the evening that can be a starting point that could save you some time.

In the API documentation (see https://github.com/fivdi/pigpio/blob/master/doc/gpio.md), there is a standard approach that fixes it:
(endTick >> 0) - (startTick >> 0)

And they have a function for this: https://github.com/fivdi/pigpio/blob/master/doc/global.md#tickdiffstarttick-endtick

But in my view this is a simplification and does not work in certain edge cases (that could be an indication that other issues may occur). Consider the following:

endTick = 100 // This is low because the time has already overflown
startTick = 100000000 // This the last signal time we had and the delta should be rather high as a long time has passed

(100 >> 0) - (100000000 >> 0) // this equals to -99999900 i.e. this does not work

In general I agree that this should not happen in practice as impulses should follow each other with a shorter period of time. I am merely saying that this does not seem to be an all in one solution.

Any way since an unsigned int should give as 1+ hour of time this is something that can be dealt with once validation of the new library is done and benefits are confirmed/verified.

Abasz commented 1 year ago

Test code for new Gpio library.

ToDo:

1) Test if the two stage debounce filter works on non-air rowers 2) Test what uint arithmetic can be implemented (currently the code assumes that the app does not run longer than 1 hour i.e. does not care about uint overflow)

'use strict'
/*
  Open Rowing Monitor, https://github.com/laberning/openrowingmonitor

  Measures the time between impulses on the GPIO pin. Started in a
  separate thread, since we want the measured time to be as close as
  possible to real time.
*/
import process from 'process'
import { Gpio } from 'pigpio'
import os from 'os'
import config from '../tools/ConfigManager.js'
import log from 'loglevel'

log.setLevel(config.loglevel.default)
export function createGpioTimerService() {
  if (config.gpioPriority) {
    // setting top (near-real-time) priority for the Gpio process, as we don't want to miss anything
    log.debug(`Setting priority for the Gpio-service to ${config.gpioPriority}`)
    try {
      // setting priority of current process
      os.setPriority(config.gpioPriority)
    } catch (err) {
      log.debug('need root permission to set priority of Gpio-Thread')
    }
  }

  // with pull-up we need to detect FALLING_EDGE as based on my experience this yields cleaner results -
  // I dont fully understand the technicalities but its down to how the LOW and HIGH state is detected,
  // namely the minimum voltage that is required to be read as LOW state (this is a possible explanation: https://electronics.stackexchange.com/questions/406200/esp32-logic-high-low-level-voltage-not-working or https://raspberrypi.stackexchange.com/questions/29918/gpio-voltage-thresholds. Others seem to experience this as well: https://forum.arduino.cc/t/a-tachometer-rev-counter-using-a-ugly-signal-coming-from-the-ignition-coil/594516/42?page=3)
  const sensor = new Gpio(config.gpioPin, { mode: Gpio.INPUT, pullUpDown: Gpio.PUD_UP, edge: Gpio.FALLING_EDGE })

  let previousCleanRevTime = 0;
  let previousRawDeltaTime = 0;

  sensor.on("interrupt", (value, now) => {

    // ToDo: according to the docs this should work how can we test this?
    const currentRawDeltaTime = (now - previousCleanRevTime) / 1e6
    // first stage debounce filter: we discard any pulse that is below the minimumTimeBetweenImpulses as those must be bounces from the switch
    if (currentRawDeltaTime > config.rowerSettings.minimumTimeBetweenImpulses) {

      // we calculate the absolute difference between the current and the last DTs
      const deltaTimeDiff = Math.abs(currentRawDeltaTime - previousRawDeltaTime);

      // we save the current value in state to use on the next impulse
      previousRawDeltaTime = currentRawDeltaTime;

      // We only accept rotation signals that are sensible (the absolute difference of the current and the previous deltas are smaller than the current delta)
      if (deltaTimeDiff < currentRawDeltaTime) {
        previousCleanRevTime = now;
        process.send(currentRawDeltaTime)
      }
    }
  })
}
createGpioTimerService()

JaapvanEkris commented 1 year ago

Hi @Abasz ,

I've take a bit more conservative approach: I think we should first make sure we are using pigpio in an optimal way (this is let their debounce filter do its work) and then see if we still have a problem that we need to solve with our own code. So I tested the following code today:

'use strict'
/*
  Open Rowing Monitor, https://github.com/laberning/openrowingmonitor

  Measures the time between impulses on the GPIO pin. Started in a
  separate thread, since we want the measured time to be as close as
  possible to real time.
*/
import process from 'process'
import pigpio from 'pigpio'
import os from 'os'
import config from '../tools/ConfigManager.js'
import log from 'loglevel'

const Gpio = pigpio.Gpio

log.setLevel(config.loglevel.default)
export function createGpioTimerService() {
  if (config.gpioPriority) {
    // setting top (near-real-time) priority for the Gpio process, as we don't want to miss anything
    log.debug(`Setting priority for the Gpio-service to ${config.gpioPriority}`)
    try {
      // setting priority of current process
      os.setPriority(config.gpioPriority)
    } catch (err) {
      log.debug('need root permission to set priority of Gpio-Thread')
    }
  }

  // Set the gpio frequency ( at 1us, CPUload is around 16% on a Pi4
  pigpio.configureClock(5, pigpio.CLOCK_PCM);

  // with pull-up we need to detect FALLING_EDGE as based on my experience this yields cleaner results -
  // I dont fully understand the technicalities but its down to how the LOW and HIGH state is detected,
  // namely the minimum voltage that is required to be read as LOW state (this is a possible explanation: https://electronics.stackexchange.com/questions/406200/esp32-logic-high-low-level-voltage-not-working or https://raspberrypi.stackexchange.com/questions/29918/gpio-voltage-thresholds. Others seem to experience this as well: https://forum.arduino.cc/t/a-tachometer-rev-counter-using-a-ugly-signal-coming-from-the-ignition-coil/594516/42?page=3)
  // ToDo: see if this si the most optimal setting for all configurations, as a optocoupler might be cleaner on the upgoing flank
  const sensor = new Gpio(config.gpioPin, { mode: Gpio.INPUT, pullUpDown: Gpio.PUD_UP, edge: Gpio.FALLING_EDGE })

  // Debounce filter set to 0.5ms, this will probably become a setting in the final code
  sensor.glitchFilter(500);

  let previousTick = 0;

  sensor.on("interrupt", (value, currentTick) => {
    // The following command fails as the tickDiff command isn't defined?
    const currentDt = pigpio.tickDiff(previousTick, currentTick) / 1e6
    // const currentDt = ((currentTick >> 0) - (previousTick >> 0)) / 1e6
    previousTick = currentTick
    process.send(currentDt)
  })
}
createGpioTimerService()

Based on a first 6K, I do have the impression that the resulting data suffers a little bit less from noise (lines are more smooth), but some spikes and systematic sawtooth effects are still present. As my machine has no debounce (it converts a continous sine wave into a binary blockwave), I don't know if that would resolve your issues (but then again, changing the glitchFilter might help there).

For me, the first step would be to play with the configureClock function option for the detection interval (is default 5us) to see what that does.

edit: code cleanup

JaapvanEkris commented 1 year ago

Okay,

I've played with the configureClock option and set it to 1us, with a glitchFilter(500) (i.e. 0.5ms). I've looked at the code, and the glitchFilter is handled by the C-code, so I assume it is inside the interrupt handler, removing any dependence on Node.js for this filter. As a first line of defense from noise, I think it is preferable that way.

In this setup:

The distance travelled was too much. Normally I get a difference of 2 to 3 meters (startup behaviour) difference with the PM5, now it increased to 30 meters as the row progressed;
Time seemed to move too quickly, it was 5 to 10 seconds faster than the PM5 on a 25 minute row;
Looking at the logs, Drag calculation fails catastrophically all over the place, a R^2 of 0.83 proved to be too much in the majority of the cases, where normally I have a R^2 of 0.97. This might explain the distance issue, but not the time issue;
Looking at the raw files, I see a lot of split detections, so a currentDt that should be around 9ms (based on the surrounding currentDt's) is split up in two currentDt's of 4.5ms. Could be a debounce issue that isn't caught by the glitchFilter. This explains the failure of Drag factor calculation (and thus the distance issue), but not the time jump.

My conclusion is that either the 0.5ms glitchFilter is too short or the 1us polling frequency is too high. My first step is to increase the glitchFilter to 1ms and see what happends.

Abasz commented 1 year ago

Can you send me please the raw DT-s. I went back to the original implementation to test the PR and I dont really have proper long data sets to visualize. I can run it trough the two stage denounce filter to see what comes up. Generally the C2 should not have bounce as it has a hall effect sensor that should not suffer from this issue. I suspect that this is due to the way the voltage changes detected by the optocpouler or (that in my view less likely) an issue related to the voltage generator in the C2. And by issue I do not mean an error, but some concept we are not aware of.

note1: I believe that pigpio basically sets up a polling PWM on the GPIO, and the clock configuration determines the polling frequency. with 1us, the "interrupt" to trigger a state should be present for longer than 1us. That is achieved by any rower, so 1us should be more than correct. I think there is no such thing that too high polling in terms of data accuracy.

What you could also try:

Change the edge back to Gpio.RAISING_EDGE. I expect this to worsen the matter but who knows.
In crease the glitchFilter to the minimumTimeBetweenImpulses. Literally there cannot be any impulse on a rower that is shorter than that. I experimented with this extensively with my MCU implementation
Run a quick test of the two stage filter implementation and see what that does to the signal

JaapvanEkris commented 1 year ago

Can you send me please the raw DT-s. I went back to the original implementation to test the PR and I dont really have proper long data sets to visualize. I can run it trough the two stage denounce filter to see what comes up.

For a good one, see https://github.com/JaapvanEkris/openrowingmonitor/blob/Backend_Redesign/recordings/Concept2_RowErg_Session_2000meters.csv

The recording of yesterday, with lots of noise, I'll post tonight.

Generally the C2 should not have bounce as it has a hall effect sensor that should not suffer from this issue. I suspect that this is due to the way the voltage changes detected by the optocpouler or (that in my view less likely) an issue related to the voltage generator in the C2. And by issue I do not mean an error, but some concept we are not aware of.

That's what I thought as well, as the Concept2 uses a coil based generator that produces a continuous sine wave of some kind. So I did learn something yesterday :) I suspect that they use a full rectifier behind it, as the signal fluctuates between 0V and 15V (where a traditional generator would have fluctuated between -15V and 15V). Could be that in the signal flip, some odd things happen.

note1: I believe that pigpio basically sets up a polling PWM on the GPIO, and the clock configuration determines the polling frequency. with 1us, the "interrupt" to trigger a state should be present for longer than 1us. That is achieved by any rower, so 1us should be more than correct. I think there is no such thing that too high polling in terms of data accuracy.

That is my assumption as well, and I like to see the CPU glow with heat in the night ;)

Change the edge back to Gpio.RAISING_EDGE. I expect this to worsen the matter but who knows.

Good one, going to try that one. If that resolves it, we got another candidate for the config-file (was already requested as well)

In crease the glitchFilter to the minimumTimeBetweenImpulses. Literally there cannot be any impulse on a rower that is shorter than that. I experimented with this extensively with my MCU implementation

I'm not certain if that works like that, but I'm no expert on this class. Wouldn't the glitchfilter require a signal to be present for x us? So it would ignore short spikes, regardless how they are spaced?

Run a quick test of the two stage filter implementation and see what that does to the signal

Will try that in the weekend. It is an interesting approach, but if we could outsource part of the work to pigpio and make it part of the interrupthandler, I would like it even more.

Abasz commented 1 year ago

I'm not certain if that works like that, but I'm no expert on this class. Wouldn't the glitchfilter require a signal to be present for x us? So it would ignore short spikes, regardless how they are spaced?

You are right sorry, I quickly checked the documentation. Actually that is not what we need. What we need is to ignore an impulse where the deltaTime is less than the minimumTimeBetweenImpulses. Actually I found this issue on the pigpio: https://github.com/fivdi/pigpio/issues/26

It recommends lodash.debounce function, which is what this really needs. But in my opinion these are all an overkill when what is needed is a simple check that the current deltaTime is longer than a set period of time and if not we skip reporting the impulse. In this sense even the glitchFilter is unnecessary, not to mention that it does not really capture the intent.

JaapvanEkris commented 1 year ago

Actually that is not what we need. What we need is to ignore an impulse where the deltaTime is less than the minimumTimeBetweenImpulses. Actually I found this issue on the pigpio: fivdi/pigpio#26

I don't know. If a bounce or other ghost reading is a much shorter pulse than a real pulse, you can easily kill any ghost reading with it, and it would actually be preferable. Typically, contact bounce or ghost readings are spikes, where real readings are much longer. So removing the spikes leaves you with the real data.

It actually is the normal/preferred way to kill this type of noise, as you use the nature of the signal (short spike vs strong pulse) instead of the interspacing between successive upgoing flanks. Using the latter for noise filtering is less effective as you become dependent on correct interspacing of your signals, and one miss will kill your noise filter as it is out of sync with the real data. That is also what happened with the earlier versions of stroke detection: if you focus too much on interspacing (of the strokes), one badly detected stroke will get you out of sync with the real state and thus will create havoc among all following strokes.

Abasz commented 1 year ago

I see where you are coming from and you are probably right on the topic of filter getting thrown off.

But ISRs do not have the concept of "length". Interrupt is triggered on state transition. It does not know (and should not know) how long a given state is present.

This means that in order to see the length of the signal it relies on the same concept of interspacing. pigpio checks the time between two consecutive state change and if it is longer it reports it valid. Please see: static void alertGlitchFilter(gpioSample_t *sample, int numSamples). Basically it checks the time between two consecutive state changes and if it is shorter than the steadyUs it sets the state change bit.

EDIT: Also, the length of a given state depends on multiple factors: e.g. speed, strength of the magnet. Some of these are session independent (e.g. magnet) but speed for instance can vary within a session. The faster the flywheel goes the shorter a given state is present (shorter time is needed to pass the magnet or sensor or which ever). I believe this would make it extremely difficult to create a good filter that would inspect and validate an impulse based on its length. Especially as we do not know anything about the characteristics of this side of the impulse (i.e. how long the high state, passing the magnet, is present).

For a good one, see https://github.com/JaapvanEkris/openrowingmonitor/blob/Backend_Redesign/recordings/Concept2_RowErg_Session_2000meters.csv

I had a look at the data you provided and it seemingly two types of noise:

has timing delays. These are the sawtooth, where a late recognition is followed by a shorter one and are symmetric
has interesting upper spikes without the corresponding shorter pair, suggesting that there is a missed impulse

While it does seem to lack the classic bounce where you have a few really short impulses following a longer one (which is of course consistent with my expectation on the hall effect sensor not producing this). The noise I encountered so far is completely different in nature. Hence you are absolutely correct that the code I posted before will not be able to clean this up as it is not systematic... The noise I encountered and used the filter on was systematic in a sense that when the consecutive DTs were added up they equaled to the actual DT (although I did not do a full workout, only a couple of pulls- with the pigpio).

DeltaTimes - Reed switch - pigpio
35344
6735
33730
6686
32092
6386
3174
19
591
3128
5985
28974
5786
28279
5475
27588
5478
2661
537
26121
568
25676
527
2485
516
24479

You can basically see a pattern here. Generally 3 row should be added to get the real DT.

I am really surprised about the missed impulses in your data. Its hard to imagine that it is the Rpi that missed a valid signal with a 1us polling. But it is possible that the voltage transition (to trigger the edge) was not sufficient, e.g. because the optocopuler was slow. But I dont have any practical experience with optocouplers.

When I had the chance to record a few sessions on a C2 a few month back, I did not encountered this issue. Although I did not use an Rpi and I used a MOSFET logic level shifter circuit (that I expect is more suitable for accurate and fast level shifting) instead of an optocoupler. This recorded something like this:

DeltaTimes - C2 - MCU
529
435
3928
10730
410
356
4187
10233
103
4920
10140
61
4877
10492
4755
10989
4721
10990
4716
10564
4828
10235
4875
10201

Even here you can see the pattern. The data was no inline with my expectation in terms of bounce (there should have been none), but I did not analyse this matter as the filtered data was correct.

JaapvanEkris commented 1 year ago

But ISRs do not have the concept of "length". Interrupt is triggered on state transition. It does not know (and should not know) how long a given state is present.

That is why we should use alerts, where the documentation mentions the following:

glitchFilter(steady) Sets a glitch filter on a GPIO. steady - Time, in microseconds, during which the level must be stable. Maximum value: 300000 Level changes on the GPIO are not reported unless the level has been stable for at least steady microseconds. The level is then reported. Level changes of less than steady microseconds are ignored. This means that event callbacks will only be executed if the level change is at least steady microseconds long. Note that each (stable) edge will be timestamped steady microseconds after it was first detected. This filter only affects the execution of callbacks from the alert event, not those of the interrupt event.

So you set a filter for a minumum length of a specific signal, so the minumum absence or the presence time of a magnet.

This means that in order to see the length of the signal it relies on the same concept of interspacing. pigpio checks the time between two consecutive state change and if it is longer it reports it valid.

No, it doesn't, it uses a totally differenct concept. A concept that GPIO should rely on. It uses the time between state changes. Effectively, it determines a minumum pulse length. So when used on an upward flank, it determines the minmum time the magnet should be seen before it is considered a valid signal. When used on a downward flank, it is the minimum time a magnet should be absent before it is considered a signal. By requiring a minum pulse length, you can kill short debounces without having to worry about the time between this pulse and the next one.

EDIT: Also, the length of a given state depends on multiple factors: e.g. speed, strength of the magnet. Some of these are session independent (e.g. magnet) but speed for instance can vary within a session. The faster the flywheel goes the shorter a given state is present (shorter time is needed to pass the magnet or sensor or which ever). I believe this would make it extremely difficult to create a good filter that would inspect and validate an impulse based on its length. Especially as we do not know anything about the characteristics of this side of the impulse (i.e. how long the high state, passing the magnet, is present).

You are looking at the wrong thing here. You don't want to detect the good signals, you just want to remove the bad pulses. So you don't have to know how long the minumum good pulse is. You just need to find a value for the filter where all noise is gone from the raw CurrentDt file. This is a similar proces that applies to all metrics from ORM: you try and adjust until the result starts to make sense.

has timing delays. These are the sawtooth, where a late recognition is followed by a shorter one and are symmetric

has interesting upper spikes without the corresponding shorter pair, suggesting that there is a missed impulse

That is the reason the 64Bit Raspbian Lite is now the prefered kernel: it is a PREEMPT kernel (and thus optimized for low latency operations) anf gpio-priority is now an integer, as you need at least a priority of -1 to keep regular housekeeping tasks of the Linux OS at bay. Since switching to that configuration, these issues haven't been present. It is part of the callibration set as ORM's current engine should not get distracted by such signals.

While it does seem to lack the classic bounce where you have a few really short impulses following a longer one (which is of course consistent with my expectation on the hall effect sensor not producing this). The noise I encountered so far is completely different in nature. Hence you are absolutely correct that the code I posted before will not be able to clean this up as it is not systematic... The noise I encountered and used the filter on was systematic in a sense that when the consecutive DTs were added up they equaled to the actual DT (although I did not do a full workout, only a couple of pulls- with the pigpio).

I am really surprised about the missed impulses in your data. Its hard to imagine that it is the Rpi that missed a valid signal with a 1us polling. But it is possible that the voltage transition (to trigger the edge) was not sufficient, e.g. because the optocopuler was slow. But I dont have any practical experience with optocouplers.

You are talking about an Optocoupler and surrounding circuit that is designed and rated to handle over 80 kHz in industrial applications easily. With a C2 producing around 100 signals per second, there is more than sufficient resolution left to make meaningful measurements.

When I had the chance to record a few sessions on a C2 a few month back, I did not encountered this issue. Although I did not use an Rpi and I used a MOSFET logic level shifter circuit (that I expect is more suitable for accurate and fast level shifting) instead of an optocoupler.

I wouldn't be so fast in assuming that a MOSFET is more accurate and suitable, it really depends on the quality of the electronics used, and the environmental condition they are used in. Given the huge variation in the measurements, I would first look at the measurement chain.

JaapvanEkris commented 1 year ago

My current code, that generates valid currentDt's without any bad noise:

'use strict'
/*
  Open Rowing Monitor, https://github.com/laberning/openrowingmonitor

  Measures the time between impulses on the GPIO pin. Started in a
  separate thread, since we want the measured time to be as close as
  possible to real time.
*/
import process from 'process'
import pigpio from 'pigpio'
import os from 'os'
import config from '../tools/ConfigManager.js'
import log from 'loglevel'

log.setLevel(config.loglevel.default)

let previousTick = 0

// Will become settings later on
let gpioFlank = 'Up'
let gpioInterval = 1
let gpioNoiseFilter = 500

export function createGpioTimerService () {
  if (config.gpioPriority) {
    // setting top (near-real-time) priority for the Gpio process, as we don't want to miss anything
    log.debug(`Setting priority for the Gpio-service to ${config.gpioPriority}`)
    try {
      // setting priority of current process
      os.setPriority(config.gpioPriority)
    } catch (err) {
      log.debug('need root permission to set priority of Gpio-Thread')
    }
  }

  const Gpio = pigpio.Gpio

  // Configure the gpio polling frequency
  pigpio.configureClock(gpioInterval, pigpio.CLOCK_PCM)

  // Configure the sensor readings for one of the Gpio pins of Raspberry Pi
  const sensor = new Gpio(
    config.gpioPin, {
    mode: Gpio.INPUT,
    pullUpDown: Gpio.PUD_UP,
    alert: true
  })

  // Set a minumum time a level must be stable before an alert event is emitted.
  sensor.glitchFilter(gpioNoiseFilter);

  // Define the alert handler
  sensor.on('alert', (level, currentTick) => {
    if ((gpioFlank === 'Down' && level === 0) || (gpioFlank === 'Up' &&  level === 1)) {
      const currentDt = ((currentTick >> 0) - (previousTick >> 0)) / 1e6
      previousTick = currentTick
      process.send(currentDt)
    }
  })
}
createGpioTimerService()

JaapvanEkris commented 1 year ago

When I had the chance to record a few sessions on a C2 a few month back, I did not encountered this issue. Although I did not use an Rpi and I used a MOSFET logic level shifter circuit (that I expect is more suitable for accurate and fast level shifting) instead of an optocoupler. This recorded something like this: ... Even here you can see the pattern. The data was no inline with my expectation in terms of bounce (there should have been none), but I did not analyse this matter as the filtered data was correct.

Come to think of it, are you 100% sure your circuit is free of magnetic interference and other issues? When I look at this older ORM issue about the electrical connection with a model C and this this ORM discussion about the electrical connection with a model C, I see two people that plugged in a scope and getting nice clean and consistently interleaved pulses. The data one of these people shared from the scope is also nice and clean. So you measuring a debounce on a C2 is quite odd as two other people just hooked up a scope and got a clean signal.

Looking at their new generator, I see some systematic error due to physical magnet placement, but only when I work interruptbased I get this kind of odd behaviour. When I use the normal onoff library or the pigpio alert-based approach (see above) it is free of this kind of errors.

Abasz commented 1 year ago

Model C uses a completely different pickup and concept for detecting signals: https://shop.concept2.com/parts/121-monitor-pickup-and-screw-kitmodel-a-c.html -> this is based on the magnetic effect on resistance that affects the voltage.

While Model D uses the power generator: https://shop.concept2.com/parts/424-sensor-coil.html -> this is based on Logic level voltage

I carried out my limited number of tests on Model D

Abasz commented 1 year ago

That is why we should use alerts, where the documentation mentions the following:

Please see this issue on alert: https://github.com/fivdi/pigpio/issues/45. I think this is a 1ms polling and not real interrupt and this is the reason it works without the bounces. According to the docs: The thread which calls the alert functions is triggered nominally 1000 times per second. The active alert functions will be called once per level change since the last time the thread was activated. i.e. The active alert functions will get all level changes but there will be a latency. (http://abyz.me.uk/rpi/pigpio/cif.html#gpioSetAlertFunc)

I think alert is not going to achieve the desired aim because changes are only reported once per 1ms.

In addition on the time issue you mentined before: gpioSetISRFunc is used for interrupts. [...] However, the tick information it provides is not as accurate as the tick information provided by gpioSetAlertFunc. In addition, gpioSetISRFunc can't detect as many state changes per second as gpioSetAlertFunc.

JaapvanEkris commented 1 year ago

While Model D uses the power generator: https://shop.concept2.com/parts/424-sensor-coil.html -> this is based on Logic level voltage

I carried out my limited number of tests on Model D

The model D generator is identical to my RowErg (it is a rebranding, not a technical change). I've tested that connection for over 1000 km without any of the effects you describe. The 2000 meter log you analysed is captured with a Concept2 generator hooked up to a 80KHz Optocoupler (switching to true at 12V) and inserted as a blockwave into the RPi, and the signal also hooked up to the PM5 which is powered by it. I had no issue there.

JaapvanEkris commented 1 year ago

That is why we should use alerts, where the documentation mentions the following:

Please see this issue on alert: fivdi/pigpio#45. I think this is a 1ms polling and not real interrupt and this is the reason it works without the bounces. According to the docs: The thread which calls the alert functions is triggered nominally 1000 times per second. The active alert functions will be called once per level change since the last time the thread was activated. i.e. The active alert functions will get all level changes but there will be a latency. (http://abyz.me.uk/rpi/pigpio/cif.html#gpioSetAlertFunc)

When I read the answer in full, it says the following:

gpioSetAlertFunc is used for alerts. It provides information about state changes once per millisecond so there's quite a bit of latency. However, it provides very accurate tick information and can detect a lot more state changes per second than gpioSetAlertFunc.

gpioSetISRFunc is used for interrupts. It provides information about state changes as quickly as possible so there's a lot less latency than with gpioSetAlertFunc. However, the tick information it provides is not as accurate as the tick information provided by gpioSetAlertFunc. In addition, gpioSetISRFunc can't detect as many state changes per second as gpioSetAlertFunc.

The key phrase here for me is

there's a lot less latency than with gpioSetAlertFunc. However, the tick information it provides is not as accurate as the tick information provided by gpioSetAlertFunc.

This suggests that the tick information on interrupts is worse than the tick information in an Alert. As the tick information is the only ingredient for CurrentDt, I rather have an accurate reading a bit later, than an inaccurate reading soon.

Looking at the official documentation accompanying the library (see https://github.com/fivdi/pigpio/blob/master/doc/gpio.md#the-difference-between-interrupts-and-alerts), it says:

The Difference Between Interrupts and Alerts Both interrupts and alerts provide information about state changes. Interrupts provide this information as quickly as possible and the latency is as low as possible. Alerts are queued and fired once per millisecond so the latency is higher. It's possible to detect more alerts than interrupts per second. Both interrupts and alerts provide tick information that are accurate to a few microseconds

So the measurement mechanism is identical, it is just the reporting (immediate vs queued) which is different. At this point, there is no need for extreme low latency, as we depend on pigpio to deliver the reported tick information, and not on the CPU clock based on an trigger to calculate CurrentDt.

In the old onoff construction, we needed interrupts to make sure our time determination was correct, as we used the OS time to calculate it on our own. With pigpio, an alert or interrupt will report the tick information by itslef, so there is no real rush to process this. The only requirement is that it is processed before the next alert or interrupt arrives. As Alerts are interspaced at least a ms, that is easily achievable.

JaapvanEkris commented 1 year ago

I've tested the code on a 6K (approx 25 min) and a 12K (approx 50 min), and the time drift issue seems to be gone as well. Resulting Force and Power curves look a bit smoother, and no spikes in the data. So this code looks pretty acceptable to me, bar the need to move the settings. Does this solve your problem of debouncing your machine?

I did realize that it is best to test for the upgoing flank (i.e. magnet present for minum of 500 microseconds with my settings), as debounces typically are short spikes where something is errornously detected. When you would test for a downgoing flank (i.e. not a magnet present) a badly timed spike could still lead to a split CurrentDt (as the spike triggers another timer and there is plenty of "magnet absent" time to go around).

Abasz commented 1 year ago

With a pull-up resistor, shouldn't be the logic reversed? I.e. down going is when magnet passes and up going when magnet is not present?

In case of pull-up resistor logic is generally reversed, the pin goes down when the circuit is closed, hence the question. But what ever is cleaner of course.

Does this solve your problem of debouncing your machine?

Honestly I did not have time to do proper testing on this. From tomorrow I will not be able to do exerciser for 3-6 month due to an injury and I wanted to squeeze testing of the new backend with real data as much as possible separately from this.

I will do a few pulls today to see how it goes. Sorry about this.

JaapvanEkris commented 1 year ago

With a pull-up resistor, shouldn't be the logic reversed? I.e. down going is when magnet passes and up going when magnet is not present? In case of pull-up resistor logic is generally reversed, the pin goes down when the circuit is closed, hence the question. But what ever is cleaner of course.

Could be, I am going to provide the options anyway, as well as the "both" option, as it is easy to implement and some constructions invert the logic for sure. But it is something to look at when people are setting up their machine.

Does this solve your problem of debouncing your machine?

Honestly I did not have time to do proper testing on this. From tomorrow I will not be able to do exerciser for 3-6 month due to an injury and I wanted to squeeze testing of the new backend with real data as much as possible separately from this.

Sorry to hear about your injury! I hope you will recovery fine.

We'll here how your preliminary testing goes. But no stress there. I'll row a lot more kilometers with this setup as well, and make sure we can configure the pigpio debounce filter well from the settings and introduce it in V1Beta (you can see the settings already mentioned). If this keeps working as reliably as it does now, it adds a bit more flexibility in a well-constructed way and provides higher quality data, so adding it to the code base is quite desireable. From there, we can take further improvements after your recovery.

Abasz commented 1 year ago

I run a quick test with couple of strokes on different settings. Here are my observations and results:

The code produces very nice and clean results. Its nearly identical to the data recorded with the MCU. For some reasons the data recorded by ORM and the MCU starts to drift apart in a way that ORM records more points. No idea why this happens
For me the glitchFilter could have been set to 50 and it worked the same as with 500 (the upper is recorded with 500 and the lower with 50)
glitchFilter 0 shows the bounces properly
The Up and Down flank does not matter, produces the same clean results
I noticed that the first, long DT was negative in 3 out of 5 cases. Not sure why, and I am not sure how to debug that either... Have you noticed anything like that? I suspect its the unsigned arithmetic that fails 2022-09-28_16-26-25_raw.csv 2022-09-28_16-18-57_raw.csv 2022-09-28_16-33-59_raw.csv

JaapvanEkris commented 1 year ago

First of all, many many many thanks for testing this.

The code produces very nice and clean results. Its nearly identical to the data recorded with the MCU. For some reasons the data recorded by ORM and the MCU starts to drift apart in a way that ORM records more points. No idea why this happens

oke, I'll see what I can measure on my machine. Time has to come from somewhere.

For me the glitchFilter could have been set to 50 and it worked the same as with 500 (the upper is recorded with 500 and the lower with 50)

Good to know, I'll start playing with it as well, Setting it to 50 might be a very nice default value in the settings.

glitchFilter 0 shows the bounces properly

The Up and Down flank does not matter, produces the same clean results

Good to know, I am still adding the options, as it might help people later in the process. By accident I activated the "Both" option, which lead to surprisingly good results. Don't know why yet, but interesting.

I noticed that the first, long DT was negative in 3 out of 5 cases. Not sure why, and I am not sure how to debug that either... Have you noticed anything like that? I suspect its the unsigned arithmetic that fails 2022-09-28_16-26-25_raw.csv 2022-09-28_16-18-57_raw.csv 2022-09-28_16-33-59_raw.csv

Good point. Initialisation is a tough one to begin with: onoff had a similar issue, that the first CurrentDt was always huge. As we start with a 0 for the PreviousTick, you typically end up with the time since the Rpi booted for the first CurrentDt. I'll pick that one up as a point for the new code for V1Beta.

I hope you recover quickly and without too much pain! And we keep in touch.

JaapvanEkris commented 1 year ago

The updated code has been added to the proposed backend update (see https://github.com/laberning/openrowingmonitor/pull/84). 50us seems to be a decent filter indeed. I added some code to flywheel.js to detect abnormal impulse lengths (but not correct them, it is just logging.