consider shifts instead of division

nerdralph commented 4 years ago

Division on AVRs is very expensive in both time and space. Signed long division will increase code size by about 150 bytes.

Even better would be to do the averaging by summation only. Do a fixed 64 loops, adding the ADC value to a uint16_t. Since the AVR ADC is 10 bits, 64 loops will give a final result that fits in 16 bits without overflow.

martin2250 commented 4 years ago

Hi Ralph,

you're right, using long division for such a simple task is overkill. That said, shouldn't the compiler optimize the division if the denominator is hard-coded (may be wrong, I never tested this)? Maybe we should encourage people to use powers of two for the number of samples preferably. I don't think fixing the number of samples to 64 is a good idea, we should give users the choice to trade speed for noise.

A good compromise could be adding a function that just take a single sample, that would allow advanced users to write their own loop and be clever by using shifts and such.

Martin

nerdralph commented 4 years ago

gcc will usually optimize division by a power of two into right shifts. For division by the default of 200, there's not much that can be optimized.

If you want to offer the choice between quick (i.e. 1 sample) and lower noise from multiple samples, then limit the samples to 64. That's more than enough to average out the noise, and it will still fit in a uint_16. Using a long to accumulate samples increases the code size, and provides no tangible benefit over a limit of 64 samples.

On Sat, Feb 29, 2020 at 12:13 PM Martin Pittermann notifications@github.com wrote:

Hi Ralph,

you're right, using long division for such a simple task is overkill. That said, shouldn't the compiler optimize the division if the denominator is hard-coded (may be wrong, I never tested this)? Maybe we should encourage people to use powers of two for the number of samples preferably. I don't think fixing the number of samples to 64 is a good idea, we should give users the choice to trade speed for noise.

A good compromise could be adding a function that just take a single sample, that would allow advanced users to be clever and use shifts and such.

Martin

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/martin2250/ADCTouch/issues/8?email_source=notifications&email_token=ABKNZ6ST4AF2GTWJY6RF2GTRFEZ25A5CNFSM4K6WKU32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENL55EA#issuecomment-592961168, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKNZ6QDEIZ7CXOV7W7S3FLRFEZ25ANCNFSM4K6WKU3Q .

jerryfaust commented 3 years ago

Hello all.

I'm trying something different that seems to work fairly well, and I would be interested in any feedback.

I am running a 4-axis stepper motor system, and I am doing a pretty lot during each arduino loop() call. Trying out this library, calling the read() method on every loop was too costly, bogging down the axis movement. When reading this conversation and considering optimizations, it occurred to me that I could call the read method ONCE during setup(), taking the 64 readings and placing them into a rolling array. Then on each call to loop(), I have implemented a readNext() method, that just takes one reading, places it into the next position in the rolling array, and gets a new average based on that. This significantly reduced the time required during each iteration, and I think produces similar results to the original algorithm. My loop iterates over 2000 times per second when it's busy, and so the average readings are refreshed many times over. Here is (most of) the new code.

constexpr auto SAMPLES = 16;
int _values[SAMPLES];
int _currentSlot = 0;
int _lastValue = 0;
int _sum = 0;

// Call once during setup()
int ADCTouchClass::read(byte ADCChannel)
{
    //long _value = 0;
    short _value = 0; // , _sum = 0;
    for (int _counter = 0; _counter < SAMPLES; _counter++)
    {
        // set the analog pin as an input pin with a pullup resistor
        // this will start charging the capacitive element attached to that pin
        pinMode(ADCChannel, INPUT_PULLUP);

        // connect the ADC input and the internal sample and hold capacitor to ground to discharge it
        ADMUX |= 0b11111;
        // start the conversion
        ADCSRA |= (1 << ADSC);

        // ADSC is cleared when the conversion finishes
        while ((ADCSRA & (1 << ADSC)));

        pinMode(ADCChannel, INPUT);

        // get value
        _value = analogRead(ADCChannel);
        // save in rolling array
        _values[_counter] = _value;
        // add to running total
        _sum += _value;
    }
    // get average
    _lastValue = _sum / SAMPLES;
    // return 
    return _lastValue;
}

// call once at top of every loop()
int ADCTouchClass::readNext(byte ADCChannel)
{
    short _value = 0; // , _sum = 0;
    // set the analog pin as an input pin with a pullup resistor
    // this will start charging the capacitive element attached to that pin
    pinMode(ADCChannel, INPUT_PULLUP);

    // connect the ADC input and the internal sample and hold capacitor to ground to discharge it
    ADMUX |= 0b11111;
    // start the conversion
    ADCSRA |= (1 << ADSC);

    // ADSC is cleared when the conversion finishes
    while ((ADCSRA & (1 << ADSC)));

    pinMode(ADCChannel, INPUT);

    // get current value
    _value = analogRead(ADCChannel);
    // adjust sum
    _sum = _sum - _values[_currentSlot] + _value;
    // place in rolling array
    _values[_currentSlot++] = _value;
    // determine next slot to fill
    _currentSlot = _currentSlot % SAMPLES;
    // now get new sum
    //for (int i = 0; i < SAMPLES; i++)
    //{
    //  _sum += _values[i];
    //}
    // get average
    _lastValue = _sum / SAMPLES; // (_sum >> 4); // 
    // return 
    return _lastValue;
}

int ADCTouchClass::lastValue()
{
    return _lastValue;
}

Additional comments:

I have stripped out the ifdef's for the Tiny processor (mostly for readability here).
I have reduced the number of readings to 16 and still get what I think are reasonable results.
I store the most recent average in _lastValue so that within the remainder of my loop, I can query the value without having to do another reading.
I tried using shift-right rather than division (as suggested in this conversation), but did not notice any real difference in performance; perhaps because of compiler optimization, staying with powers of 2.
And although I don't think it makes much difference at 16 samples, rather than re-totaling the sum on every iteration, I simply subtract the previous value in that array element and add the new value, maintaining _sum without having to add them all up again.
I realize that I could factor out the common code of the two methods...

Any thoughts?

Regards, Jerry

martin2250 / ADCTouch

consider shifts instead of division #8