milesburton / Arduino-Temperature-Control-Library

Arduino Temperature Library
https://www.milesburton.com/w/index.php/Dallas_Temperature_Control_Library
968 stars 487 forks source link

MAX macro in setResolution() makes code performance inefficient #151

Closed RobTillaart closed 4 years ago

RobTillaart commented 4 years ago

See #127

function: setResolution(addr, res) line: bitResolution = max(bitResolution, getResolution(deviceAddress))

The max macro in setResolution() expands the call to getResolution() twice in the code. This is a performance problem as for getResolution() the scratchpad is read which takes ~ 20 millis. Solution is to extract the getResolution() from the macro.

Same use of the macro is in ::begin() which is less a problem as it is executed only once.

Code needs to be checked on macro's in other places, to see if there are more problems.

Finally the loop in which the bitResolution is determined can be ended if bitResolution == 12 as it will not get higher.

RobTillaart commented 4 years ago

no macro MIN used, two places macro CONSTRAIN used to set resolution between 9 and 12. so no function call in macro.

RobTillaart commented 4 years ago

Further analysis: The max macro only executes getResolution() twice if it is the bigger value. in the worst case this happens 3x in one setResolution().

Current fixed code is equal in size (compiled for an UNO) but it will break the loop when bitResolution == 12. In a test sketch with 2 devices (0==12 bit, 1 == 9 bit) this is ~30% faster so overal it is a gain. Will make the PR asap.

gitkomodo commented 4 years ago

Do I understand correctly that you have fixed code for the part of setResolution that manages the global resolution? I took a look at the part before that. If you did that part too, too bad for my time spend...

This is my version of the function, with the part that manages the global resolution as it was:

bool DallasTemperature::setResolution(const uint8_t* deviceAddress,
        uint8_t newResolution, bool skipGlobalBitResolutionCalculation) {

  // DS1820 and DS18S20 have no resolution configuration register, should always stay 12
  // Other values of newResolution not classified as error, it is just 'constrained to 12'
  if (deviceAddress[0] == DS18S20MODEL)
    return true;

  // Constrain if newResolution is out of range
  newResolution = constrain(newResolution, 9, 12);

  ScratchPad scratchPad;
  if (!isConnected(deviceAddress, scratchPad))
    return false;

  uint8_t newValue;
  switch (newResolution) {
    case 12:
      newValue = TEMP_12_BIT;
      break;
    case 11:
      newValue = TEMP_11_BIT;
      break;
    case 10:
      newValue = TEMP_10_BIT;
      break;
    case 9:
    default:
      newValue = TEMP_9_BIT;
  }
  if (scratchPad[CONFIGURATION] == newValue)
    return true;

  scratchPad[CONFIGURATION] = newValue;
  writeScratchPad(deviceAddress, scratchPad);

  // Manage global resolution
  // without calculation we can always set it to max
  bitResolution = max(bitResolution, newResolution);

  if (!skipGlobalBitResolutionCalculation && (bitResolution > newResolution)) {
    bitResolution = newResolution;
    DeviceAddress deviceAddr;
    for (int i = 0; i < devices; i++) {
      getAddress(deviceAddr, i);
      bitResolution = max(bitResolution, getResolution(deviceAddr));
    }
  }

  return true;
}

I think the whole switch statement can also be written with less code (I expect size will reduce when both this one and reverse are replaced): uint8_t newValue = ((newResolution-9)<<6) | 0x1F; And the reverse in getResolution(deviceAddress): return (scratchPad[CONFIGURATION]>>5)+9;

I was wondering about the part that manages the global resolution... Does it really need to be calculated here? Another option is to recalculate it only when it's needed. I think it is only used in the 'global' requestTemperatures() function and it can be called from outside the library.

RobTillaart commented 4 years ago

uint8_t newValue = ((newResolution-9)<<6) | 0x1F; And the reverse in getResolution(deviceAddress): return (scratchPad[CONFIGURATION]>>5)+9;

shift 6 and shift 5 ?? think it should be the same

The switch can indeed be written shorter in code, but it becomes less readable, and it is not substantial shorter in the binary. I expect in terms of performance it will not be a game changer too, so I propose not to "optimize" this.

I have not published my version of setResolution() as it is still draft, find it below, in fact it is quite similar which is a good sign :)

too bad for my time spend...

On contrary, it confirms other thoughts

The main differences are pointed out with <<<<<<<<<

  1. bitResolution is set to max of bitRes and newRes so the 2nd part of your if will always be false
  2. The max macro may execute the getResolution() twice, so using a temporary var for that
  3. instead of the max code.
  4. extra condition to exit the loop when maxResolution == 12 is reached. This might save time especially when there are many sensors.

Tomorrow evening I have a bit more time to work on this. Thanks for your contribution.

// set resolution of a device to 9, 10, 11, or 12 bits
// if new resolution is out of range, 9 bits is used.
bool DallasTemperature::setResolution(const uint8_t* deviceAddress,
                                      uint8_t newResolution, bool skipGlobalBitResolutionCalculation) {

  // DS1820 and DS18S20 have no resolution configuration register
  if (deviceAddress[0] == DS18S20MODEL) {
    return true;
  }

  // ensure same behavior as setResolution(uint8_t newResolution)
  newResolution = constrain(newResolution, 9, 12);

  // return when stored value == new value
  if (getResolution(deviceAddress) == newResolution)
    return true;

  ScratchPad scratchPad;
  if (!isConnected(deviceAddress, scratchPad)) {
    return false;
  }

  switch (newResolution) {
    case 12:
      scratchPad[CONFIGURATION] = TEMP_12_BIT;
      break;
    case 11:
      scratchPad[CONFIGURATION] = TEMP_11_BIT;
      break;
    case 10:
      scratchPad[CONFIGURATION] = TEMP_10_BIT;
      break;
    case 9:
    default:
      scratchPad[CONFIGURATION] = TEMP_9_BIT;
      break;
  }
  writeScratchPad(deviceAddress, scratchPad);

  // without calculation we can always set it to max
  bitResolution = max(bitResolution, newResolution);

  if (!skipGlobalBitResolutionCalculation) {   // <<<<<<<<<<   1
    bitResolution = newResolution;
    DeviceAddress deviceAddr;
    for (uint8_t i = 0; i < devices; i++) {
      getAddress(deviceAddr, i);
      uint8_t b = getResolution(deviceAddr);   // <<<<<<<<<<< 2
      if (b > bitResolution) bitResolution = b;  // <<<<<<<<<<< 3
      if (bitResolution == 12) break;   // <<<<<<<<<<< 4
    }
  }
  return true;  // new value set
}

What I am thinking about is that the latter loop also checks the device just set which is unnecessary, but preventing is worse than checking (at least I did not see an elegant solution for that).

gitkomodo commented 4 years ago
  1. Yes, (bitResolution > newResolution) will always be false after max, don't know why it's there. Come to think of it, I don't even know why the (first) max is there, even when called from getResolution(void) I don't think it adds desired behavior.

X. Good to change the type of i in the for loop to uint8_t, it now matches type of devices.

  1. Is it a choice to declare DeviceAddress deviceAddr outside the loop and uint8_t b inside? (or just a habit ;-) )

  2. This may be even faster than the max code because only compared in one direction.

  3. Yes, this will save time. I think it may even save more time if placed right after the for statement, first thing in the loop. If the new resolution is 12 this will save at least one address and resolution lookup (a search and a readscratchpad). Instead of using break, bitResolution < 12 could be added to the loop condition, but that's less readable.

Y. Yes, not really sure if checking for current address in every iteration would win a lot of time. It takes about 12ms to complete one call to getResolution(deviceAddress) [for me]. I guess you can compare a number of addresses in that time. What would be the break-even point? And is it worth the trouble?

Z. Last thought... if devices == 1 we're done after bitResolution = newResolution.

RobTillaart commented 4 years ago

Yes, (bitResolution > newResolution) will always be false after max, don't know why it's there. Come to think of it, I don't even know why the (first) max is there, even when called from getResolution(void) I don't think it adds desired behavior.

As it is done before the if (!skipGlobalBitResolutionCalculation ... you have catched a bug.

Is it a choice to declare DeviceAddress deviceAddr outside the loop and uint8_t b inside? (or just a habit ;-) )

uint8_t b was later added and I just minimized its scope to give the compiler a chance to optimize it. DeviceAddress was already there. Furthermore DeviceAddress is a more complex structure which might be harder to optimize? Scope wise it could be in the loop.

Found another bug in this line

  // return when stored value == new value
  if (getResolution(deviceAddress) == newResolution)
    return true;

When skipGlobalBitResolutionCalculation == false the new resolution and the resolution of the other devices should be reconsidered. Now it will exit prematurely, as it does not set the bitResolution to the newResolution.

Also the other 2 guards DS18S20MODEL and isConnected will exit prematurely when skipGlobalBitResolutionCalculation == false

Taken these bugs into account resulted in following refactor.

// set resolution of a device to 9, 10, 11, or 12 bits
// if new resolution is out of range, 9 bits is used.
bool DallasTemperature::setResolution(const uint8_t* deviceAddress,
                                      uint8_t newResolution, bool skipGlobalBitResolutionCalculation) {

  bool updated = false;

  // ensure same behavior as setResolution(uint8_t newResolution)
  newResolution = constrain(newResolution, 9, 12);

  // DS1820 and DS18S20 have no resolution configuration register
  if ((deviceAddress[0] == DS18S20MODEL) || (getResolution(deviceAddress) == newResolution))
  {
    updated = true;
  }
  else
  {
    ScratchPad scratchPad;
    if (isConnected(deviceAddress, scratchPad))
    {
      switch (newResolution) {
        case 12:
          scratchPad[CONFIGURATION] = TEMP_12_BIT;
          break;
        case 11:
          scratchPad[CONFIGURATION] = TEMP_11_BIT;
          break;
        case 10:
          scratchPad[CONFIGURATION] = TEMP_10_BIT;
          break;
        case 9:
        default:
          scratchPad[CONFIGURATION] = TEMP_9_BIT;
          break;
      }
      writeScratchPad(deviceAddress, scratchPad);
      updated = true;
    }
  }

  if (skipGlobalBitResolutionCalculation == false)
  {
    if (newResolution > bitResolution) bitResolution = newResolution;
    for (uint8_t i = 0; i < devices; i++)
    {
      if (bitResolution == 12) break;
      DeviceAddress deviceAddr;
      getAddress(deviceAddr, i);
      uint8_t b = getResolution(deviceAddr);
      if (b > bitResolution) bitResolution = b;
    }
  }
  return updated;
}

opinion?

gitkomodo commented 4 years ago

Hi, looking at the code I have something to say on these points:

  1. I think the first call to getResolution should be removed as it adds another (slow) call to readScratchPad(). This call can be combined with the call to isConnected() which also calls readScratchPad() in the background. I got rid of that first call to getResolution in my code posted above.

  2. My working assumption is that the global bitResolution is correct before this function is started. With that assumption it is only needed to recalculate the bitResolution when the writeScratchPad() statement is reached. So with this assumption there is no need for the updated flag. The only case where this assumption isn't correct would be after someone knowingly called setResolution with skipGlobalBitResolutionCalculation = true. I think someone doing that should be aware of the consequences and handle it somewhere else (or call global setResolution() once or setResolution(specific) twice with two different values to recover). Or perhaps someone doesn't really care, uses asynchronous mode and is always waiting > 750ms.

  3. I don't think the condition you added before bitResolution = newResolution is correct. Think about a setup with just one sensor just changed from resolution 12 to 10. With this condition bitResolution would stay at 12 instead of being changed to 10. This is just an example, there are more situations where the condition would produce incorrect results.

  4. Just before starting the loop I think if (devices == 1) return true; can be added because in that case we know that the only device has the resolution just copied to the global bitResolution.

Hope this opinion helps! ;-)

RobTillaart commented 4 years ago

This helps a lot! thanks, really appreciated

Ad 1. true isConnected allready reads scratchpad so the call to getResolution can be removed.

Ad 2. As this is an assumption you just never knows for sure. I think bitResolution is in fact "maxResolution set" and it adds to complexity of this lib, in semantic sense.

Ad 3. Good catch, can be fixed easy if we use the semantics that bitResolution is the maxResolution of the set of devices.

Ad 4. OK makes sense as it prevents an unneeded call to getResolution.

so proposal [4] becomes; // gave it a number for referral :)

// set resolution of a device to 9, 10, 11, or 12 bits
// if new resolution is out of range, 9 bits is used.
bool DallasTemperature::setResolution(const uint8_t* deviceAddress,
                   uint8_t newResolution, bool skipGlobalBitResolutionCalculation) {

  bool updated = false;

  // DS1820 and DS18S20 have no resolution configuration register
  if (deviceAddress[0] == DS18S20MODEL)
  {
    updated = true;
  }
  else
  {

   // handle the sensors with configuration register
    newResolution = constrain(newResolution, 9, 12);

    uint8_t newValue = 0;
    ScratchPad scratchPad;

    // we can only update the sensor if it is connected
    if (isConnected(deviceAddress, scratchPad))
    {
      switch (newResolution) 
      {
        case 12:
          newValue = TEMP_12_BIT;
          break;
        case 11:
          newValue = TEMP_11_BIT;
          break;
        case 10:
          newValue = TEMP_10_BIT;
          break;
        case 9:
        default:
          newValue = TEMP_9_BIT;
          break;
      }

      // if it needs to be updated we write the new value
      if (scratchPad[CONFIGURATION] != newValue)
      {
        scratchPad[CONFIGURATION] = newValue;
        writeScratchPad(deviceAddress, scratchPad);
      }
      // done
      updated = true;
    }
  }

  // do we need to update the max resolution used?
  if (skipGlobalBitResolutionCalculation == false)
  {
    bitResolution = 9;
    if (newResolution > bitResolution) bitResolution = newResolution;
    if (devices > 1)
    {
      for (uint8_t i = 0; i < devices; i++)
      {
        if (bitResolution == 12) break;
        DeviceAddress deviceAddr;
        getAddress(deviceAddr, i);
        uint8_t b = getResolution(deviceAddr);
        if (b > bitResolution) bitResolution = b;
      }
    }
  }

  return updated;
}

Did I miss anything?

RobTillaart commented 4 years ago

updated proposal [4] missing line - scratchPad[CONFIGURATION] = newValue;

timing of setResolution() for 1 sensor + UNO dropped from ~75ms to ~63ms when the new resolution was different. When resolution was same the timing is approx the same ~27ms.

Footprint decreased with 102 bytes for my testSketch so that is even better, but I have a few additional changes made. But definitely no increase.

gitkomodo commented 4 years ago

I'll have a more detailled look later, a quick reaction for now...

Regarding 2. Ok, I can live with it, but can 'updated' be changed to 'success' or something like that? Using 'updated' when the ScratchPad is not updated confuses me.

Regarding 3. After this statement at the start: newResolution = constrain(newResolution, 9, 12);

These lines:

bitResolution = 9;
if (newResolution > bitResolution) bitResolution = newResolution;

Are equal to: bitResolution = newResolution;

Because newResolution is always (already) 9 or greater than bitResolution.

RobTillaart commented 4 years ago

proposal [5]

// set resolution of a device to 9, 10, 11, or 12 bits
// if new resolution is out of range, it is constrained.
bool DallasTemperature::setResolution(const uint8_t* deviceAddress,
                   uint8_t newResolution, bool skipGlobalBitResolutionCalculation) {

  bool success = false;

  // DS1820 and DS18S20 have no resolution configuration register
  if (deviceAddress[0] == DS18S20MODEL)
  {
    success = true;
  }
  else
  {

   // handle the sensors with configuration register
    newResolution = constrain(newResolution, 9, 12);

    uint8_t newValue = 0;
    ScratchPad scratchPad;

    // we can only update the sensor if it is connected
    if (isConnected(deviceAddress, scratchPad))
    {
      switch (newResolution) 
      {
        case 12:
          newValue = TEMP_12_BIT;
          break;
        case 11:
          newValue = TEMP_11_BIT;
          break;
        case 10:
          newValue = TEMP_10_BIT;
          break;
        case 9:
        default:
          newValue = TEMP_9_BIT;
          break;
      }

      // if it needs to be updated we write the new value
      if (scratchPad[CONFIGURATION] != newValue)
      {
        scratchPad[CONFIGURATION] = newValue;
        writeScratchPad(deviceAddress, scratchPad);
      }
      // done
      success = true;
    }
  }

  // do we need to update the max resolution used?
  if (skipGlobalBitResolutionCalculation == false)
  {
    bitResolution = newResolution;
    if (devices > 1)
    {
      for (uint8_t i = 0; i < devices; i++)
      {
        if (bitResolution == 12) break;
        DeviceAddress deviceAddr;
        getAddress(deviceAddr, i);
        uint8_t b = getResolution(deviceAddr);
        if (b > bitResolution) bitResolution = b;
      }
    }
  }

  return success;
}
gitkomodo commented 4 years ago

I think this code should produce correct results! :thumbsup: That is, by reading the code, I didn't actually compile and test it.

One very minor point, comment should be: // if new resolution is out of range, it is constrained.

RobTillaart commented 4 years ago

comment updated in the code above.

I did compile it and ran a changed version of the timing demo. The timing showed correct working of setting the resolution. I timed the setResolution function and it got faster when it changed an is on par when the new resolution is same. Did not test the skipGLobalBit ... part again.

I now have it in my development version and will merge it into my github fork soon, probably this weekend.