milesburton / Arduino-Temperature-Control-Library

Arduino Temperature Library
https://www.milesburton.com/w/index.php/Dallas_Temperature_Control_Library
969 stars 487 forks source link

sensors.requestTemperaturesByIndex(0); too slow in 3.8.0 #110

Closed VolkerBo closed 4 years ago

VolkerBo commented 6 years ago

I found an issue in V3.8.0.

With V3.7.6, the following statement needs 47ms: sensors.requestTemperaturesByIndex(0);

With V3.8.0, the statement needs 660ms. This makes my sketch very slow.

Tested on NodeMCU V2.0

Please check.

Regards Volker

RobTillaart commented 6 years ago

Was the bit depth of the measurement the same? 9 bit or 12 bit makes quite a difference.

Note that the library supports non blocking calls, it will still take time before yyou have the measurement, but youcan do a lot of other things besides waiting. Have look at the examples.

VolkerBo commented 6 years ago

The bit depth of the measurement was exactly the same, 12 bit.

I used the same sketch, once with the older version and once with the new V3.8.0., no other differences. Trust me, you have a bug in the V3.8.0.

Please check it out your self.

VolkerBo commented 6 years ago

Here the log of a test-sketch (single.ino).

Dallas Temperature IC Control Library Demo Locating devices...Found 1 devices. Parasite power is: OFF Device 0 Address: 28FF9168C2160306 Device 0 Resolution: 12 Requesting temperatures... Before: 23779 After: 23826 DONE Temp C: 21.12 Temp F: 70.03

Before means, before sensors.requestTemperaturesByIndex(0); After means, after sensors.requestTemperaturesByIndex(0); Difference 47ms.

With V3.8.0 you have here a difference of 661ms.

RobTillaart commented 6 years ago

A 12 bit measurement will normally take up to 750 millisec (according to datasheet) so the timing of 3.7.6 of <50 msec is extreme short.

I have no free sensors nearby to verify, sorry..

RobTillaart commented 6 years ago

The single code - https://github.com/milesburton/Arduino-Temperature-Control-Library/blob/master/examples/Single/Single.pde - has no timestamp before / after.

Where did you add them in the code?

VolkerBo commented 6 years ago

Single.txt

I uploaded my changed version.

yachnyymaxim commented 6 years ago

Hello!

I confirm that something has changed in requestTemperatures() timing. When I switched from v3.7.7 to v3.7.9 or v3.8.0 my system containing 4 temp sensors started to reboot by watchdog. My code requests temperatures of all 4 sensors consequently and it takes too much time now!

RobTillaart commented 6 years ago

How much time does it take for the 4 sensors? What is the resolution - per sensor ? 9, 10, 11 or 12? What is the time per sensor?

The datasheet states conversion time takes max: (from my head) 750 millisec @ 12 bit
375 ms @ 11 bit 187 ms @ 10 bit 94 ms @ 9 bit

What is the watchdog duration before reset? Note: The watchdog is not a part of the library.

A solution to prevent the reset is to use asynchronuous mode. See examples how to do this.

RobTillaart commented 6 years ago

Note: A major recent change is that the Wire library is not included anymore in the DTCLib.

yachnyymaxim commented 6 years ago

Rob, thanks for your notes.

Unfortunately I've already switched to async mode, assembled, and installed my device back to its place :) and thus I can't measure the exact time taken by requestTemperatures() call. Tomorrow I will take another device using the same sensor configuration and will do the measurements.

Regarding your other questions: the resolution is 9 bit (default value), platform used is ESP8266.

yachnyymaxim commented 6 years ago

Here are the timings of the 9 bit requestTemperatures() call: Sensor 1 time: 655 Sensor 2 time: 621 Sensor 3 time: 607 Sensor 4 time: 601

It looks like 9 bit resolution is not set and sensor is using 12 bit resolution mode.

RobTillaart commented 6 years ago

Agree, those timings look definitely like 12 bit resolution. Does your code set the resolution explicitly or does it use default?

A first look shows that the introduction of the flag bool skipGlobalBitResolutionCalculation = false); might be the most relevant change (+ related code of course).

When time permits I'll have a deeper look into it.

milesburton commented 6 years ago

The timing could be affected by the removable of the bit blast code which was introduced a few versions back. We removed it as it was impacting compatibility with other MCUs, and problems with the upstream onewire library.

yachnyymaxim commented 6 years ago

Does your code set the resolution explicitly or does it use default?

I tried both using default value and setting 9 bit resolution explicitly - both options result in the same >600ms conversion time.

RobTillaart commented 6 years ago

I have written a minimalistic DS18 library that supports only one DS18 per pin. Code calls are similar to this lib. Can you check timing with it too (with the ESP32)? https://github.com/RobTillaart/Arduino/tree/master/libraries/DS18B20

yachnyymaxim commented 6 years ago

Hi Rob.

I've tested the DS18B20 on ESP8266 - it gives the same 600+ ms from requestTemperatures() call to isConversionComplete() to become true.

milesburton commented 6 years ago

Assuming this issue is closed. Ping me if not

graemeftv commented 5 years ago

Miles

I am using version 3.7.7 of your library

i have 3 dallas temp sensors, running on a mega2560

when i call the function sensors.requestTemperaturesByAddress(my_sensor_name_here) and then mysensorvalue = sensors.getTempC(my_sensor_name_here);

its takes 600 millseconds and the code appears to be blocking and no other functions can run during this time

where very little delay with non blocking code if i use the by index function::

   sensors.requestTemperatures();  // request temp from dallas chip

  mysensornamehere = (sensors.getTempCByIndex(0));// transfer value to tempC

in summary i am not looking in detail how how long the conversion time is, just which function is blocking other code from running

milesburton commented 5 years ago

Thanks for the report @graemeftv. I don't have any ICs here to validate your observation, this will make two reports now so I imagine we do indeed have an issue - whether upstream with OneWire or in our library (fyi, there's many contributors to this library and is far from 'mine')

ramstyler commented 5 years ago

Any more updates on this issue? Since I updated to 3.8.0, I have found the same issue others have reported. The conversion time is about the same, and that is OK. The problem is requestTemperatures() is blocking timer interrupts from occurring, which is having a bad flow on effect for the whole program. Revert back to 3.7.6 and no issues. Using ESP8266.

milesburton commented 5 years ago

Sorry for the delay, I need to grab an IC to test this issue out. Can you open a PR for the time being?

gitkomodo commented 4 years ago

I saw some strange timing in my sensors that could be related to the behaviour mentioned in this issue. Let me start by saying these are cheap ones that look the same but are more likely to be 'DS18B20 compatible' than original products from Dallas/Maxim. So the strange behaviour can either be related to this type of sensor or to the library.

After updating RobTillaart's timing example (see https://github.com/gitkomodo/Arduino-Temperature-Control-Library/blob/master/examples/Timing/Timing.ino) I decided to dig a little deeper and measure the duration of every run of requestTemperatures(). I saw most runs at resolutions 9-11 were timed at a little more than the max time for that resolution in the datasheet and some were a lot shorter. It took me a while to notice the repetitive pattern, which seems quite obvious from the snippets below.

Resolution 10
Time: 190 Temp: 18.7500
Time: 190 Temp: 18.7500
Time: 13 Temp: 18.7500
Time: 190 Temp: 18.7500
Time: 190 Temp: 18.7500
Time: 12 Temp: 18.7500

Resolution 11
Time: 377 Temp: 18.7500
Time: 62 Temp: 18.8750
Time: 377 Temp: 18.8750
Time: 63 Temp: 18.8750
Time: 377 Temp: 18.8750
Time: 62 Temp: 18.8750

In the meantime I also studied the changes between library 3.7.6 and 3.8.0 as mentioned in the comments above and found that an important relevant change is the replacement of isConversionAvailable by isConversionComplete. The later actually checks if the line has been raised indication the conversion is complete as defined in the data sheet, so is the right check for completion of the temperature conversion.

Because the timing patterns I saw suggested the conversion was assumed to be done most of the time after millisToWaitForConversion was exceeded, I decided to edit this function to always return 1000 milliseconds. Some timing results are below.

Resolution 9
Time: 486 Temp: 19.0000
Time: 485 Temp: 19.0000
Time: 485 Temp: 19.0000

Resolution 10
Time: 486 Temp: 19.0000
Time: 484 Temp: 19.0000
Time: 485 Temp: 19.0000

Resolution 11
Time: 486 Temp: 19.1250
Time: 485 Temp: 19.1250
Time: 485 Temp: 19.1250

Resolution 12
Time: 485 Temp: 19.1875
Time: 485 Temp: 19.1875
Time: 486 Temp: 19.1875

My conclusion based on these results is that my sensor is always taking the time to do a 'full' conversion, regardless of the resolution (the set resolutions seems to be on the ScratchPad). Then the converted temperature is 'cropped' to the configured resolution when it's written to the ScratchPad (as far as I can see the library doesn't do this, or does it?). When the library stops waiting before the temperature conversion is done, the conversion keeps going and 'reports' it's done during a later conversion request by the library. This leeds to the patterns I mentioned earlier.

This conclusion means my sensors are at the root of the problem, and others reporting in this issue may have sensors with faulty behaviour too. For my sensors it is useless to use other resolutions than 12-bits, so I will avoid that in the future. This also means the library is not causing the problem, so doesn't really need to be changed.

That being said, I'm wondering if the library should stop waiting after millisToWaitForConversion is exceeded, or only stop waiting when the line has been raised to indicate that conversion is complete. With correct behaviour millisToWaitForConversion would never be exceeded, so shouldn't have to be checked. Removing this condition would prevent strange/faulty results with my sensors and might save a lot of other people with cheap sensors some problems.

I propose to always use millisToWaitForConversion of 750ms regardless of the resolution. This change doesn't affect good sensors, makes behaviour for some (a lot?) cheap sensors better and prevents infinite blocking by still having a maximum time to wait.

RobTillaart commented 4 years ago

I propose to always use millisToWaitForConversion of 750ms regardless of the resolution. This change doesn't affect good sensors, makes behaviour for some (a lot?) cheap sensors better and prevents infinite blocking by still having a maximum time to wait.

Your analysis looks good, and I support your proposal.

Think millisToWaitForConversion() should not be altered as it is a public function, so always returning 750ms would possibly break the behavior of existing programs and it would not match the specification.

Internally in the library millisToWaitForConversion() is only used in blockTillConversionComplete(), So in the latter the value of 750ms could be hard coded to have a time out that works for all sensors when waiting for ConversionComplete signal.

As I do not have enough experience with parasite mode, I cannot estimate the impact on it. My assumption is that one of the reasons parasite mode is used is to conserve energy. By keeping the line always HIGH for 750ms seems in conflict with that.

The above would imply the following code for blockTillConversionComplete() that makes non parasite mode more robust but does not change parasite mode behavior.

void DallasTemperature::blockTillConversionComplete(uint8_t bitResolution) {

    unsigned long delms= 750;
    if (checkForConversion && !parasite) {
        unsigned long start = millis();
        while (!isConversionComplete() && (millis() - start < delms))
            yield();
    } else {
        delms = millisToWaitForConversion(bitResolution);
        activateExternalPullup();
        delay(delms);
        deactivateExternalPullup();
    }
}

Minor: timeOut might be a better name for delms.

gitkomodo commented 4 years ago

You're right about not changing millisToWaitForConversion() and not changing parasite mode behavior. The code looks good to me. I'm unsure if the following change to the declaration of delms would make a difference in size/speed:

void DallasTemperature::blockTillConversionComplete(uint8_t bitResolution) {

    if (checkForConversion && !parasite) {
        unsigned long start = millis();
        while (!isConversionComplete() && (millis() - start < 750))
            yield();
    } else {
        unsigned long delms = millisToWaitForConversion(bitResolution);
        activateExternalPullup();
        delay(delms);
        deactivateExternalPullup();
    }
}
RobTillaart commented 4 years ago

I expect no diff in size or speed as I expect the compiler will probably optimize delms away. 750 is a magic number and it could be more understandable using a #define or a const

#define MAX_CONVERSION_TIMEOUT 750

void DallasTemperature::blockTillConversionComplete(uint8_t bitResolution) {

    if (checkForConversion && !parasite) {
        unsigned long start = millis();
        while (!isConversionComplete() && (millis() - start < MAX_CONVERSION_TIMEOUT ))
            yield();
    } else {
        unsigned long delms = millisToWaitForConversion(bitResolution);
        activateExternalPullup();
        delay(delms);
        deactivateExternalPullup();
    }
}
gitkomodo commented 4 years ago

The behavior of my 'fake' sensors is explained in detail in this article at https://github.com/cpetrich/counterfeit_DS18B20. These sensors of family D2 take "approx. 462-523 ms for conversion regardless of measurement resolution".

Now the first post in this issue turns out to be probably unrelated to this behavior. The article relates the posted sensor address to family B2. Those sensors take "approx. 587-697 ms for a 12-bit temperature conversion and proportionally less at lower resolution". Because of this proportionality they shouldn't run into the same problem as mine do.

So what is the initial problem? The important change between version 3.7.6 and 3.8.0 is that waiting for isConversionAvailable to become true was replaced by waiting for isConversionComplete to become true.

3.7.6 just reads the ScratchPad and check if the first byte is not zero. This is almost always true because until a temperature conversion is complete this byte holds (part of) the previous converted temperature.

bool DallasTemperature::isConversionAvailable(const uint8_t* deviceAddress){

    // Check if the clock has been raised indicating the conversion is complete
    ScratchPad scratchPad;
    readScratchPad(deviceAddress, scratchPad);
    return scratchPad[0];
}

3.8.0 issues a read slot on the 1-wire according to specification.

bool DallasTemperature::isConversionComplete() {
    uint8_t b = _wire->read_bit();
    return (b == 1);
}

So 3.7.6 had a serious problem here as it just stopped 'blocking till conversion complete' when a previous conversion was read from the scratchpad. It's essentially the same as requesting a temperature conversion in async mode and directly reading the scratchpad for a temperature. This was discussed in detail in issue #52.

I think this issue can be closed when PR #150 with changes discussed above is merged.

RobTillaart commented 4 years ago

Tested the example timing.ino morning with an UNO. This uses

t = sensor.getTempCByIndex(0);

Results UNO

DallasTemperature Library version: 3.8.1

Test takes about 30 seconds for 4 resolutions
RES TIME    ACTUAL  GAIN
68460
68008
9   94.00   104.25  110.9%
139652
68012
10  188.00  179.45  95.5%
139660
68008
11  375.00  329.85  88.0%
139640
68008
12  750.00  630.55  84.1%

Results ESP32

DallasTemperature Library version: 3.8.1

Test takes about 30 seconds for 4 resolutions
RES TIME    ACTUAL  GAIN
141966
70854
9   94.00   105.80  112.6%
142781
70871
10  188.00  181.35  96.5%
142271
70871
11  375.00  332.55  88.7%
142406
70871
12  750.00  634.80  84.6%

Conclusions

@milesburton Issue can be closed.