kizniche / Mycodo

An environmental monitoring and regulation system
http://kylegabriel.com/projects/
GNU General Public License v3.0
2.95k stars 494 forks source link

DS18B20 error handling #404

Closed zsole2 closed 6 years ago

zsole2 commented 6 years ago

Mycodo Issue Report:

Problem Description

I have two of these sensors installed now. In the past, I had a few outlier readings at 85 C. But yesterday, it started to read constantly 85 C, I had to stop everything, rebooting did not help. After a few hours rest, it now again seems normal, but in the meantime, I did some research.

The 85 C reading of the DS18 is an error message. Accordingly, it would be nice if Mycodo handled it accordingly. As you can imagine, it can really screw up the temperature graphs if there is a 85C outlier. Or in a worse case, my refrigerator was freezing over (constant cooling down because of the continuous 85C reading).

On the other hand, there are reports that the Pi Zero can have problems reading the DS18 with a frequency that is OK for other RasPi's. See here (and at the linked github there).

I did a few readings to time them, here is an example:

pi@chz-112:~ $ time cat  /sys/bus/w1/devices/28-0417b1fb82ff/w1_slave
e1 00 4b 46 7f ff 0c 10 f8 : crc=f8 YES
e1 00 4b 46 7f ff 0c 10 f8 t=14062

real    0m1,898s
user    0m0,000s
sys 0m0,060s
pi@chz-112:~ $ time cat  /sys/bus/w1/devices/28-00000586475d/w1_slave 
ee 00 4b 46 7f ff 02 10 de : crc=de YES
ee 00 4b 46 7f ff 02 10 de t=14875

real    0m1,879s
user    0m0,000s
sys 0m0,060s

Usually these are in the 0,9 sec range, though.

Looking further into this issue more, it appears the the 85C reading is inherently "handled by the w1_therm module" see the link above. So right now I'll wait until the erroneous readings appear again to be able to learn more about this problem.

Conclusion

Checking for the value of 85C and flagging it as an unsuccessful reading would be an improvement.

kizniche commented 6 years ago

Strange. I'll add your recommendation. Let me know if you find out anything else that may improve the sensor code. Thanks.

zsole2 commented 6 years ago

It happened again, got the 85C after updating to 5.5.21 with the fix. Actually, I did not check for a few hours, and my fridge froze over nicely, got down to -11 Celsius: image The sensor first got a couple of bad readings, then it completely gone mad. True, this is my old one from the shelf, so I better get new ones for these test. But without it, we would never get into this situation, right?

So anyway, I thought the handling the 85C error would result that the wrong temperature never gets to the database, and used for calculating the PID.

kizniche commented 6 years ago

Sorry about that. I used the wrong operator. Check the commit, above, for the small edit I made, if you want to test it before the release.

zsole2 commented 6 years ago

I finally started to test my DS18 sensors, now running 4 new ones in parallel. After a few days, I collected almost 25000 measurements for each. None had the error value, which is good.

One of them threw the error: 2018-03-25 22:30:45,917 - mycodo.input_7 - ERROR - StopIteration raised. Possibly could not read input. Ensure it's connected properly and detected. 20 times in the 4 days, which can also be OK, but considering not using this sensor for the time being (too bad that this is the one the has exact same reading as another one).

Now for the issue: out of about 100000 measurements, I got 4 that are outliers. Again, these are not the error value (85 Celsius), but quasi random data: 1045.38; 532.56; 1046.88 and 533.44 degrees Celsius. Unfortunately I did not find anything with Internet search for these anomalies.

It would be good to avoid these values from the charts, too. Since the temperature range for these sensors is given in the spec sheet (–55°C to +125°C), readings outside of those values could be eliminated.

Secondarily, a line in the log would be useful to see when a reading was dropped, and why (whether error or outlier value).

kizniche commented 6 years ago

I just pushed a fix for preventing measurements outside the expected range.

On another note that I didn't address previously, why would 85 be chosen as the returned temperature for an error if it's within the expected range? It seems as though this would inevitably cause actual 85C measurements to be thrown out.

zsole2 commented 6 years ago

Thanks for the fix. I remember reading some explanation for the error code, and some people making the same comment as yourself, but I don't remember any good reply on it. In fact, it is in the datasheet, and as far as I understand it is there based on some bit-level trickeration. Anyhow, apparently the precision of the sensor gets very bad above 85C.

kizniche commented 6 years ago

Perhaps 85 is not a value that can be returned from the sensor (that is, maybe the precision is something like 84.98 then 85.02).

zsole2 commented 6 years ago

I have now upgraded to the latest commit, and the log is full with these:

2018-03-30 18:44:57,284 - mycodo.inputs.ds18b20 - ERROR - DS18B20Sensor raised an exception when taking a reading: unorderable types: int() > NoneType()
Traceback (most recent call last):
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 84, in read
    self._temperature = self.get_measurement()
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 69, in get_measurement
    elif -55 > temperature > 125:
TypeError: unorderable types: int() > NoneType()
2018-03-30 18:45:01,061 - mycodo.inputs.ds18b20 - ERROR - DS18B20Sensor raised an exception when taking a reading: unorderable types: int() > NoneType()
Traceback (most recent call last):
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 84, in read
    self._temperature = self.get_measurement()
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 69, in get_measurement
    elif -55 > temperature > 125:
TypeError: unorderable types: int() > NoneType()
2018-03-30 18:45:12,264 - mycodo.inputs.ds18b20 - ERROR - DS18B20Sensor raised an exception when taking a reading: unorderable types: int() > NoneType()
Traceback (most recent call last):
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 84, in read
    self._temperature = self.get_measurement()
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 69, in get_measurement
    elif -55 > temperature > 125:
TypeError: unorderable types: int() > NoneType()
2018-03-30 18:45:27,250 - mycodo.inputs.ds18b20 - ERROR - DS18B20Sensor raised an exception when taking a reading: unorderable types: int() > NoneType()
Traceback (most recent call last):
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 84, in read
    self._temperature = self.get_measurement()
  File "/var/mycodo-root/mycodo/inputs/ds18b20.py", line 69, in get_measurement
    elif -55 > temperature > 125:
TypeError: unorderable types: int() > NoneType()
2018-03-30 18:45:27,258 - mycodo.input_7 - ERROR - StopIteration raised. Possibly could not read input. Ensure it's connected properly and detected.
zsole2 commented 6 years ago

That was quick and easy... What I think now is that the this sensor should be thrown away. The other three never missed a reading, and the outliers disappeared since the first few days. And this 'bad' sensor gave two outliers in the last day, and both were within the normal range (-0.12 and 56.31 Celsius), which would be obviously hard to screen out.

kizniche commented 6 years ago

Yeah, the DS18B20 is so cheap it's probably best to buy extras, with the expectation some will be inaccurate or just DOA. I think we've developed the Input module as far as we can to reduce errors.

zsole2 commented 6 years ago

Agreed. I am now trying to use the Average (Single) function to smooth over the smaller outliers. A few questions with this: Why is it needed to specify the Measurement/unit? It is theoretically the same as the original input. Shouldn't the number of points be a parameter from which the average is calculated?

kizniche commented 6 years ago

Ah, this is explained in the Math section in the manual. The Max Age (seconds) determines how many measurements are averaged. If you have a 30 second Period for your Input and have a 65 second Max Age for your Average (single), then it will average the past two measurements (assuming there were no issues acquiring measurements).

kizniche commented 6 years ago

If it were merely number of points, there is the potential for the Input to stop working yet the Math would continue to return the same average. To avoid this, the Max Age and a Number of Measurements would need to be used. However, this is redundant, as only Max Age is needed to know how many previous measurements to add. Do you have any specific use cases or see any issues with this?

drgrumpy commented 6 years ago

Usually I find +85 readings occur with power/cabling issues or trying to take readings too frequently for the precision that is set, the other common error value is 127 which again I think can be due to power/connection issues.

kizniche commented 6 years ago

I'll check out the datasheet for the 127 error and add it to the module if I can find it.

kizniche commented 6 years ago

@drgrumpy could you please provide a reference for the 127 error you mentioned?

drgrumpy commented 6 years ago

hmm... I am not sure there is anything absolutely definitive.... but it often crops up in arduino fora...(I have had around 75 of them measuring soil temps in carrot fields for the last couple of winters) ... here is an example: "According to DallasTemperature.h : https://github.com/milesburton/Arduino-Temperature-Control-Library/blob/master/DallasTemperature.h the value of -127 means : DEVICE_DISCONNECTED_C"

Also they are much more reliable if you use a ds2482 to drive the 1-wire bus... (driver in the kernel that needs loading and no need for w1-gpio)

drgrumpy commented 6 years ago

Also I think you are correct about the reason for 85 and precision, if my calcs are correct with 12 bit precision (the max) you can have values of 84.992 and 85.024

zsole2 commented 6 years ago

Thanks for the info on the Period / Max Age interrelation. Another question came to my mind: why do we have to deactivate inputs before changing parameters? The PID controllers work fine without it.

kizniche commented 6 years ago

There are typically initialization steps involved with sensor modules that I don't have a standard way of invoking. There would also need to be a similar "pause" step added to all modules that ensure it's done at a safe spot. Basically, it's a lot of work.

kizniche commented 6 years ago

The new PID behavior is because I went back and refactored the code to do that. I figured it added a lot to the PID to be able to change values while it's running. I didn't see that much value to do this with Inputs (and there are many Input modules, whereas there is only one PID module to edit).

zsole2 commented 6 years ago

Thanks for the explanation. Makes sense.

zsole2 commented 6 years ago

It appears that the logging of outliers is not working. I had a value of -1744 Celsius (not with the bad looking sensor), but the log has no indication of this event, but rather showing the measurement on the graph. The log only contains the stopiteration errors with the bad sensor.

kizniche commented 6 years ago

I used some bad logic. It should be fixed now.

Theoi-Meteoroi commented 6 years ago

Most modern I2C sensors seem to give accurate readings almost immediately, you may have to toss the first sample. The MH-Z CO2 sensors always give garbage readings at first after power-up. A good feature would be a “commit hold-off” or some such to keep the initial bad data from skewing the average.

For the MH-Z modules, the bad data seems to only happen after power cycling. I only found it a nuisance in the graphs.

I haven’t used the DS18B20 much mostly because of noise issues. The low cost doesn’t make up for the aggrivation in practical use.

On Apr 3, 2018, at 9:01 AM, Kyle Gabriel notifications@github.com wrote:

There are typically initialization steps involved with sensor modules that I don't have a standard way of invoking. There would also need to be a similar "pause" step added to all modules that ensure it's done at a safe spot. Basically, it's a lot of work.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kizniche/Mycodo/issues/404#issuecomment-378302587, or mute the thread https://github.com/notifications/unsubscribe-auth/APsXKvotldRqdEthlnuMbDCmfZl1xz02ks5tk5zDgaJpZM4SDLPp.