technion / lol_dht22

A Raspberry Pi DHT22/AM2302 polling application

Data not good (again?) #15

Open bobbie12 opened 5 years ago

bobbie12 commented 5 years ago

Hi there, everything ran fine for at least two years up until yesterday, but after upgrading the kernel to 4.14.72-1 (Arch Linux) I also get "Data not good", even though the 2 microseconds fix is in the code. I cloned wiringPi and lol_dht22, but it didn't get better. Hardware: Raspberry Pi 1B. I'll try copying a backup to another SD card to see if the error goes away.

Update: I verified that with kernel 4.14.66-1 (from my backup) lol_dht22 works on the Pi 1B as before, so this is not a hardware failure. On a second check with kernel 4.14.72-1, I now always get (after a couple of "Data not good") Humidity = 0.00 % Temperature = 0.00 *C. Has anyone had similar problems with this kernel version?

bobbie12 commented 5 years ago

Issue persists: I tried kernel 4.14.78-2 with the same result. Has there been any change in the kernel structures or timing since around 4.14.72? Anyone out there with a working lol_dht22 version?

zirkleta commented 5 years ago

"Data not good, skip" after updating to 4.14.79. Tried changing the timing from 1ms to 2ms as mentioned in other forums as a fix on previous versions with no success.

bobbie12 commented 5 years ago

As this bug is bothering me, I volunteer to run tests and provide logfiles. Can some developer give me hints on what to check? I can read and program at least in C ;)

technion commented 5 years ago

@bobbie12 I've been informed in a number of places that there's some sort of fix discussed on a forum, but I've never found the relevant thread or been shown a patch.

My difficulty is that I'm not a kernel developer. I can only see that the code used to work, and that some people (but apparently not all) find it doesn't after an upgrade. Meanwhile, the hardware in question doesn't produce any logs or give us anything to debug. It's been frustrating to fix. If you can dig up a patch that someone has recommended, we can certainly discuss whether it helps.

technion commented 5 years ago

@bobbie12 @zirkleta

I've finally got my hands on another RPi, with a whole new Raspbian and kernel 4.14.79. I've pushed a series of changes to deal with autoconf warnings, and moreover, I've updated the timing based on this source:

https://github.com/adafruit/Adafruit_Python_DHT/blob/master/source/Raspberry_Pi_2/pi_2_dht_read.c

They feel a lot better to me, but as always, it isn't 100% reliable, and I'm as bothered as anyone by how unclear it is to get the exact code that makes this work.

I can say that on my machine, I ran this script 20 times and got valid output first try 9 times, and second try 9 more.

gmanic commented 5 years ago

I ran into the same trouble, many unusable values.

Modifying the code to find out more about this revealed that it really comes down to a couple of timing issues.

  1. At startup (first run), the code and all its loops are obviously not fully cached (e.g. sizecvt) or are disturbed by thread/task/context switches. Missing only the first transition at the very beginning makes the data unusable (as the first four transitions are always ignored).
  2. To get values that are as clean as possible, I changed the logic to first only collect the "time" between each transition (H->L and L->H) as uint16_t in an array [85]. Only after all values had been captured did I decode them in a second step (which is not time critical). That showed me that on my Pi 1 B "delayMicroseconds(1)" takes much longer than 1 µs (there are at most 700 instructions per µs on a Pi 1 B). Without delayMs I get "time" values (within this program: simply counter++, only interrupted by the GPIO reads and sizecvt) of ~138-145 (for the 26-28 µs in the datasheet), ~294-298 (for 50 µs, with spikes up to 400 and down to 150) and ~370-400 (for 75 µs, with spikes up to 700). With delayMs(1) the values are 11-12 (for 26-28 µs), 23-24 (for 50 µs, spikes 13-15 and 29) and 32-33 (for 75 µs).
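For anyone who wants to try the same split, here is a minimal sketch of the two-step idea from point 2: capture raw counts first, decode afterwards. It is not the code from the repository or from any fork. The names capture_durations, decode_bits, pin_reader, struct replay, replay_read and TIMEOUT_COUNT are made up for illustration, and first_high and threshold are assumptions that have to be calibrated per board and kernel. The only protocol fact relied on is that the fifth byte is the low 8 bits of the sum of the first four.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EDGES 85        /* array size mentioned in point 2 */
#define TIMEOUT_COUNT 10000 /* hypothetical: stop once the line goes quiet */

/* Hypothetical pin reader; on a Pi it could wrap wiringPi's digitalRead(). */
typedef int (*pin_reader)(void *ctx);

/* Phase 1 (time critical): only count loop passes between transitions. */
size_t capture_durations(pin_reader read_pin, void *ctx,
                         uint16_t durations[], size_t max_edges)
{
    size_t n = 0;
    int last = read_pin(ctx);

    while (n < max_edges) {
        uint16_t counter = 0;
        int level;

        while ((level = read_pin(ctx)) == last) {
            if (++counter >= TIMEOUT_COUNT)
                return n; /* no further transition: capture is done */
        }
        durations[n++] = counter;
        last = level;
    }
    return n;
}

/* Phase 2 (not time critical): turn 40 high-pulse counts into 5 bytes.
 * first_high (index of the first data bit's high pulse) and threshold
 * are assumptions. Returns 0 on success, -1 on too few samples or a
 * checksum mismatch. */
int decode_bits(const uint16_t durations[], size_t n, size_t first_high,
                uint16_t threshold, uint8_t data[5])
{
    if (first_high + 2 * 39 >= n)
        return -1;

    for (size_t i = 0; i < 5; i++)
        data[i] = 0;

    for (size_t bit = 0; bit < 40; bit++) {
        data[bit / 8] <<= 1;
        if (durations[first_high + 2 * bit] > threshold)
            data[bit / 8] |= 1; /* long high pulse means a 1 bit */
    }

    uint8_t sum = (uint8_t)(data[0] + data[1] + data[2] + data[3]);
    return sum == data[4] ? 0 : -1;
}

/* Replay helper for trying the capture loop off-device: each entry in
 * runs says how many consecutive reads return the current level. */
struct replay {
    const uint16_t *runs;
    size_t n_runs;
    size_t idx;
    uint16_t used;
    int level;
};

int replay_read(void *ctx)
{
    struct replay *r = ctx;

    while (r->idx < r->n_runs && r->used >= r->runs[r->idx]) {
        r->idx++;
        r->used = 0;
        r->level = !r->level;
    }
    r->used++;
    return r->level;
}
```

With the counts quoted above for delayMs(1) (11-12 versus 32-33), a threshold somewhere around 20 would separate the two kinds of high pulse, but that number only holds for that particular setup.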

The result is that timing is essential and depends on the hardware you use (and obviously on the kernel as well), and especially on the current CPU load. A solution could be to run some statistical analysis after collecting all values to find out which transition represents what (quite expensive), or to explicitly restrict timing to L->H transitions (to only time the H values).
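One cheap stand-in for a full statistical analysis, sketched here as an assumption rather than as what any existing fork does: the ~50 µs low pulse before each bit sits between the short (26-28 µs) and long (75 µs) high pulses, so the average low count from the same capture can serve as the threshold for the H values. The function name low_reference_threshold and the first_low index are hypothetical, and the layout assumed is alternating low/high counts starting at first_low.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_BITS 40 /* a DHT22 frame carries 40 data bits */

/* Average the low-pulse counts that precede each data bit and store the
 * result in *threshold. first_low is the (assumed) index of the low pulse
 * before the first data bit; lows are expected at every second index.
 * Returns 0 on success, -1 if the capture is too short. */
int low_reference_threshold(const uint16_t durations[], size_t n,
                            size_t first_low, uint16_t *threshold)
{
    if (first_low + 2 * (DATA_BITS - 1) >= n)
        return -1;

    uint32_t sum = 0;
    for (size_t bit = 0; bit < DATA_BITS; bit++)
        sum += durations[first_low + 2 * bit];

    /* Averaging over 40 pulses dampens the occasional spike. */
    *threshold = (uint16_t)(sum / DATA_BITS);
    return 0;
}
```

Because the reference is measured in the same run and in the same units as the high pulses, it moves along with CPU load and delay settings. A burst of spikes in the low pulses can still pull the average off, so this only reduces the problem.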

I'll adjust the code based on my observations to get results that are as good as possible for my hw + sw. Hopefully I'll get a fork set up with my changes shortly.

gmanic commented 5 years ago

Forked to https://github.com/gmanic/lol_dht22