rvdbreemen / OTGW-firmware

An ESP8266 devkit firmware for the Nodoshop version of the Opentherm Gateway (OTGW)
MIT License

OTGW gives sporadically unrealistic/erratic values shown in Domoticz graphs #200

Closed 0crap closed 1 year ago

0crap commented 1 year ago

Setup: OTGW v2.8; PIC16F1847 v6.4; Wemos D1 mini v0.10.0+eeeb22c; Domoticz 2022.2 (native OTGW interface using PS=1 mode); Remeha Calenta and iSense thermostat.

Sporadically some OTGW sensors show erratic values. Seen on the following sensors:

Some examples. The graphs shown in Domoticz can be exported to Excel to see the recorded value at specific timestamps. On 2023-02-08 09:35:00 a few sensors gave wrong values at the exact same time.

On 2023-02-07 23:25:00 the Return Water Temperature recorded 50.8.

[Screenshot: RWT]

I have a bunch of other temperature and humidity sensors inside Domoticz which are connected via MQTT. On all of these sensors I receive logical values. (No impossible values that jump a large amount in a short time.)

On Discord I discussed the issue with @ otgw and it seems a lot of libraries skip the CRC check. That might be an issue with the external sensor in case of some interference. However, in my case, also "internal" sensors seem to have this issue once in a while....

Any ideas on why this happens?

rvdbreemen commented 1 year ago

@0crap am I correct that this is using the DS18x20 sensor connected to the PIC? You get the data using the TS=R command. The data is read by the PIC firmware, so that is where the CRC needs to be implemented.

What do you mean with "internal" sensor?

0crap commented 1 year ago

@0crap am I correct that this is using the DS18x20 sensor connected to the PIC. You get the data using the TS=R command. The data is being read using the PIC firmware, that is where the CRC needs to be implemented.

What do you mean with "internal" sensor?

Yes, the DS18x20 sensor is connected to the PIC as I described above. "Internal" is a bit fuzzy indeed, but what I mean with that is just all the other sensors reported by the OTGW, by looking at the OpenTherm traffic between the boiler and thermostat. External is just the one DS18x20 physically connected sensor.

The issue is that before connecting the DS18x20 sensor I did not have this many false readings. Might it be the case that, by adding the DS18x20 sensor, all readings are now affected by the injection of a little noise into the system? I mean, not only the Return Water Temp is affected, but also many other OTGW sensors.

[Screenshots: WP, RS]

I'm glad I don't have 19.5 Bar of pressure inside my boiler... 💯

0crap commented 1 year ago

Around the same time I connected the external sensor I also switched from MQTT to the native Domoticz implementation. (PS=1) I'm a bit wary if that has anything to do with it, but I guess not... I can't ignore the fact that it sometimes looks as if values are switched between each other. For example the CH Water Pressure was at that exact time the Room Setpoint. (19.5)

hvxl commented 1 year ago

Since you get multiple incorrect values at the same time, I suspect there may be some corruption occurring in the PS=1 output. This is just a list of comma separated values. If the mapping of those values to their IDs gets out of sync, you will get the value for one parameter to show up as a different one.
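To make the "out of sync" point concrete, here is a minimal Python sketch. The field names, their order, and the sample values are invented for the illustration and are not the real PS=1 layout; the only thing it shows is that a positional mapping silently assigns values to the wrong names when a field goes missing.

```python
# Hypothetical field order, for illustration only (NOT the real PS=1 layout).
FIELD_NAMES = [
    "control_setpoint",
    "ch_water_pressure",
    "room_setpoint",
    "room_temperature",
]


def map_by_position(line):
    """Map comma-separated values to names purely by their position."""
    return dict(zip(FIELD_NAMES, line.split(",")))


# A complete line: every value ends up under the right name.
good = map_by_position("45.00,1.60,19.50,20.10")

# The same line with its first field lost: every value shifts one
# position to the left, so the pressure now shows the room setpoint.
shifted = map_by_position("1.60,19.50,20.10")

print(good["ch_water_pressure"])     # value from the complete line
print(shifted["ch_water_pressure"])  # value after the shift
```

Note that the shifted result also has no `room_temperature` entry at all, which would look like a missing sample rather than a wrong one.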

If you can capture the PS=1 output for the same time the values are incorrect, it should give us more clues.

0crap commented 1 year ago

Yeah, as you were writing your comment I added some text to my previous post.

I can't ignore the fact that it sometimes looks as if values are switched between each other. For example the CH Water Pressure was at that exact time the Room Setpoint. (19.5) And the Room Setpoint sample was missing in Domoticz. (Which is not obvious by looking at the graph, because it's just a horizontal line, but shows when you export the values as a csv file!)

0crap commented 1 year ago

[Screenshot: Room Setpoint Excel RS]

Bang on, @hvxl!

rvdbreemen commented 1 year ago

This explains it, so what's the solution then? Should Domoticz not be able to handle this?

hvxl commented 1 year ago

The Domoticz code checks for a line with 25 or 34 comma-separated fields. I suspect that something interferes with the PS=1 output, causing the line to be chopped in two. If the resulting line happens to have exactly 25 of the 34 fields, the code will happily apply the pre-5.0 mapping to those 25 fields. Field 12 in the new mapping is MsgID 24, Room temperature. But in the old mapping, field 12 was MsgID 28, Return water temperature. Similarly, the Room Setpoint gets the value of Max Rel Modulation, CH Water Pressure gets the Room Setpoint, and Room Temperature gets the Rel Mod Level value.

So, I'd like to see exactly what the corruption looks like. Could it be the confirmation of the SC command that causes this? Domoticz seems to be issuing the PS=1 command roughly every 30 seconds. But that can shift over time. Every so often it could coincide with the SC command from the OTGW-firmware.

If this is indeed what happens, the Domoticz code may need to be made more robust. For example by checking that fields match the expected format. If not, discard the whole line.

But this theory still doesn't explain how the Return Water Temperature could become 50.8.
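The robustness check suggested above (verify the fields, otherwise discard the whole line) could look roughly like the Python sketch below. This is not the Domoticz code. The function name `parse_summary_line` and the accepted field shapes in `FIELD_PATTERNS` are assumptions for illustration; only the field count of 34 for the newer layout comes from this thread.

```python
import re

# From the thread: the newer layout has 34 fields (the older one had 25).
EXPECTED_FIELD_COUNT = 34

# Assumed field shapes, for illustration only. The real PS=1 fields may
# use other formats; a per-position check would be stricter than this.
FIELD_PATTERNS = [
    re.compile(r"[01]{8}/[01]{8}"),  # two flag bytes written as bits
    re.compile(r"-?\d+\.\d{2}"),     # fixed-point value such as 19.50
    re.compile(r"\d+/\d+"),          # pair of integers
    re.compile(r"\d+"),              # plain integer
]


def parse_summary_line(line, expected_count=EXPECTED_FIELD_COUNT):
    """Return the list of fields, or None if the line looks corrupted."""
    fields = line.strip().split(",")
    # Pin the count to the layout the connected firmware is known to use,
    # so a chopped line with 25 fields is not mistaken for the old layout.
    if len(fields) != expected_count:
        return None
    for field in fields:
        if not any(p.fullmatch(field) for p in FIELD_PATTERNS):
            return None  # one malformed field: discard the whole line
    return fields
```

The design choice here is to reject rather than repair: a dropped sample every now and then is harmless in a graph, while a misassigned value is not.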

rvdbreemen commented 1 year ago

In the dev branch I am now suppressing the sending of time commands when in PS=1 mode. @0crap, would you test this please? To test, I would like to add you to the beta channel, but which user on Discord are you?

0crap commented 1 year ago

I tested beta2d2d620 which stopped sending the SC command. I also switched to the native Domoticz implementation to include the OT. (Previously I used a self made script that sends the OT command every 15 minutes, using the OTGW API.)

[Screenshot: OT]

So far so good, not seen the issue for the past day. But I need more uptime to be sure. Just flashed the latest stable, released today. 0.10.1+56994f1

Log

Let's give it some uptime and see how it goes. Thx!

0crap commented 1 year ago

Today I got only 1 erratic value on the Return Water Temp. No other graphs involved this time. Because the Return Water Temp is the external DS18x20 sensor, it might very well be because of the library skipping a CRC check. Like discussed on Discord with @ otgw.
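For reference, this is what the CRC check on a DS18x20 reading amounts to, written as a Python sketch rather than PIC code. The sensor's 9-byte scratchpad ends with a Dallas/Maxim CRC-8 (polynomial x^8 + x^5 + x^4 + 1, processed LSB first, initial value 0) over the first 8 bytes. The function names are my own, and whether a failed check actually explains the spikes seen here is still only a suspicion.

```python
def dallas_crc8(data: bytes) -> int:
    """Dallas/Maxim 1-Wire CRC-8 (reflected polynomial 0x8C, init 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x01:
                crc = (crc >> 1) ^ 0x8C
            else:
                crc >>= 1
    return crc


def scratchpad_is_valid(scratchpad: bytes) -> bool:
    """Check a 9-byte scratchpad: byte 8 must be the CRC of bytes 0..7."""
    if len(scratchpad) != 9:
        return False
    return dallas_crc8(scratchpad[:8]) == scratchpad[8]
```

A reading that fails the check would simply be dropped and the previous value kept. One known caveat: a scratchpad of all zero bytes also passes, so some implementations reject that case separately.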

[Screenshot: RWT]

0crap commented 1 year ago

I think it's safe to say that the mixing up of values is not seen anymore on release 0.10.1. The only issue that remains is the erratic values on the Return Water Temperature. Caught a few more:

[Screenshot: RWT3]

rvdbreemen commented 1 year ago

Could you supply logs to see if it's a firmware issue or an issue in your boiler?

0crap commented 1 year ago

IMHO this can't be a boiler issue, this is about the external DS18x20 sensor connected next to the PIC. Unfortunately it's practically impossible to log. Typically it's seen once or twice a week. Domoticz is running on a Raspberry Pi on SD-card. Logging every 30 seconds for one week means flooding the SD-card. :-(
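One possible way around the SD-card concern is to keep only the last few lines in memory and write them out only when a value jumps implausibly. The Python sketch below shows the idea. The class name `AnomalyLogger`, the threshold, and the way lines and values are fed in are all assumptions; it is not tied to Domoticz or the OTGW firmware.

```python
from collections import deque


class AnomalyLogger:
    """Keep recent raw lines in memory; write them to disk only on a jump."""

    def __init__(self, path, keep=20, max_step=10.0):
        self.path = path
        self.recent = deque(maxlen=keep)  # oldest lines drop off automatically
        self.max_step = max_step          # largest plausible change per sample
        self.previous = None

    def feed(self, raw_line, value):
        """Record one sample; return True if it triggered a dump."""
        self.recent.append(raw_line)
        jumped = (
            self.previous is not None
            and abs(value - self.previous) > self.max_step
        )
        self.previous = value
        if jumped:
            # Append the buffered context, including the offending line.
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write("\n".join(self.recent) + "\n---\n")
        return jumped
```

Because the comparison is against the previous sample, the return to a normal value after a spike triggers a second dump, which conveniently captures the lines right after the glitch as well.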

rvdbreemen commented 1 year ago

Ok, I think we should close this issue now. It's not related to the firmware. It's the PIC; I know @hvxl has noted this and discussed it. There is nothing I can do to fix this in my ESP firmware.

0crap commented 1 year ago

Fair enough. I hope it doesn't get forgotten, there is definitely a bug causing this.

rvdbreemen commented 1 year ago

Stay in touch with @hvxl about this. The real issue as I understood is that the memory was not enough.

Just a suggestion, but instead of using the PIC to read out the sensor, why not switch to the ESP option? I think the library I use does a great job. Have you tried that?

hvxl commented 1 year ago

The real issue as I understood is that the memory was not enough.

No. The issue is that implementing a CRC check is a lot of work, with no more than a suspicion that it will fix the problem.

Roos-AID commented 1 year ago

I really recommend using the ESP option to connect multiple DS18x20 sensors to an ESP GPIO pin. This never caused a glitch in my recordings.

0crap commented 1 year ago

Stay in touch with @hvxl about this. The real issue as I understood is that the memory was not enough.

Just a suggestion, but instead of using the PIC to readout the sensor. Why not switch to the ESP option. I think the library I use does a great job. Have you tried that?

I'm not sure how you come to the conclusion this is memory related. Nothing points into that direction.

I have not tried the ESP option. Besides the fact that I would need to solder again, this shows up as an "extra sensor", as I understand it. Does this extra sensor show up when using Domoticz with the PS=1 option? That is how I use the OTGW at the moment. I would probably need to switch to MQTT, which has its downsides for Domoticz users.

0crap commented 1 year ago

The real issue as I understood is that the memory was not enough.

No. The issue is that implementing a CRC check is a lot of work, with no more than a suspicion that it will fix the problem.

The fact is that there is something wrong with the implementation as I currently use it. That can be clearly seen by the screenshots above. If a CRC check will solve the issue, that is a suspicion, agreed on that. Might very well be something else needed to fix it.

All I can tell is that I have a lot of sensors running in Domoticz, none of them ever gave erratic values.

0crap commented 1 year ago

I really recommend to use the ESP option to connect multiple DS18x20 sensors to ESP gpio pin. This never caused a glitch in my recordings.

Good to know, thx. Is your setup similar to mine? (Native Domoticz implementation.)

DaveDavenport commented 1 year ago

I'm not sure how you come to the conclusion this is memory related. Nothing points into that direction.

This was mentioned on discord as the reason:

In the PIC16F88 there is no room to add that. But in the PIC16F1847 it should be possible.

DaveDavenport commented 1 year ago

Who is talking about the raspberry pi? the CRC (needs to) happen in the firmware on the PIC.

0crap commented 1 year ago

I'm not sure how you come to the conclusion this is memory related. Nothing points into that direction.

This was mentioned on discord as the reason:

In the PIC16F88 there is no room to add that. But in the PIC16F1847 it should be possible.

Correct, I have seen that. It might very well be the case (the CRC), but it could also be something else. (Memory can mean anything, but yes, it was about program memory on the PIC.)

DaveDavenport commented 1 year ago

Did you just delete a message? Why change the narrative? Now things make no sense anymore for people reading it.

Anyway, you said in a deleted message that the Raspberry Pi had enough memory available; I replied to that.

0crap commented 1 year ago

We were both typing too fast and editing messages, that does not help.

The root cause here is not a memory related issue. The fix might be a memory issue. Like in, not enough program memory on the PIC. Hope that is clear for all.

DaveDavenport commented 1 year ago

I only edited for broken markup. ( I can screenshot history if you want).

Removing comments/changing content is very annoying, as it changes the narrative.
Adding to a comment (or using ~a strike-through~ with a remark why) is better than changing the story.

rvdbreemen commented 1 year ago

The root cause here is not a memory related issue. The fix might be a memory issue. Like in, not enough program memory on the PIC. Hope that is clear for all.

I never said it was the root cause. I just said there is a memory issue on the PIC with the 16f88 implementation. As stated by @hvxl it’s also a lot of work for him to add it to the PIC.

I really would like you to check out the ESP implementation. It works for many people without the sporadic corrupt data.

You can check the wiki how you can do that.