hoylabs / OpenDTU-OnBattery

Software for ESP32 to talk to Hoymiles/TSUN/Solenso Inverters, VE.Direct devices, battery management systems, and related peripherals

GNU General Public License v2.0


Missing MQTT 0A message for battery after inverter restart #843

Closed markusdd closed 7 months ago

markusdd commented 7 months ago

What happened?

I have the inverter restart scheduled at 23:00 to reset the daily consumption counter, because at that time the battery is always drained to 20% (my limit) and solar is off.

With that, shortly after restart, the inverter starts pulling battery current until the limit brings it back down.

But there is no MQTT message sent again for the battery current to be zero, so I always get these very ugly interpolations and the data is also wrong that way, because -3.5A was obviously not held for hours until morning, but merely for a few seconds.

grafik

To Reproduce Bug

Restart the inverter at a specific time when the battery is idle, then observe the battery current spike on restart. There is no new MQTT message to indicate that the battery current is back at 0.

Expected Behavior

Make sure that when the battery current returns to 0, it is properly reported in every case.

Install Method

Pre-Compiled binary from GitHub

What git-hash/version of OpenDTU?

https://github.com/helgeerbe/OpenDTU-OnBattery/releases/tag/2024.03.07

Relevant log/trace output

No response

Anything else?

Setup is one Victron MPPT 150/35 via VE.Direct and a Pylontech US5000 via CAN.

schlimmchen commented 7 months ago

As a workaround, you could set a persistent limit in the inverter, so it starts with a low power limit.

How did you make sure that the MQTT message was not sent, rather than not received by whatever entity creates the graph?

markusdd commented 7 months ago

very hard to 'make sure', but as literally every other message is consistently received and this behavior is also consistent over several days, I think it is fair to say the message is simply not being produced.

And rather than a workaround I think we should understand why the message is not being produced because it is an actual state change in the current, it should be picked up.

markusdd commented 7 months ago

I just noticed the problem is actually worse: grafik

If you plot the points, you can see that a return to 0 is actually never consistently reported for the battery current. So when the discharge phase ends after sunset and the inverter is limited to basically 0, there is no data point reported, so now it looks like the discharge actually continued through the night, which obviously isn't true.

schlimmchen commented 7 months ago

And rather than a workaround I think we should understand why the message is not being produced because it is an actual state change in the current, it should be picked up.

What makes you think that I propose the workaround as a fix? I agree with you, we should find the root cause. However, I am not convinced that it is OpenDTU-OnBattery's fault until we actually found the root cause.

What is the exact MQTT topic in question that produced the graphs you shared? And is there any processing done on the data before it is plotted?

very hard to 'make sure', but as literally every other message is consistently received and this behavior is also consistent over several days, I think it is fair to say the message is simply not being produced.

I beg to disagree. If 10% of data points during the day are missing, you wouldn't even notice, would you? At least not in the graphs. So it might not be about the value zero or the data producer.

So again: Can we rule out connection issues, Wi-Fi issues?

markusdd commented 7 months ago

I did not imply you said a workaround is a fix, I just said I'm not interested in implementing such a workaround. Will not fiddle with hardware to circumvent a software issue.

The topic is the pylontech battery amperage, data is plotted 'raw', no processing at all.

Last point: I disagree. There are no connection problems and it would be inexplicable that just 0 amperage data from one topic is missing. The reporting interval is consistent and I have no datapoints missing for everything that really gets reported every 5 seconds.

Just look at the dots in the graph and you will see that a return to 0 does not seem to be reported at all. Everything that looks like 0 is in reality small deviations (sometimes milliamps) around zero that usually occur during the day during load regulation/zero feed.

markusdd commented 7 months ago

2024-04-04-16-15 Chronograf Data.csv

Also, here is a csv data dump from influxdb, the times where there is no data indicate there are other MQTT data points being received, but nothing for that topic. There is not a single clean 0 as far as I can see.

grafik

Here it is very very visible. When the evening discharge period ends no return to 0 is reported (violet). same for the restart at 2300 (green). The next data is only sent for that topic at morning wakeup, which then suggests we had -4A the whole night, which obviously is bogus.

schlimmchen commented 7 months ago

The topic is the pylontech battery amperage

And what topic is that? "battery/current"?

Do you see (slightly) changing voltage readings while you expect the current to be reported as zero? The Pylontech reports the current alongside the voltage in the same message, and both are then transported to BatteryStats.

Maybe String(_current) is something weird when close to zero? Is it the empty string when current is exactly zero ((float)0.0)? That I can test later.

markusdd commented 7 months ago

Interesting behavior.

We have a voltage reading for every current reading, but not a current reading for every voltage reading. And there are still timestamps with other topic data that show no voltage or current. 2024-04-04-19-49 Chronograf Data.csv

grafik

So we have multiple voltage readings during the night but no update for current.

You could be correct, maybe there is a type conversion issue that leads to the payload being discarded because it cannot be interpreted as float?

schlimmchen commented 7 months ago

It's not like String(static_cast<float>(0)) would translate to the empty string. I see 0.00 in my MQTT explorer for a test-topic I made the ESP32 publish to.

I am confused... The Pylontech seems to report the current in 100mA, i.e., each count of the integer communicated over the CAN protocol is worth 100mA. The integer value is multiplied by 0.1 to get Amperes, but as a floating point value. I am really no expert on floats, but AFAIK there may be fractions that cannot be expressed precisely with floats. However, that does not explain why you see 0.147499999999 and not even why you see 0.16, as 0.16 is definitely not the nearest approximation of 0.1 or 0.2, but only those should be valid values that can ever be reported by the Pylontech.

Same (similar) argument goes for the voltage reading, except that it provides two decimal places, but only two. 49.16780487804881 is certainly not an issue with floating point precision. It should read either 49.16 or 49.17 as the value.

Where do those extra decimal places come from?
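To make the scaling argument concrete, here is a minimal sketch in Python. It assumes, as described in this comment, that the current arrives as an integer counting 0.1 A steps and the voltage as an integer counting 0.01 V steps. The function names are hypothetical and this is not the OpenDTU-OnBattery decoder; `to_float32` only mimics rounding to a single-precision float.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def decode_current(raw_counts: int) -> float:
    # Assumption taken from the comment above: one count equals 0.1 A.
    return to_float32(raw_counts * 0.1)

def decode_voltage(raw_counts: int) -> float:
    # Assumption taken from the comment above: one count equals 0.01 V.
    return to_float32(raw_counts * 0.01)

def payload(value: float) -> str:
    # Two decimal places, matching the "0.00" seen in MQTT Explorer.
    return f"{value:.2f}"

currents = [payload(decode_current(n)) for n in (-35, -1, 0, 1, 2)]
voltage = payload(decode_voltage(4917))
print(currents, voltage)
```

Under these assumptions every published current is a multiple of 0.10 and every voltage has exactly two decimals, so values with many more decimals would have to be produced somewhere after the publish step.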

I can't reproduce this at all. Not even when crafting a dummy buffer and using the function of the Pylontech CAN receiver class.

How did you create those CSV files? I see the title row reads "mean_value". What mean? Of what data? So this is somehow processed and I wasted my time...

I really see no reason why the OpenDTU-OnBattery would not publish a newly received current value. If a new value is received from the Pylontech, it is decoded correctly, and if the current is actually closer to 0 than to -0.1 and 0.1 Amps such that a real integer 0 is transmitted, it is converted as expected and published as "0.00".

As far as I can tell, your bug report is invalid, or at least not reproducible. I still suspect something to be wrong at "your end", meaning your MQTT broker or data processor or something. Please dig deeper. Look at the values published to your broker using the MQTT Explorer application.

@MalteSchm Do you care to have a look at this? Did you ever experience similar issues (AFAIK you use a Pylontech and wrote the code initially)?

spcqike commented 7 months ago

I see the title row reads "mean_value". What mean?

I hadn’t had a look at the xlsx yet, but the screenshot shows that the values come in an interval of 12 minutes. So as he uses influx, and I guess his mqtt interval isn’t 12 minutes, the reported mean value is the average of all stored values in this specific 12 minute time frame. At least that’s what my influx gives me, if I request data from grafana for a big range (like a week or so) as grafana by default shows like 1440 values in your interval/range (however you can change this setting and get the real raw data, even for a year range.)

edit: You can see this auto averaging of grafana in his screenshots. Today at 14:00 he drew about 4A (or charged?). But in the second screenshot where you can see the last few days, this spike is only about 3-3.5A. Because with an increased range the grafana internal interval also goes up. And it then queries the mean(value) grouped by interval. So that the client does not need to process a week worth of data in a 5 second interval but only like a few thousand values.

markusdd commented 7 months ago

I see the title row reads "mean_value". What mean?

I hadn’t had a look at the xlsx yet, but the screenshot shows, that the values come in an interval of 12 minutes. So as he uses influx, and I guess his interval isn’t 12 minutes, the reported mean value is the average in this specific 12 minute time frame. At least that’s what my influx gives me, if I request data from grafana for a big range (like a week or so) as grafana by default shows like 1440 values in your interval/range (however you can change this setting and get the real raw data, even for a year range.)

exactly, this is how influx works as a timeseries database. but good catch on the interval, somehow in the explorer it defaulted to 12m, which is pretty nonsensical. I have now adjusted it to 5s, which is my interval. Sorry for that confusion.

Still, the original issue holds. Before the inverter reset there is a plain 0 missing, leading to this weird interpolation: grafik

And this is consistent across days, so I'm pretty sure it's actually not there. Here is another csv with the 5s interval:

2024-04-04-22-31 Chronograf Data.csv

grafik

You can clearly see the current spike down is reported and a resetting zero, but before there is no 0, leading to a weird interpolation.

Another issue could be that these are somehow off-interval and get lost this way, not sure.

It's certainly weird, as I have not seen anything like that in any other entity/time series

spcqike commented 7 months ago

How do you process / store the data? Do you process continuous values even if there is no change?

As the way from MQTT into your InfluxDB is not clear yet.

Anyway you can tell grafana how to handle missing values and how to display such inconsistent data. Default is a simple „connect every point“ like you see here. But you can also say „hold the last value until a new value arrives“, which is what you would like to have.

Edit https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/time-series/#line-interpolation

It's line-interpolation, step-after.

Edit2

We have a voltage reading for every current reading, but not a current reading for every voltage reading.

given your latest export, you indeed have current readings without voltages. So do you only store (or does opendtu only send) changed values?

markusdd commented 7 months ago

normally, when the time series are clean, I just use fill(null) in my dashboards. To work around the missing 0 here I just tried fill(0) in Grafana and that makes the graph look correct. Nevertheless it would of course be better if the 0 were there in the raw data.

fill(null): grafik

fill(0): grafik

So that works for the purposes of visualization but I do not have to do this for any of the other entities. Even the SoC you see displayed here works on the basis of fill(null) and is correct. I do the same for the victron voltage etc in other graphs and none of these has these issues of the pylontech current.

The way from MQTT to HA to Influx is standard: I just use the grafana and Influx plugins of home assistant and I just tell HA to dump certain entities straight into the influxdb. No further processing done, it is stored as it is received. Here is my config snippet from HA:

influxdb:
  host: 127.0.0.1
  port: 8086
  database: homeassistant
  username: homeassistant
  password: XXXXXXXXXXX
  max_retries: 3
  default_measurement: units
  precision: ms
  tags:
    source: HA
  tags_attributes:
    - friendly_name
  include:
    domains:
      - sun
    entity_globs:
      - "*.hm800_*"
      - "*.pylontech_us5000_01*"
      - "*.victron*"
    #      - "*.hichi_gth_sml_*"
    entities:
      - weather.home
      - sensor.hichi_gth_sml_total_in
      - sensor.hichi_gth_sml_total_out
      - sensor.hichi_gth_sml_power_curr

spcqike commented 7 months ago

I'm not sure that fill(0) is needed. As the line-interpolation should handle the missing values.

Anyway can you use fill(0) on your SoC, too? I would like to know or see whether or not the graph would still look the same.

If you don’t have consistent readings (like 50.1% isn’t stored twice, as 0A obviously isn’t) you should see a different diagram. Without fill(0) in your SoC chart, it will connect to the last value, which then, of course, gives a smooth chart even if a lot of time points have no value at all.

markusdd commented 7 months ago

Don't get your point. I just posted how it looks with interpolation. It's plain wrong then, because it looks like there was large battery current all the way through the night. So not using fill(0) on the current produces an incorrect graph.

But using fill(0) for SoC produces completely wrong results: grafik

That is kind of logical, because SoC gets reported way more rarely, only on change. So as soon as $interval is smaller than the reporting interval of SoC, it will pull it to 0. That is how grafana and influx work.
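The effect of the different fill modes on a sparsely reported series can be sketched in a few lines of Python. This is a simplified model of bucketed queries with fill(null), fill(0) and fill(previous), not InfluxDB's implementation; the function name and the sample values are made up for illustration.

```python
def fill_buckets(samples, start, stop, step, mode):
    """Return one value per time bucket; empty buckets are filled per mode.

    mode "null" leaves gaps as None, "zero" inserts 0, and "previous"
    repeats the last seen value (None until the first sample).
    """
    # Simplification: keep the last sample of each bucket instead of a mean.
    by_bucket = {ts - ts % step: value for ts, value in samples}
    out, last = [], None
    for bucket in range(start, stop, step):
        if bucket in by_bucket:
            last = by_bucket[bucket]
            out.append(last)
        elif mode == "zero":
            out.append(0)
        elif mode == "previous":
            out.append(last)
        else:
            out.append(None)
    return out

# Hypothetical SoC series in percent, stored only when the value changes.
soc = [(0, 21), (20, 20)]
as_null = fill_buckets(soc, 0, 30, 5, "null")
as_zero = fill_buckets(soc, 0, 30, 5, "zero")
as_prev = fill_buckets(soc, 0, 30, 5, "previous")
print(as_null, as_zero, as_prev)
```

With a 5 s bucket and a series that only has two stored points, the zero fill drags every empty bucket to 0, which matches the broken SoC graph, while the previous-value fill keeps the level until the next stored point.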

And I think that brings us to the bottom of the issue: MQTT behavior regarding retaining of last values.

Obviously OpenDTU is set to only report changes, which makes sense, but for visualization you need a sane interval replacement value. In case of current the safe bet is usually 0, for SoC last is the only thing that makes sense.

So this now more boils down to the question if MQTT retain flags are being sent and how this is then translated into influx and grafana.

MalteSchm commented 7 months ago

@MalteSchm Do you care to have a look at this? Did you ever experience similar issues (AFAIK you use a Pylontech and wrote the code initially)?

@schlimmchen The Pylontech code was there when I started. Initially this data was only available over Mqtt I believe. I only added the Web Interface.

The thing I can offer is to add the mqtt value for Pylontech current to my Node Red Dashboard to see if I can reproduce this. But maybe I wait until it is clear that this is not related to influx and data retainment

spcqike commented 7 months ago

But maybe I wait until it is clear that this is not related to influx and data retainment

could you just check in MQTT explorer (or node red) and make sure whether or not every interval sends every value, even if they are unchanged? If you can confirm that current and soc and stuff is sent every x seconds, the problem must be in his processing (ha) or influx.

@markusdd

I just posted how it looks with interpolation. It's plain wrong then, because it looks like there was large battery current all the way through the night. So not using fill(0) on the current produces an incorrect graph.

it looks like neither of your diagrams shows an interpolation, like i said (step-after). as you show in your csv export, there should be a "0" after "-10.9", so step-after should give a 0-line until another current reading comes in. (but only IF another reading occurs in your displayed range, otherwise it will just stop at the last value)

yes, if you query the mean(value) within a 10s interval, but have -10.9 and 0 within 5s, these will get averaged and your last point will show "-5.45", as thats your calculated average.
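The averaging described here can be reproduced with a short Python sketch. It assumes raw samples every 5 s and a query that groups by a fixed interval and takes the mean; `mean_per_bucket` is a hypothetical helper, not a Grafana or InfluxDB function.

```python
from collections import defaultdict

def mean_per_bucket(samples, bucket_seconds):
    """Average (timestamp_seconds, value) samples per fixed time bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# Raw series: a short -10.9 A spike followed by zeros, one sample per 5 s.
raw = [(0, 0.0), (5, -10.9), (10, 0.0), (15, 0.0)]
at_5s = mean_per_bucket(raw, 5)    # every sample keeps its own bucket
at_10s = mean_per_bucket(raw, 10)  # the spike is averaged with a zero
print(at_5s, at_10s)
```

At the 5 s interval the series comes back unchanged; at 10 s the first bucket holds the average of 0 and -10.9, so the spike shows up at roughly half its real depth.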

but still, the problem itself isn't grafana nor are the missing duplicate values, it's a problem of visualization and interpretation.

of course, it would be easier to handle, if there would be a 0 in your database for every 5s interval. (or to be more general: if there would be the same reading in every interval, even if the value does not change)

But using fill(0) for SoC produces completely wrong results:

thats not a wrong result. thats the output of your query, shown as it is. you only have 2 datapoints for SoC in a 15h timeperiod. that's it. you can visualize them in different ways. your standard way is to connect them together and interpolate missing values inbetween.

lets say you have 20.2% at 11:00 and 20.1% at 19:30. so if you check the value at 15:00 it should show 20.15%, but that is no real nor valid datapoint. however you could (and thats what i normally do) also use step-right, so 20.2% will stay valid until a new point comes in.
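A small Python sketch of the two ways to read a value between these two stored points. The helper names are hypothetical; "step-after" is used in the sense of the Grafana option mentioned earlier in the thread (hold the last value until a new one arrives).

```python
def linear(points, t):
    """Connect the two stored points with a straight line and read it at t."""
    (t0, v0), (t1, v1) = points
    return v0 + (t - t0) / (t1 - t0) * (v1 - v0)

def step_after(points, t):
    """Hold the most recent stored value until the next point arrives."""
    return [value for ts, value in points if ts <= t][-1]

def minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

points = [(minutes("11:00"), 20.2), (minutes("19:30"), 20.1)]
linear_value = linear(points, minutes("15:00"))
held_value = step_after(points, minutes("15:00"))
print(round(linear_value, 2), held_value)
```

The linear reading at 15:00 lands at about 20.15, a value that was never measured, while the step reading stays at the stored 20.2.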

for fast changing values, none of these visualizations makes any sense. we sample the battery current in mqtt interval, at maximum. (i dont know if the data is pulled faster from the battery itself, but it's only sent in mqtt interval) lets say 5s. if we transmit and store

00:00 5A
00:05 5.1A
00:10 2A
00:15 0A

its totally possible that the real values were

00:00 5A
00:01 0A
00:02 3A
00:03 4A
00:04 1A
00:05 5.1A
....

we just dont know.

for slow changing values like SoC, the standard interpolation is totally fine.

That is kind of logical, because SoC gets reported way more rarely, only on change

if you know, that SoC is only reported on change, why do you assume, that current is reported every time, even if it does not change?

And I think that brings us to the bottom of the issue: MQTT behavior regarding retaining of last values.

retaining is something different. thats only used and useful, if you want to push the last value of something to a newly connected device. that message is not published periodically on its own.

Obviously OpenDTU is set to only report changes, which makes sense, but for visualization you need a sane interval replacement value. In case of current the safe bet is usually 0, for SoC last is the only thing that makes sense.

i disagree with that. just for visualization, you dont need it. but it makes things much easier, yes. also, why do you think

In case of current the safe bet is usually 0

? its totally possible, that two continuous current readings are the same, like here grafik grafik grafik

even tho this are only small values, its totally possible to have a constant dis-/charge for a period of some seconds. also with higher currents.

So this now more boils down to the question if MQTT retain flags are being sent and how this is then translated into influx and grafana.

even if these topics are flagged as retained messages, the message is only published to newly connected clients once they connect. its not published periodically.

https://www.hivemq.com/blog/mqtt-essentials-part-8-retained-messages/

"Retained messages offer valuable benefits in various scenarios, particularly when you need newly-connected subscribers to receive messages promptly without waiting for the next message publication."
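The retain behavior can be modeled with a toy in-memory broker in Python. This is only a sketch of the semantics described above, not a real MQTT broker or client library; the class is hypothetical and the topic name is the one guessed earlier in the thread.

```python
class ToyBroker:
    """In-memory model of MQTT retain semantics; not a real broker."""

    def __init__(self):
        self.retained = {}     # topic -> last retained payload
        self.subscribers = {}  # topic -> list of subscriber inboxes

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload
        # Live delivery goes only to clients that are subscribed right now.
        for inbox in self.subscribers.get(topic, []):
            inbox.append(payload)

    def subscribe(self, topic):
        inbox = []
        # A retained message is delivered once, at subscription time.
        if topic in self.retained:
            inbox.append(self.retained[topic])
        self.subscribers.setdefault(topic, []).append(inbox)
        return inbox

broker = ToyBroker()
topic = "battery/current"  # assumed topic name
early = broker.subscribe(topic)
broker.publish(topic, "-3.50", retain=True)
broker.publish(topic, "0.00", retain=True)
late = broker.subscribe(topic)
print(early, late)
```

The client that was connected all along sees both messages as they are published; the late client gets only the last retained payload, once. Nothing in this model re-sends a retained value on a timer.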

schlimmchen commented 7 months ago

Obviously OpenDTU is set to only report changes

That's a (questionable) feature of the Victron MPPT implementation and not present in the Pylontech MQTT handler. If data arrives from the Pylontech over CAN, it is transported upwards and it is published unconditionally every "MQTT publish interval setting" seconds.
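The difference between the two publishing behaviors mentioned here can be sketched in Python. Both functions are hypothetical illustrations, not the project's code: one forwards every reading taken at the publish interval, the other forwards a reading only when the value changed.

```python
def publish_fixed_interval(readings):
    """Publish every reading taken at the publish interval, changed or not."""
    return list(readings)

def publish_on_change(readings):
    """Publish a reading only when its value differs from the last one sent."""
    sent, last = [], object()  # sentinel that never equals a real value
    for ts, value in readings:
        if value != last:
            sent.append((ts, value))
            last = value
    return sent

# One reading every 5 s: a spike, then the current stays at zero.
readings = [(0, -3.5), (5, 0.0), (10, 0.0), (15, 0.0), (20, 0.0)]
fixed = publish_fixed_interval(readings)
changed = publish_on_change(readings)
print(len(fixed), changed)
```

With fixed-interval publishing the broker receives a zero every 5 s; with on-change publishing it receives a single zero and then nothing until the value moves again.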

And I think that brings us to the bottom of the issue: MQTT behavior regarding retaining of last values.

Hm, but why? Retaining values is something for clients that connect fresh. If you want new clients to be able to see a particular value last published by some other client, you want the retain flag. Otherwise new clients will not know a topic's message. This feature should be irrelevant here, since your HA and influxdb are always connected and at work and should see all data.

Back to the issue at hand: The last CSV you shared does have a plain 0 in row 15377, then no data until row 17651, which is okay, the current stays zero until that row, so I don't understand what you are hinting at with the red arrow in row 17650.

The fill(null) screenshot you shared simply proves that interpolation is happening there. The line is forced onto a curve to gradually meet the "negative spike", then gradually "goes back up" towards zero. The fill(0) screenshot is correct exactly because you have to interpret the data like that.

Nevertheless it would of course be better if the 0 were there in the raw data.

Well, it is. You will have to work with the idiosyncrasies of time-series databases, which in particular means that values are only stored when they change, and zeroes might even not be stored at all or only show up as expected if you use the right query.

I am convinced that there is no issue with OpenDTU-OnBattery, which is why I am closing this issue.

schlimmchen commented 7 months ago

even tho this are only small values, its totally possible to have a constant dis-/charge for a period of some seconds. also with higher currents.

Yes, and influxdb will not "save" repeating values, only the one value with the timestamp the value was first observed at.

@spcqike You did a huge edit to your last post. Don't be afraid to double-post if (and only if) your message/content changes significantly. Otherwise your contribution might go unnoticed.

spcqike commented 7 months ago

sry, i just added my comment to @markusdd. i'm not used to double post, if i'm the last commentator :)

Yes, and influxdb will not "save" repeating values, only the one value with the timestamp the value was first observed at.

are you sure about that? because in my experience, it does store double values.

e.g. i store the humidity, which changes very slowly, on a 10s time basis. if i query it with a 1s interval and fill(0), i get a lot of "0", obviously, but i also get my 2 consecutive "40" in their 10s interval

grafik

if i query it, like i store it, in 10s interval, i get a lot of "40" (and 1 trailing "0" as there is no value reported, yet :D)

grafik

and influx itself reports plain values with different but consecutive timestamps

grafik

markusdd commented 7 months ago

I'm a bit irritated by this discussion and a closure of the ticket in the middle of the night tbh...

What do we collect the data for? Right, visualization and probably in some cases processing for calculation purposes to see when systems have paid off etc.

The problem here is very obvious: If you only send the last zero the previous day, and if e.g. anyone then wants a display of the current day, what you will see up to the first datapoint at like 6 or 7am is either nothing, or garbage, because it's interpolated. Depending on the nature of the data, you can save it by using things like fill(0). This isn't even a problem of timeseries vs SQL databases, that is a general issue and exactly what I meant with MQTT behavior/setup.

OpenDTU in fact has such configurations in other places where MQTT data is being sent at least in a larger interval even if there was not a change to exactly avoid such problems.

Or, if you do not want to do that, it is at least advisable to send a repeated datapoint right before an event where you know for a fact there will be a change right away, the inverter restart is such an event because it will trigger a rise in battery current until the production limit kicks in again.

And again: I do no processing AT ALL. MQTT messages end up in Home Assistant and the entity ID is registered for influx: dump it in. This is the approach that I've taken for all my solar data ever.

The issue with this particular series stems from the fact that it is silent for such long periods of time that visualizations and calculations over certain time windows run into issues.

There are more or less elegant solutions to this:

a) brute force, just re-send every like 5min. That is still not a lot of data, but will resolve most of these issues

b) be more intelligent about it and re-send before major event you know will trigger a steep change, e.g. inverter restart. Harder to do, but even less data.

I, frankly, would just go for a).
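Option a) could look roughly like the following Python sketch: publish on change, and additionally re-send an unchanged value once a maximum silence period has passed. The function name and the 300 s value are assumptions for illustration, not an existing OpenDTU-OnBattery setting.

```python
def publish_with_heartbeat(readings, max_silence):
    """Publish on change, and re-send an unchanged value after max_silence seconds."""
    sent = []
    last_value, last_sent_at = None, None
    for ts, value in readings:
        changed = last_sent_at is None or value != last_value
        overdue = last_sent_at is not None and ts - last_sent_at >= max_silence
        if changed or overdue:
            sent.append((ts, value))
            last_value, last_sent_at = value, ts
    return sent

# One zero reading every 5 s for 15 minutes, re-sent at least every 5 minutes.
readings = [(ts, 0.0) for ts in range(0, 900, 5)]
sent = publish_with_heartbeat(readings, max_silence=300)
print(sent)
```

Out of 180 identical readings only three are sent, so any 5 minute window in a dashboard still contains at least one stored point.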

As mentioned, original openDTU does exactly this for certain cases to e.g. enable current value displays. grafik

I get the idea to be very scarce with the data, but this pylontech current is too scarce. I also tried to add an immediate display to my dashboard like I have it for the solar power and I have to use a pretty long interval for that to work somehow. If the current is zero there might be no data for hours and it does not work. Problem there: during the day, if non-zero current does not change, fill(0) produces a wrong display, and fill(last) might lead to a non-zero current being displayed when in reality there is none.

So frankly, as a compromise, I would keep the current strategy and just publish every 5min (or make it UI adjustable) to ensure that current value and time window displays work correctly without hassle, because this is the 90% main usecase for everyone. The data storage overhead for this is very negligible. But I agree sending every 5s if there is no change is unnecessary, we shouldn't be doing that.

What do you think?

markusdd commented 7 months ago

Addition, I also noticed the Victron data and inverter limit of the zero feed have a similar problem: grafik

There nothing at all is sent through the night, leading to a weird interpolation from switch off to suddenly producing.

as you see the other graphs do not do this. They come from OpenDTUs original codebase which uses the strategy I posted from the Release in my answer above.

schlimmchen commented 7 months ago

I'm a bit irritated by this discussion and a closure of the ticket in the middle of the night tbh...

Yeah, I am irritated as well, much more than I should be. I continue to work on ignoring people instead. In this instance, I am having problems...

The problem here is very obvious

Yes, it is obvious to me as well: You need to understand how your tools work.

OpenDTU in fact has such configurations in other places where MQTT data is being sent at least in a larger interval even if there was not a change to exactly avoid such problems.

a) brute force, just re-send every like 5min. That is still not a lot of data, but will resolve most of these issues

It grinds my gear that you are not reading what I wrote: All Pylontech Data is published to the MQTT broker at a fixed interval. Period. a) is already in place. Check your MQTT broker. It will tell you that a new message was published at the respective topic every "publish interval" seconds.

The issue with this particular series stems from the fact that it silent for such long periods of time that visualizations and calculations over certain time windows run into issues.

No. The issue is present for shorter periods of time and other series as well, you just don't notice it (as much) in the respective graphs.

Addition, I also noticed the Victron data and inverter limit of the zero feed have a similar problem:

Yes, since you need to understand how influxdb saves values and presents them to you or grafana depending on your query.

They come from OpenDTUs original codebase which uses the strategy I posted from the Release in my answer above.

This is the commit you pointed to: https://github.com/tbnobody/OpenDTU/commit/7d90937d0f30625c5ce283388d489ffa5062e49a

It makes sure that values of a particular inverter that becomes unreachable, i.e., does not provide data any more, is reset to zero in the MQTT broker.

Your Pylontech never becomes unreachable. There is no need for such a concept. All data which is received from the Pylontech is published to the MQTT broker at a fixed interval.

What happens to the data afterwards (HA, influxdb, grafana) is none of OpenDTU-OnBattery's business.

What do you think?

Your problem does not go away if OpenDTU-OnBattery would publish a 0 to the battery current topic more often than it already does while it is zero. influxdb will still only save one zero paired with the timestamp the current changed to 0 the last time.

markusdd commented 7 months ago

Sorry man, you are just being rude here. You are incorrect. I've pointed out already that only this series has this issue. And I've laid out why it leads to problems in visualizations and time-windowed calculations.

I've contributed to to this community and project more than you might realize, with several PRs in OpenDTU and ahoyDTU as well as enabling more than 4k people now to have their data collected through the Fusion PCB in a reliable manner.

Yet you are treating me like I'm some random idiot not understanding how databases work or how OpenDTU works. Sorry, wrong address here.

GFY, really. I'm just going to have my dashboards use replacement values, might another user come about and ask for the same enhancement.

--> Blocked, I have no business with a*holes in Open Source.

spcqike commented 7 months ago


@schlimmchen are we sure that it really does send the values in every mqtt interval? until @MalteSchm confirms or denies it, i would keep this part open.

and, as you can see in my above attached images, influx indeed does store the same value several times. at least i get it as query result, if i just query the plain raw values. https://github.com/helgeerbe/OpenDTU-OnBattery/issues/843#issuecomment-2039121966

i don't know if this was changed with flux2.0, or if HA does cut off double values, or if openDTU just doesn't send it.

markusdd commented 7 months ago

yup. things escalate if one party constantly denies there might be an issue and rather goes ad hominem and thinks we're 'wasting their time'. but anyway... Thanks @spcqike for being constructive and digging in with me.

It's InfluxDB 1.8 (as per HA plugin), so this is not changed behavior with influx 2. I've also provided my HA config above, there is no further filtering going on, my influx also doesn't have compression enabled, so I am not dropping any data points for older data.

I reconfigured my dashboard further and SoC also suffers from infrequent storage of data, if SoC does not change you often get incomplete graphs. The sane fill function for SoC is fill(previous), but that doesn't help much when SoC has last been sent hours ago and that is outside the left portion of the graph.

Also, looking at logs:

[Victron MPPT] Text Event H23: Value: 1197
14:07:35.791 > [Victron MPPT] Text Event HSDS: Value: 5
14:07:35.983 > [VE.Direct] serial input (189 Bytes):
14:07:36.528 > [VE.Direct] 0d 0a 50 49 44 09 30 78 41 30 35 38 0d 0a 46 57
14:07:36.866 > [VE.Direct] 09 31 36 33 0d 0a 53 45 52 23 09 48 51 32 32 33
14:07:36.980 > [VE.Direct] 38 47 59 50 46 36 0d 0a 56 09 34 39 39 31 30 0d
14:07:37.277 > [VE.Direct] 0a 49 09 31 33 33 30 30 0d 0a 56 50 56 09 38 30
14:07:37.838 > [VE.Direct] 30 33 30 0d 0a 50 50 56 09 36 38 30 0d 0a 43 53
14:07:37.941 > [VE.Direct] 09 33 0d 0a 4d 50 50 54 09 32 0d 0a 4f 52 09 30
14:07:38.145 > [VE.Direct] 78 30 30 30 30 30 30 30 30 0d 0a 45 52 52 09 30
14:07:38.708 > [VE.Direct] 0d 0a 4c 4f 41 44 09 4f 4e 0d 0a 48 31 39 09 32
14:07:40.782 > [VE.Direct] 33 37 35 0d 0a 48 32 30 09 33 38 33 0d 0a 48 32
14:07:41.001 > [VE.Direct] 31 09 31 33 35 35 0d 0a 48 32 32 09 33 32 32 0d
14:07:41.486 > [VE.Direct] 0a 48 32 33 09 31 31 39 37 0d 0a 48 53 44 53 09
14:07:41.846 > [VE.Direct] 35 0d 0a 43 68 65 63 6b 73 75 6d 09 82
14:07:41.920 > [Pylontech] soc: 30 soh: 100
14:07:42.267 > [Pylontech] voltage: 49.849998 current: 2.500000 temperature: 17.300001
14:07:43.054 > [Pylontech] chargeStatusBits: 1 1 0
14:07:43.159 > RX Period End
14:07:43.569 > All missing
14:07:43.668 > Nothing received, resend whole request
14:07:56.517 > TX ActivePowerControl Channel: 61 --> 51 84 61 72 26 80 11 99 16 81 0B 00 02 92 00 01 F5 60 70 
14:07:56.663 > Interrupt received
14:07:56.781 > RX Channel: 23 --> D1 84 61 72 26 84 61 72 26 81 00 00 0B 00 14 07 48 | -80 dBm
14:07:56.897 > [Victron MPPT] Text Event PID: Value: 0XA058
14:08:07.942 > [Victron MPPT] Text Event PID: Value: 0XA058
14:08:08.058 > [Victron MPPT] Text Event FW: Value: 163
14:08:08.165 > [Victron MPPT] Text Event SER: Value: HQ2238GYPF6
14:08:08.318 > [Victron MPPT] Text Event V: Value: 49880
14:08:08.872 > [Victron MPPT] Text Event I: Value: 13100
14:08:09.048 > [Victron MPPT] Text Event VPV: Value: 78680
14:08:09.300 > [Victron MPPT] Text Event PPV: Value: 670
14:08:09.751 > [Victron MPPT] Text Event CS: Value: 3
14:08:09.856 > [Victron MPPT] Text Event MPPT: Value: 2
14:08:09.984 > [Victron MPPT] Text Event OR: Value: 0X00000000
14:08:10.724 > [Victron MPPT] Text Event ERR: Value: 0
14:08:10.854 > [Victron MPPT] Text Event LOAD: Value: ON
14:08:11.043 > [Victron MPPT] Text Event H19: Value: 2375
14:08:12.295 > [Victron MPPT] Text Event H20: Value: 383
14:08:12.395 > [Victron MPPT] Text Event H21: Value: 1355
14:08:13.175 > [Victron MPPT] Text Event H22: Value: 322
14:08:13.323 > [Victron MPPT] Text Event H23: Value: 1197
14:08:13.477 > [Victron MPPT] Text Event HSDS: Value: 5
14:08:13.981 > [VE.Direct] serial input (189 Bytes):
14:08:14.143 > [VE.Direct] 0d 0a 50 49 44 09 30 78 41 30 35 38 0d 0a 46 57
14:08:14.918 > [VE.Direct] 09 31 36 33 0d 0a 53 45 52 23 09 48 51 32 32 33
14:08:15.067 > [VE.Direct] 38 47 59 50 46 36 0d 0a 56 09 34 39 38 38 30 0d
14:08:17.886 > [VE.Direct] 0a 49 09 31 33 31 30 30 0d 0a 56 50 56 09 37 38
14:08:18.223 > [VE.Direct] 36 38 30 0d 0a 50 50 56 09 36 37 30 0d 0a 43 53
14:08:18.378 > [VE.Direct] 09 33 0d 0a 4d 50 50 54 09 32 0d 0a 4f 52 09 30
14:08:18.539 > [VE.Direct] 78 30 30 30 30 30 30 30 30 0d 0a 45 52 52 09 30
14:08:18.684 > [VE.Direct] 0d 0a 4c 4f 41 44 09 4f 4e 0d 0a 48 31 39 09 32
14:08:18.785 > [VE.Direct] 33 37 35 0d 0a 48 32 30 09 33 38 33 0d 0a 48 32
14:08:18.893 > [VE.Direct] 31 09 31 33 35 35 0d 0a 48 32 32 09 33 32 32 0d
14:08:19.003 > [VE.Direct] 0a 48 32 33 09 31 31 39 37 0d 0a 48 53 44 53 09
14:08:30.005 > [Victron MPPT] Text Event PID: Value: 0XA058
14:08:30.107 > [Victron MPPT] Text Event FW: Value: 163
14:08:30.279 > [Victron MPPT] Text Event SER: Value: HQ2238GYPF6
14:08:30.497 > [Victron MPPT] Text Event V: Value: 49920
14:08:30.701 > [Victron MPPT] Text Event I: Value: 14600
14:08:32.032 > [Victron MPPT] Text Event VPV: Value: 79720
14:08:32.201 > [Victron MPPT] Text Event PPV: Value: 744
14:08:32.418 > [Victron MPPT] Text Event CS: Value: 3
14:08:32.600 > [Victron MPPT] Text Event MPPT: Value: 2
14:08:32.742 > [Victron MPPT] Text Event OR: Value: 0X00000000
14:08:32.851 > [Victron MPPT] Text Event ERR: Value: 0
14:08:33.260 > [Victron MPPT] Text Event LOAD: Value: ON
14:08:33.432 > [Victron MPPT] Text Event H19: Value: 2376
14:08:33.546 > [Victron MPPT] Text Event H20: Value: 384
14:08:33.689 > [Victron MPPT] Text Event H21: Value: 1355
14:08:34.102 > [Victron MPPT] Text Event H22: Value: 322
14:08:34.200 > [Victron MPPT] Text Event H23: Value: 1197
14:08:34.334 > [Victron MPPT] Text Event HSDS: Value: 5
14:08:35.303 > [VE.Direct] serial input (189 Bytes):
14:08:40.713 > [VE.Direct] 0d 0a 50 49 44 09 30 78 41 30 35 38 0d 0a 46 57

to me this all does not look like the Pylontech data is frequently being sent as schlimmchen suggested. I see multiple events for Victron in regular intervals, and even though the dashboard says it gets Pylontech data every 1-2 seconds, there is much rarer logging in the console about the battery. This setup is remote, I have to see if I can do some MQTT snooping on this via VPN, but to me it surely does not look like there are updates for the battery at a 5s reporting interval.
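One way to put numbers on "much rarer logging" is to count console entries per subsystem tag. The following is only a sketch: the timestamp and tag format is taken from the excerpt above, and `count_tags` is a hypothetical helper, not part of OpenDTU-OnBattery. Note that the console only shows Pylontech lines when verbose logging is enabled for the battery provider, so a low count says something about logging, not necessarily about MQTT publishing.

```python
import re
from collections import Counter

# Matches "HH:MM:SS.mmm > [Tag]" anywhere in the text, so entries that were
# fused onto one line in a paste are still counted individually.
ENTRY_RE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) > \[([^\]]+)\]")


def count_tags(log_text: str) -> Counter:
    """Count how many console entries each subsystem tag produced."""
    return Counter(tag for _, tag in ENTRY_RE.findall(log_text))


sample = (
    "14:07:35.791 > [Victron MPPT] Text Event HSDS: Value: 5\n"
    "14:07:35.983 > [VE.Direct] serial input (189 Bytes):\n"
    "14:07:41.920 > [Pylontech] soc: 30 soh: 100"
    "14:07:42.267 > [Pylontech] voltage: 49.849998 current: 2.500000\n"
    "14:07:56.897 > [Victron MPPT] Text Event PID: Value: 0XA058\n"
)

counts = count_tags(sample)
for tag, n in counts.most_common():
    print(f"{tag}: {n}")
```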

schlimmchen commented 7 months ago

So, I re-read all of this, and I think the only time I was rude was my last post, which certainly could have been nicer. At least I refrained from calling you something (or suggesting to call you something) or telling you to do something, so I will hold that in my favor.

My last post was not nice because, as much as I am pushing back on acknowledging that something is wrong with OpenDTU(-OnBattery), you are pushing the other way, i.e., it does not seem to me that you think it is even possible that something else is broken, and that really pisses me off.

Also, there seems to be no acknowledgement that I did testing to try and help find out what the issue might be, or that I looked closely at the code and wrote that I am sure that every Pylontech data point is published regularly, and that I stared at my MQTT explorer to confirm that data arrives regularly.

if one party constantly denies there might be an issue and rather goes ad hominem

It is not only one party that is in denial, I am not trying to make this personal, and I am not aware of using bogus arguments.

I've contributed to this community and project more than you might realize, with several PRs in OpenDTU and ahoyDTU as well as enabling more than 4k people now to have their data collected through the Fusion PCB in a reliable manner.

I have great respect for the OpenDTU Fusion board. I own two, bought another one for my father, and am going to buy another one to help me implement and test code. Very good job, I mean it! And even though we briefly interacted nicely in the past, I did not recognize/remember it was you. And I am glad that I did not. Do you want me to put you on a pedestal and praise you and not question what you say/write? I don't understand what your past contributions have to do with this issue or, more precisely, with my complaint that you seem to dismiss my arguments.

Blocked, or actually not blocked, I would like to continue to contribute here. I promise to keep my tone down, and I kindly ask that you do not read anything into my wording.

This instance of MQTT explorer has been running for roughly 1:45h, so 105min or 6300s, so a message was published on this topic roughly every 6 seconds. It should be 5. Meh. It's off, and I don't know the reason... However, it clearly indicates that the BatteryStats does indeed publish all values regularly to the broker, even if they are zero. Disclaimer: This data is produced using the PYLONTECH_DUMMY (see the source code), which should have nothing to do with how the values are published to the broker, except that it provides (dummy) data.
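For reference, the arithmetic behind that estimate can be sketched as follows. The message count of 1050 is an assumption chosen to match "roughly every 6 seconds"; the actual counter value from MQTT explorer is not reproduced here.

```python
def average_interval(elapsed_seconds: float, message_count: int) -> float:
    """Average time between messages, given a run time and a message counter."""
    if message_count <= 0:
        raise ValueError("message_count must be positive")
    return elapsed_seconds / message_count


elapsed = 105 * 60            # 1:45 h = 105 min = 6300 s
assumed_count = 1050          # hypothetical counter value, not taken from the screenshot
expected_at_5s = elapsed / 5  # messages expected if the interval were exactly 5 s

observed_interval = average_interval(elapsed, assumed_count)
print(f"observed: one message every {observed_interval:.1f} s")
print(f"a strict 5 s interval would have produced {expected_at_5s:.0f} messages")
```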

This is the HA auto-discovery info for the Pylontech current:

{
  "name": "Battery current",
  "stat_t": "solar-staging/battery/current",
  "uniq_id": "0001_battery_current",
  "unit_of_meas": "A",
  "dev": {
    "name": "Battery(0001)",
    "ids": "0001",
    "cu": "http://192.168.16.160",
    "mf": "OpenDTU",
    "mdl": "Pylontech US3000C",
    "sw": "v24.3.31-14-gc0f5176"
  },
  "exp_aft": 15,
  "dev_cla": "current",
  "stat_cla": "measurement"
}

"exp_aft" is three times the default MQTT publish interval, so even if you set your interval to something else, the topic needs to miss three published messages before HA deems the sensor expired. I don't see anything else that could trip up the data collection because of how the value is advertised to HA.
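As a rough illustration of that relationship, the sketch below builds a reduced discovery payload in which "exp_aft" is derived from the publish interval. The function name and parameters are hypothetical and this is not the firmware's code; it also assumes the expiry scales with the configured interval, which is how I read the explanation above.

```python
import json


def battery_current_discovery(publish_interval_s: int, base_topic: str) -> dict:
    """Build a reduced HA discovery payload for the battery current sensor."""
    return {
        "name": "Battery current",
        "stat_t": f"{base_topic}/battery/current",
        "unit_of_meas": "A",
        # The sensor expires only after three consecutive publishes were missed.
        "exp_aft": 3 * publish_interval_s,
        "dev_cla": "current",
        "stat_cla": "measurement",
    }


payload = battery_current_discovery(publish_interval_s=5, base_topic="solar-staging")
print(json.dumps(payload, indent=2))
```

With the 5 s interval this yields "exp_aft": 15, matching the payload shown above.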

So, even though my brain will probably continue to try coming up with possible issues in the back of my head, I maintain that there is nothing wrong with OpenDTU(-OnBattery) in this regard. Which is also why I closed this issue. I am sure of it. (And I will sincerely apologize if it turns out that I am wrong, but right now, I am sufficiently confident that the problem (and I do acknowledge that there is one) is not in OpenDTU(-OnBattery)).

to me this all does not look like the Pylontech data is frequently being sent as schlimmchen suggested.

there is much rarer logging in the console about the battery.

Well... Can we agree that the output by the Victron handler(s) does not indicate anything about the Pylontech handler? Do you have verbose logging on for the battery provider? As far as I can see, the PylontechCanReceiver dumps the values of every message onto the serial output / console if verbose logging is enabled. If you do and you don't see the respective messages

"[Pylontech] voltage: %f current: %f temperature: %f\n"

then something is wrong with the data not being received over CAN. However, even if there should be something broken there, the BatteryStats will still happily publish all data points regularly, but those would be the last values received.

I have to see if I can do some MQTT snooping

Okay, so spcqike wrote something that made you consider doing this, good. Please do so. If you need help with this, ask. It is enough to use SSH with port forwarding (if you have an SSH host at the remote location) and you will be able to connect to the remote broker.

schlimmchen commented 7 months ago

Here is another indication why this graphing issue is probably about data processing and not about OpenDTU-OnBattery failing to send data:

Please have a look at this (from 2024-04-04-22-31 Chronograf Data.csv):

The spike reported by the Pylontech was -7.1A. And the next sample does read 0. And all samples up until the peak are expected to be read as zero as the last value was 0.

The graph you shared earlier, however, shows something very different at the respective time point:

Why is the peak not even reaching -7.1A?

My best guess is that the data to plot this graph was pre-processed by influxdb before grafana tried to plot it. That probably has something to do with how many data points are requested or how big of an interval is requested. The original data, which did include -7.1A as a spike and 0A right at the next sample, is not being used as such.

Here is what I asked ChatGPT, hoping that the query would not be too suggestive:

when grafana is connected to an influxdb, it seems the data is
pre-processed and the graph does not look exactly like I expect it.
why is that? how can I force grafana to use all available data points
in a series to plot a graph?

Here is what it answered:

begin ChatGPT answer

Hey Bernhard,

When you notice that Grafana graphs using InfluxDB as a data source don't match your expectations, it's often due to how Grafana aggregates or samples data points, especially over large time ranges. This behavior is intended to optimize performance and readability. Here's a brief rundown on why it happens and what you can do:

Why It Happens:

Data Aggregation: Grafana automatically aggregates data over time to prevent performance issues caused by rendering too many points.
Automatic Grouping: Depending on the time range selected, Grafana groups data into larger intervals.

Solutions:

Adjust the Query: Use InfluxQL or Flux (depending on your InfluxDB version) to specify finer time intervals in your query. For InfluxQL, you can use the GROUP BY time(<interval>) clause to control the granularity. For Flux, you can use the window() function.
Increase Max Data Points: In the panel settings under "Query Options," increase the "Max data points" value. This tells Grafana the maximum number of data points it should retrieve and render. Higher values can lead to more detailed graphs but may impact performance.
Change Min Interval: Also in the panel settings, you can adjust the "Min interval" to specify a lower limit for data grouping. Reducing this interval can lead to more granular data display.
Review Grafana Version: Ensure your Grafana version is up to date. Older versions might handle data differently or less efficiently than newer releases.
Server and Client Performance: Be aware that displaying very granular data over long periods can significantly impact both server load and client-side (browser) performance. It's a balance between detail and usability.

By tweaking these settings, you can get Grafana to display your data more accurately, according to your expectations. Always keep an eye on performance, though, especially if you're dealing with large datasets.

end ChatGPT answer

I feel that this confirms my suspicions.

What do you think @spcqike? You have an influxdb, don't you? Care to experiment to reproduce with a matching query and fix the query, disregarding performance issues? BTW: Can I import historical data into influxdb from HA? If so, I could experiment as well. My strategy is using prometheus and an MQTT exporter together with grafana, but I might as well play around with other tools while I figure out what my setup should be.

spcqike commented 7 months ago

I think you are totally right and I don’t need to test this behavior as it’s exactly what it does.

Grafana standard interval is 10s. MQTT interval is 5s. Standard query is most likely "mean(value)", so depending on the real timestamps you will get either

00:00 -7.1 00:10 0

or

00:00 -3.55 00:10 null

The latter is what you see in the diagram referenced (not reaching -7.1 at all).
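The two outcomes can be reproduced with a small sketch of 10 s bucket averaging. This only mimics what a "mean(value)" query grouped by time would do; `bucket_means` is a hypothetical helper, and the timestamps are made-up seconds, not values from the CSV.

```python
from statistics import mean


def bucket_means(samples, bucket_s, start, end):
    """Average (timestamp_s, value) samples per fixed-size bucket.

    Buckets without any sample yield None, like a null in the query result.
    """
    result = []
    for bucket_start in range(start, end, bucket_s):
        values = [v for t, v in samples if bucket_start <= t < bucket_start + bucket_s]
        result.append((bucket_start, mean(values) if values else None))
    return result


# Spike and the following zero land in different 10 s buckets.
case_split = bucket_means([(5, -7.1), (10, 0.0)], bucket_s=10, start=0, end=20)

# Spike and the following zero land in the same 10 s bucket.
case_merged = bucket_means([(0, -7.1), (5, 0.0)], bucket_s=10, start=0, end=20)

print(case_split)
print(case_merged)
```

In the second case the spike is averaged with the zero that follows it, so the plotted peak is only half as deep as the value that was actually published.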

as I said earlier, you can get rid of most confusion by using different queries (like group by(), fill(), …) and settings in grafana (settings like line visualization, max data points, interval, range, …)

if you want to have the most exact diagram you need an interval(5), fill(previous) and maybe something else. So that you don’t average values and have a value for every interval. But that creates a lot of data and really spiky diagrams.

On the other hand he is right, too. If you don’t have a leading or trailing valid data point in your time range, your graph will start and end somewhere within your range (ok, with fill(previous) it may just start somewhere, as you can’t fill unknown values, but you can use the last known value till the end).
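A minimal sketch of that fill(previous) behaviour, written as plain Python rather than as an actual InfluxDB query (`fill_previous` is a hypothetical helper): gaps after a known value are filled, while a gap at the start of the range stays empty because there is nothing to carry forward.

```python
def fill_previous(buckets):
    """Replace None with the last known value; leading None entries stay None."""
    filled = []
    last = None
    for timestamp, value in buckets:
        if value is not None:
            last = value
        filled.append((timestamp, last))
    return filled


# One leading gap, a spike, a zero, then two trailing gaps.
buckets = [(0, None), (10, -7.1), (20, 0.0), (30, None), (40, None)]
filled = fill_previous(buckets)
print(filled)
```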

that’s why I also store values on a 10s basis. Duplicate or not. It makes processing and displaying much easier. But it’s not needed in most cases (like these low powered zigbee sensors that send an update after a certain change on temperature or after 1h if no change happened)

as for your import. Yes you can import data. But I don’t know about HA (I don’t use HA) If I need to do something like this, I use node-red as I normally manipulate some data (like changing kW to W) and add tags and stuff.

spcqike commented 7 months ago

Another thing:

I don’t know exactly how mean() works with fill(previous)

of course, if you want to see wider time ranges, an interval(5) makes no sense at all. So you want an auto interval to scale with your sample rate.

For my readings, I like to visualize min(), mean() and max() each interval. So you get something like

I don’t know how this works and calculates, if data points are missing.

Eg 00:00 -7.1 00:05 0 00:10 null …. 00:55 null

if you now use an (auto?) interval of 1m, do you get an average of -3.55 like before, as all other timestamps don’t have a valid value, or do you get -0.6A?
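Both candidate results can be computed side by side. This is only a sketch of the arithmetic; which one a given query returns depends on the database. As far as I know, InfluxQL's MEAN() only aggregates points that actually exist in the interval, which would give the first result, but treat that as an assumption to verify.

```python
from statistics import mean

# One slot per 5 s over a one-minute interval: two real samples, ten gaps.
slots = [-7.1, 0.0] + [None] * 10

present = [v for v in slots if v is not None]

# Variant 1: gaps are ignored, only stored points are averaged.
mean_ignoring_gaps = mean(present)

# Variant 2: every gap is counted as if it were a 0 A sample.
mean_gaps_as_zero = sum(present) / len(slots)

print(f"ignoring gaps: {mean_ignoring_gaps:.2f} A")
print(f"gaps as zero:  {mean_gaps_as_zero:.2f} A")
```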

As I said, this is much easier and clearer with consecutive data points. But probably not impossible with missing datapoints (as everything can be calculated somehow)

github-actions[bot] commented 6 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.