xoseperez / hlw8012

HLW8012 library for Arduino and ESP8266 using the Arduino Core for ESP8266.
GNU General Public License v3.0
125 stars 48 forks source link

Improve resolution + fix crash #32

Open TD-er opened 1 year ago

TD-er commented 1 year ago

Crash:

_voltage = (voltage_pulse_width > 0) ? _voltage_multiplier / voltage_pulse_width / 2 : 0;

Since voltage_pulse_width is volatile, it may be set to 0 in an interrupt handling right after it was checked for not being zero.

Resolution improvement: When measuring 230V mains voltage on a Sonoff POW r1, it reported the voltage with a resolution of about 8V. Changing some types to float instead of int does improve it slightly.

The actual measurement was done by measuring the "last" pulse of a measurement period due to instability of the pulse frequency right after setting it. For 222 & 230V mains this was a pulse width of roughly 970 - 1000 usec. This is also roughly the jitter of an ESP8266 when handling interrupts.

The change I made was to keep track of both the measurement duration, the last pulse width and count the nr. of pulses. If the nr of pulses is low, the pulse duration is way longer than the jitter, so then we can take the last pulse width. If the nr. of pulses is high, then the outliers of the first few pulses won't matter a lot. Also "instability" does work both ways and thus evens out.

Now the resolution of the measurements from the CF1 pin is way more usable and also quite accurate. Compared it with the CSE7766 used in a Sonoff POW r2 and calibrated both using a proper multimeter. Both differ in the order of 0.1V from each other and the multimeter.

A before and after: image

Another change I made is to get rid of double variables. They're not really needed here and give quite a lot of overhead as double computations are done in software.

TD-er commented 1 year ago

Maybe also merge this pending PR: https://github.com/xoseperez/hlw8012/pull/26/files

mcspr commented 1 year ago

+1, for the float instead of int for measurements and calibration but, version bump should probably for other number than just the patch (either 2.0.0 or 1.2.0)

They're not really needed here and give quite a lot of overhead as double computations are done in software.

Our floats and doubles on both 8266 and 32 are software regardless of precision option? We get rid of double funcs, but idk how noticeable the speed increase or something impactful like power consumption. In case of polling performance, you might to look at removing volatile and replacing them with memory barrier instead; we don't really run into a problem of half-read variables b/c of 32bit loads on 32bit variables

TD-er commented 1 year ago

The float/double part isn't used in the callback function, nor as volatile. Those are only int types (32 bit) I left the volatile flags as those are altered in callback functions.

The main reason for me to switch to float from double was that the extra precision isn't needed and it does add up on build size. CPU-usage is not that important here as those floating point calculations are not done very often.

but, version bump should probably for other number than just the patch (either 2.0.0 or 1.2.0)

The version bump was more of a suggestion, just in case it was overlooked. Then it could maybe still be distinguished by users. Perhaps I should have left it out as it is upto the maintainer to "label" things.

mcspr commented 1 year ago

Let me rephrase - you don't have to use volatile nor does it use mean operation will be in sync / atomic. In ESP case, we just end up with memw instruction. In case of 8266, kind of pointless. In case of dual-core 32, minor slowdown https://www.espressif.com/sites/default/files/documentation/esp32_errata_en.pdf /memw/ https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html /on the topic of volatile:)/

Suggested copy is perfectly fine btw, I just meant to comment on other possible overhead (as mentioned above for doubles)

TD-er commented 1 year ago

OK, will look into the "volatile" part a bit more. I thought it also forced mapping the variable into iram memory, which is needed for callback code as it may otherwise need extra steps to access it.

TD-er commented 1 year ago

by the way, the "ESPEasy copy" of this code does have another change which I did not include here. I use the "direct IO" to toggle the SEL pin as this takes a lot less time as you know from our discussion (or was it with Jason2866 ?) on the Dallas sensor implementation I changed recently for ESPEasy.

TD-er commented 1 year ago

OK, I read through your links. What will happen if I use non-volatile members which will be changed in a callback function? I was under the impression the ESP could/would crash. But you're telling me it doesn't? I guess I have to copy this data first before using it as it still needs to be considered "volatile", but highly depending on whether the compiler decides it may be cached somehow.

I know the volatile attribute simply means "always need to access the original address, no caching allowed". But how is it being used on the ESP platforms? And what effect has the memory barrier? (I know it tells the compiler not to optimize instructions out-of-order crossing this barrier) For example, is the volatile attribute placing the member in iRAM on the Espressif platforms? I can assume the memory barrier might be useful to hint the compiler to make a ++ operation as atomic as possible, but I wonder whether it is needed in this specific use case in the callback function for CF1?

mcspr commented 1 year ago

'volatile' in language meant to signify two things - order of operations (aka serialization) and explicitly tell the compiler it should not optimize reads or writes. When you access global variable or member, it is already thinks of both as 'globally accessible' when reading but still may optimize your operations to reduce the number of writes. Most useful example is a code one, see the asm (gdb disassemble) differences with and without one.

Xtensa implementation adds memw instruction to 'flush' pipeline of mem loads and stores, so you end up with the most recent version of data. With multiple variables, you end up with multiple syncs in the machine code; both ISR and main thread.

It has no relation to the section attribute and 'IRAM', both this and this->member are still in the usual RAM they always were. The practical difference is you may get not most recent data, but you still get the data loaded. It is not atomic, still, unless we lock all of the data we need beforehand or transfer it some place else and signal main code that it is available for read.

TD-er commented 1 year ago

Thanks for the explanation.

alranel commented 1 year ago

@TD-er I spotted a possible mistake in the inversion logic. I think this:

         const unsigned char mode = 1 - _mode;
         digitalWrite(_sel_pin, _mode);
         _mode = mode;

should be like this (notice mode instead of _mode in the second line):

         const unsigned char mode = 1 - _mode;
         digitalWrite(_sel_pin, mode);
         _mode = mode;

Infact, this is what you're doing in the ESPEasy version:

          const unsigned char mode = 1 - _mode;
         DIRECT_pinWrite_ISR(_sel_pin, mode);
         _mode = mode;

(BTW thank you for this patch! It works perfectly for me.)

TD-er commented 1 year ago

Ah had to re-read your post at least 3 times before I saw the misplaced _ Will update it in the PR.

The patch I now use on a Sonoff POW (r1) version and it now is actually usable and quite consistent compared to other meters (e.g. the POW r2 next to it)

rrelande commented 1 year ago

The actual measurement was done by measuring the "last" pulse of a measurement period due to instability of the pulse frequency right after setting it. For 222 & 230V mains this was a pulse width of roughly 970 - 1000 usec. This is also roughly the jitter of an ESP8266 when handling interrupts.

The change I made was to keep track of both the measurement duration, the last pulse width and count the nr. of pulses. If the nr of pulses is low, the pulse duration is way longer than the jitter, so then we can take the last pulse width. If the nr. of pulses is high, then the outliers of the first few pulses won't matter a lot. Also "instability" does work both ways and thus evens out.

Now the resolution of the measurements from the CF1 pin is way more usable and also quite accurate. Compared it with the CSE7766 used in a Sonoff POW r2 and calibrated both using a proper multimeter. Both differ in the order of 0.1V from each other and the multimeter.

This is a good idea. You may want to also take into account there are a high number of spurious pulses at low pulses. To get even better figures at low power/voltage, you may want to filter out those pulses that are not meeting the 50% duty ratio.

TD-er commented 1 year ago

Right now the plugin is working quite well. I'm not sure if filtering on duty cycle is a good idea here as the actual processing of the pulses is done with quite a lot of jitter. After all, the same jitter is the reason why we need this approach.

rrelande commented 1 year ago

if you measure 220V or normal power, you don't need the filtering. only if voltage is low (frequency of 2-4Hz), then you will definitely need it.

actually fake spurious pulses cause jitter and this is another way to further reduce the jitter.

i see that unfortunately the PR was not merged?