Quantisation Noise on high resolution tmpo data

JrtPec commented 9 years ago

I'm working on an algorithm to perform Non-Intrusive Appliance Load Monitoring on the Flukso data. It is important that I can get the data in the highest possible resolution.

I run into difficulties when converting the data from Wh to W. I do this by taking the first derivative of the cumulative data obtained via TMPO. The resulting signal shows the quantisation error in the form of 1 s spikes. (Screenshot attached, 2015-02-09 at 13:33:13.)
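To make the effect concrete, here is a minimal sketch of how differencing a counter that only advances in whole Wh produces such spikes. It assumes a constant load and a counter read once per second; the function names are hypothetical and are not part of tmpo.

```python
def cumulative_wh(power_w, n_seconds):
    """Counter reading in whole Wh at t = 0..n_seconds for a constant load (assumption)."""
    return [power_w * t // 3600 for t in range(n_seconds + 1)]


def first_difference_w(counter_wh):
    """Per-second power in W: 1 Wh delivered within 1 s corresponds to 3600 W."""
    return [(b - a) * 3600 for a, b in zip(counter_wh, counter_wh[1:])]


counter = cumulative_wh(1200, 9)     # a steady 1200 W load, observed for 9 s
power = first_difference_w(counter)
print(power)                         # zeros with a 3600 W spike every third second
print(sum(power) / len(power))       # the mean is still the true 1200 W
```

The energy is conserved, so the average is right; only the per-second distribution is distorted by the 1 Wh step size.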

I have tried a few work-arounds using rolling means, low-pass filters or simply cutting off the spikes, but none of them seems to fully clean the useful signal. Does anyone know a method I could try to solve this, or does anyone know how Flukso calculates power use per second? Perhaps there is a way to access that data directly via tmpo that I am missing?

(This is my first time posting an issue. If you need more info or a code example to replicate the problem, please ask.)

saroele commented 9 years ago

Hi Jan,

Great to see your question here, so others can participate in the discussion. I think what you see is a sampling issue. The original signal has a resolution of one second, so if you sample per second, you'll get spikes. If you resample this data to 5 s or 10 s, you will get a much smoother instantaneous power.
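A rough sketch of what resampling buys: averaging over a window of N seconds shrinks the quantisation step from 3600 W to 3600/N W. The example assumes a constant load and a whole-Wh counter; the helper names are hypothetical.

```python
def pulses_per_second(power_w, n_seconds):
    """Whole-Wh counter increments in each second, for a constant load (assumption)."""
    counter = [power_w * t // 3600 for t in range(n_seconds + 1)]
    return [b - a for a, b in zip(counter, counter[1:])]


def block_mean_power_w(pulses, window_s):
    """Mean power in W over consecutive, non-overlapping windows of window_s seconds."""
    means = []
    for start in range(0, len(pulses) - window_s + 1, window_s):
        wh = sum(pulses[start:start + window_s])
        means.append(wh * 3600 / window_s)
    return means


pulses = pulses_per_second(1200, 30)      # steady 1200 W load for 30 s
print(block_mean_power_w(pulses, 1)[:6])  # 1 s windows: 0 W or 3600 W only
print(block_mean_power_w(pulses, 10))     # 10 s windows: values in steps of 360 W
```

The 10 s series is much closer to the true load, but it is still quantised, just with a smaller step.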

Another issue could be the system itself: maybe this system does generate those very short peaks? Do you get the same with other fluksometers?

roel

icarus75 commented 9 years ago

The spikes are indeed due to the 1sec timestamp resolution of the time series. If you would like to go down to 1sec resolution, we could enable the FLM to log the instantaneous power instead of the cumulative. This will give you a datapoint for every second.
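As an illustration of the timestamp effect, the sketch below assumes that each 1 Wh pulse is stamped with its time truncated to a whole second (truncation rather than rounding is an assumption here, as are the function names). A steady load whose pulse interval is not a whole number of seconds then appears to jump between power levels.

```python
def pulse_timestamps(power_w, n_pulses):
    """Whole-second timestamps of the first n_pulses 1 Wh pulses at constant load."""
    seconds_per_pulse = 3600 / power_w
    # Assumption: the sub-second part of each timestamp is simply dropped.
    return [int(k * seconds_per_pulse) for k in range(1, n_pulses + 1)]


def interval_power_w(timestamps):
    """Average power in W between consecutive pulses: 1 Wh = 3600 Ws per interval."""
    return [3600 / (b - a) for a, b in zip(timestamps, timestamps[1:])]


stamps = pulse_timestamps(2400, 6)   # true interval is 1.5 s between pulses
print(stamps)                        # intervals alternate between 2 s and 1 s
print(interval_power_w(stamps))      # apparent power alternates 1800 W / 3600 W
```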

JrtPec commented 9 years ago

It would be awesome if I could download the instantaneous power as well. Right now the majority of the algorithm depends on the instantaneous power, so it would be very useful if I didn't have to worry about whether something is a valid signal or not.

icarus75 commented 9 years ago

1/ The FLM's firmware needs to be changed to inject the power readings into tmpo. And right now it's either power or cumulative. I'll need the FLM's serial to make the change.
2/ Do you intend to feed complementary time series like water and gas usage to the algorithm?
3/ Will the algorithm's code be open sourced?

JrtPec commented 9 years ago

1/ Right now I'm not working with one specific FLM, I'm just picking sensors whose data I can test methods on (I don't have a Fluksometer myself). I'm pretty sure I can use my copromotor's Flukso to test with (FL03001251), but it might be useful to also have access to some other sensors.
2/ It was probably you who e-mailed me this morning with the suggestion to include water and gas. I really like the idea, and I'll surely look into it!
3/ It will be. I was still contemplating whether I should push unfinished code to GitHub, or wait until I have something that works. Suggestions? (Again, no experience with group projects on GitHub.)

saroele commented 9 years ago

Bart, Jan,

(FYI: Jan is doing his master's thesis on non-intrusive load monitoring (NILM) and will use opengrid data. Icarus75 is Bart, the man behind Flukso.)

Jan, don't worry about pushing unfinished code. Just work in a branch so nothing gets mixed up with 'finished' code in the develop and master branches. Make committing and pushing a habit; pushing to GitHub is also a good backup strategy.

JrtPec commented 9 years ago

Bart, could you notify me when you have performed the firmware change?

Also, is there a way (perhaps via the houseprint) to select all sensors that log instantaneous power data?

saroele commented 9 years ago

Jan, Bart proposed to have a small meeting to discuss the sampling issues and get you on the right track. I asked Karel (@kdebrab) to get in touch with the two of you. Karel has already read some of the NILM papers, and he can also help to get the sampling issues solved. Unfortunately, I cannot free up time in the coming weeks.

kdebrab commented 9 years ago

Jan,

Do you have the same issue with the interpolation code of #31?

JrtPec commented 9 years ago

I have used the interpolation exactly as described in that issue. Resampling doesn't really do anything in this situation though, because tmpo delivers the data already quantised to the nearest second and Wh, with NaNs where there is no data.

All that interpolation does is fill in those NaNs, but the step-like behaviour remains.

kdebrab commented 9 years ago

OK, I see. You are looking at seconds data. There are ways to get a smoother signal without filtering the data. The trick is to adjust the (millisecond) timing of the pulse signals (1 pulse = 1 Wh = 3600 Ws) within each second.

E.g. when there is one pulse in a second, place it at 500 ms; when there are two pulses within one second, place one at 333 ms and the other at 667 ms, etc. After that, use the interpolation scheme of #31 to go back to seconds resolution. In this example, you have one pulse per second at most. Spikes will be levelled out a bit, but not completely. You'll get further improvement by including info from nearby seconds when calculating the optimal millisecond placement of the pulses. E.g. when deciding on the timing of a single pulse within a second, place it at 500 ms if the previous second also had one pulse, and at 250 ms if the previous second had no pulse.
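The basic placement rule could be sketched roughly as follows. This covers only the even spacing within each second (500 ms for one pulse, 333 ms and 667 ms for two); the refinement that looks at neighbouring seconds is not implemented, and the function names are hypothetical.

```python
def place_pulses_ms(pulses_per_second):
    """Spread the pulses counted in each second evenly over that second.

    pulses_per_second[s] is the number of 1 Wh pulses seen in second s.
    Returns the assumed pulse times in milliseconds since the series start.
    """
    times_ms = []
    for second, n in enumerate(pulses_per_second):
        for i in range(1, n + 1):
            # n pulses split the second into n + 1 equal parts
            times_ms.append(second * 1000 + round(1000 * i / (n + 1)))
    return times_ms


def interval_power_w(times_ms):
    """Average power in W between consecutive pulses (1 pulse = 3600 Ws)."""
    return [3600 * 1000 / (b - a) for a, b in zip(times_ms, times_ms[1:])]


times = place_pulses_ms([1, 2, 0, 1])
print(times)                    # pulse times in ms, evenly spread per second
print(interval_power_w(times))  # power between consecutive pulses
```

The resulting pulse times can then be fed to whatever interpolation is used to return to a regular one-second grid.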

However, I wonder whether you really need second resolution for your algorithms. Why not resample to 10 seconds or even one minute? You will lose some detail, but it will make your calculations more robust and faster!

opengridcc / opengrid-dev

Quantisation Noise on high resolution tmpo data #44