kirenbahm / ENP_TOOLS

Scripts used to pre- and post-process data
0 stars 2 forks source link

Update reduce_NONEQDIST.m #35

Closed LAGO-Support closed 4 years ago

LAGO-Support commented 4 years ago

Changes are to fix averaging error in code. Will go into detail in comments

LAGO-Support commented 4 years ago

The reason for changes is due to inconsistencies noticed between original real-time data and the hourly and daily outputs done in D02 and D03. You can run the master and the changes to see the output differences. For an hourly example ALC.discharge.dfs0 the first 5 values are: 4/25/2007 00:00 -12 4/25/2007 00:15 -3.7 4/25/2007 00:30 7.3 4/25/2007 00:45 16 4/25/2007 01:00 11 thus when you look for all timesteps at the date and hour 0 your get the first 4 indexes returned. (master branch reduce_NONEQDIST.m line 122, or new branch line 149)

when you find the difference between the time values for use in the weighted average the diff function returns an array 1 element less than the original. (master branch reduce_NONEQDIST.m line 127, or new branch line 157)

that is why when the weighted average is calculated you skip the last element X(1:end-1) (master branch reduce_NONEQDIST.m line 136, or new branch line 169)

Without the changes, you would then calculate the weighted average for 4/25/2007 0 hour as 900 and 2700 are time in seconds (-12 900) + (-3.7 900) + (7.3 900) / 2700 = -2.8 This entirely skips the data from 00:45 time step. the weighted average in this example you first hour should be (-12 900) + (-3.7 900) + (7.3 900) + (16 * 900) / 3600 = 1.9 This was the case for all time period aggregates, missing the last elements for the period. Depending on the timesteps and period aggregated this could result in small or large changes.

I made two additions for each period (yearly, monthly, daily, hourly) calculation. The first was, so long as the last index for the period is not the last timestep in the file, it would add the next timestep to the index array. (new branch line 153) In our previous example finding all the indexes that match the date and hour would give [1 2 3 4]. now we are adding on the 5th index. [1 2 3 4 5]. This way when you run diff (new branch line 157) you get the DT for the 4th element. The second addition was a cap check. consider our previous example. What if the 5th timestep had been 4/25/2007 01:15 11, due do missing data at 1:00? then the 4th DT would have been calculated as 1800 seconds, not 900 seconds. The addition at line 158 checks if the sum of the DT elements exceeds the total time for the period. In this example 3600 seconds in the hour. If so the amount that exceeds the period is subtracted from the last DT element, which will be the incorrect element. Due to the coding logic all other elements must be with the period, only the final DT element I added could be longer than it should be due to edge cases.

kirenbahm commented 4 years ago

Thanks for catching this, and such a good explanation of the issue and the fix!