Thanks Lena.
In lines 573 to 580 we update the object `last` to account for the scenario where a block reflects a larger time window than we expect, by incrementing `last`. I see your update accounts for the opposite scenario, where a block covers less than the expected time window (more values in a block than we expect):
```r
if (rawTime[rawLast] < timeRes[last]) {
  # block is shorter than expected: compute how much time is missing
  timecut = timeRes[last] - rawTime[rawLast]
  # convert the missing time into a number of sample positions
  positions2subtract = floor(timecut * prevRaw$frequency)
  last = last - positions2subtract
  # catch the extreme scenario of time standing still or reversing
  if (last <= pos) last = pos + 1
}
```
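To make the adjustment concrete, here is a small worked example with hypothetical numbers (the values are made up; `timeRes_last` and `rawTime_end` stand in for `timeRes[last]` and `rawTime[rawLast]`):

```r
# Hypothetical numbers: the block was expected to extend to 10.0 s but the
# raw data ends at 9.5 s, so at 100 Hz we step `last` back by the 50
# resampled positions that the raw data cannot cover.
prevRaw <- list(frequency = 100)          # stand-in for the block's sample rate
timeRes_last <- 10.0                      # expected end time (timeRes[last])
rawTime_end <- 9.5                        # actual last raw timestamp (rawTime[rawLast])
timecut <- timeRes_last - rawTime_end     # 0.5 s of missing time
positions2subtract <- floor(timecut * prevRaw$frequency)
positions2subtract                        # 50 positions to step back
```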
Following the above, I am wondering whether `frequency_observed`, as calculated higher up in the function, is still correct. I think it is, but could you double-check?
Which of the test files did the problem occur in? It may be helpful if I double-checked that the new output looks plausible.
I think this is not an issue when `impute == FALSE`, because the `tmp` object returned by `resample()` will not have more samples in it than needed. Here `resample.cpp` determines how many samples / time points it needs to return. In `ReadAxivity`, we start with 201 time points and then use lines 573-580 to make sure that we give `resample.cpp` enough time points to work with; it is fine to give `resample.cpp` more time points than needed, as it will only use what it needs.
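As an illustration of why the extra points are harmless, here is a toy resampler (not the actual `resample.cpp` logic, just a sketch of the idea using linear interpolation): it interpolates onto a requested output grid and never touches input points beyond the last requested time.

```r
# Toy sketch, not the real resample.cpp: interpolate raw samples onto a
# requested output grid; trailing input points past the grid are ignored.
toy_resample <- function(raw_time, raw_value, out_time) {
  stats::approx(x = raw_time, y = raw_value, xout = out_time)$y
}

raw_time <- seq(0, 2, by = 0.01)   # 201 raw time points (more than needed)
out_time <- seq(0, 1, by = 0.02)   # only the first second is requested
resampled <- toy_resample(raw_time, sin(raw_time), out_time)
length(resampled)                  # 51 values; the extra raw points were unused
```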
I think the `frequency_observed` calculation is correct. Before the fix in this PR, `frequency_observed` ended up incorrect for a block after an imputed block, because `rawTime[1]` as assigned here (right after an imputation) was incorrect. But the formula for `frequency_observed` itself is fine, I think, and as long as `rawTime[1]` is correct, the frequency is calculated correctly.
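For reference, here is a hedged sketch of the kind of calculation at stake (the exact expression in `ReadAxivity` may differ): the observed frequency is essentially samples per elapsed second, so a stale `rawTime[1]` left over from before an imputed gap skews it directly.

```r
rawTime <- seq(0, 2, length.out = 201)  # hypothetical block: 201 samples over 2 s
rawLast <- length(rawTime)

# samples-per-second estimate; correct here because rawTime[1] is the true
# start of the block
(rawLast - 1) / (rawTime[rawLast] - rawTime[1])   # 100 Hz

# if rawTime[1] were a stale timestamp from before the imputed gap, the
# elapsed time would be inflated and the estimated frequency too low
rawTime[1] <- -2
(rawLast - 1) / (rawTime[rawLast] - rawTime[1])   # 50 Hz
```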
The test file I used to debug this is one of the 6 files you shared with me: an Axivity cwa file with a gap in it, the one with "RT" in the name (let me know if I should email you the full file name or if it's ok to post it here). With this fix it takes about a minute to process that file; without the fix it takes over an hour and ends up with over 240K unnecessarily imputed blocks.
(I thought there was one more file with this same issue, but I can't find it; I must have tested with this one file twice in a row without realizing it.)
Merging #49 (31e0238) into main (9f786f6) will not change coverage. The diff coverage is n/a.

:exclamation: Current head 31e0238 differs from pull request most recent head 43cd47a. Consider uploading reports for the commit 43cd47a to get more accurate results.
```
@@           Coverage Diff           @@
##             main      #49   +/-   ##
=======================================
  Coverage   86.79%   86.79%
=======================================
  Files           9        9
  Lines         901      901
=======================================
  Hits          782      782
  Misses        119      119
```
We were always imputing 201 values instead of however many were actually needed.
We have a couple of test files where this had a particularly bad effect: there was an odd chunk with abnormally high frequency (twice the expected frequency), and when we imputed it with 200 values instead of the ~50 we should have, we went completely past the next chunk. That next chunk then ended up with a negative frequency and got imputed as well, and the domino effect continued for over 1000 chunks.
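To illustrate the fix with hypothetical numbers (none of these values are taken from the actual code): the number of values to impute should follow from the gap length and the expected frequency rather than from a fixed block size.

```r
expected_frequency <- 100                            # Hz (hypothetical)
gap_seconds <- 0.5                                   # the odd chunk's actual duration
n_needed <- round(gap_seconds * expected_frequency)  # ~50 values actually needed
n_fixed <- 201                                       # what was imputed before the fix

# the surplus values spill past the start of the next chunk, which is what
# pushed its apparent frequency negative and triggered the domino effect
overshoot <- n_fixed - n_needed
overshoot                                            # 151 values too many
```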