Closed vincentvanhees closed 10 months ago
@vincentvanhees I hope it's ok, I looked at this scenario for a bit.
I think you are right, and in this specific scenario the faulty blocks were imputed correctly. As far as I can tell from reading the code, blocks 581-584 actually weren't imputed (or rather, were imputed but with 0 points) because their end points were earlier in time than the last valid time point (the start of block 581). The first block to actually get imputed is 585, and it got imputed over the time period of [start of 581, end of 585], which is exactly what we would like.
And I think `filehealth_totimp_min` also ends up correct here.
But I think that there can be situations with failed checksums and incorrect timestamps where things don't work out smoothly like this. For instance, instead of having a timestamp in the past, a faulty block (with a failed checksum) could have a timestamp several days in the future. In this case, we will end up imputing to this incorrect future timestamp, even though ideally we should have just skipped the faulty block and waited to impute till we encounter a block with a timestamp we can trust.
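To make the idea of "skip the faulty block and wait for a trustworthy timestamp" concrete, here is a minimal sketch. It is not GGIR's actual code: the `Block` class, the `imputation_gaps` function, and the `tolerance` parameter are hypothetical names used only for illustration, and times are plain seconds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for one data block (times in seconds)."""
    start: float
    end: float
    checksum_ok: bool


def imputation_gaps(blocks: List[Block], tolerance: float = 0.0) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_end) periods to impute, trusting only valid blocks."""
    gaps = []
    last_valid_end: Optional[float] = None
    for block in blocks:
        if not block.checksum_ok:
            # The timestamp of a faulty block cannot be trusted, whether it
            # lies in the past or in the future, so ignore the block entirely.
            continue
        if last_valid_end is not None and block.start - last_valid_end > tolerance:
            # Gap between the last trusted time point and this trusted block.
            gaps.append((last_valid_end, block.start))
        # Never move the last trusted time point backwards.
        last_valid_end = block.end if last_valid_end is None else max(last_valid_end, block.end)
    return gaps
```

With this approach a faulty block whose timestamp is several days in the future never becomes an imputation target; the gap is only closed once the next block with a passing checksum arrives.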
I'll take a look at this a bit more today and I will send a PR with a possible change.
Do you by any chance have a .cwa file with a failed checksum in it that I could use for testing? It doesn't matter to me whether the timestamps in the faulty blocks are correct or incorrect.
Thanks for looking into it. Unfortunately, I do not have a file with a failed checksum among the files I shared with you, and I cannot access the file above myself. I have a slightly larger pool of cwa files on my PC and will process them over the weekend to see whether any of them happen to have failed checksums.
@danielgjackson do you have an example .cwa file that has blocks with a failed checksum? If yes, would you be able to share it with us?
I don't have one, but it would be easy to create by changing a byte in a data block (any 512-byte aligned block, after the initial 1024-byte header).
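A minimal sketch of that suggestion, working on the file contents in memory. The header and block sizes come from the comment above; the function names are hypothetical, and `word_sum` assumes the checksum rule that the 16-bit word-wise sum of a valid block is zero, which should be checked against the format documentation before relying on it.

```python
import struct

HEADER_BYTES = 1024  # initial header size, as stated in the comment above
BLOCK_BYTES = 512    # data block size, as stated in the comment above


def corrupt_block(data: bytes, block_index: int, offset_in_block: int = 100) -> bytes:
    """Return a copy of `data` with one byte inverted inside the chosen data block."""
    if not 0 <= offset_in_block < BLOCK_BYTES:
        raise ValueError("offset_in_block must lie within a 512-byte block")
    position = HEADER_BYTES + block_index * BLOCK_BYTES + offset_in_block
    if block_index < 0 or position >= len(data):
        raise ValueError("block_index is outside the file")
    corrupted = bytearray(data)
    corrupted[position] ^= 0xFF  # invert all bits so the byte is guaranteed to change
    return bytes(corrupted)


def word_sum(block: bytes) -> int:
    """16-bit sum of the 256 little-endian words in a block.

    Assumption: a block with a valid checksum sums to zero.
    """
    return sum(struct.unpack("<256H", block)) & 0xFFFF
```

To produce a test file, read a copy of a healthy recording with `open(path, "rb")`, pass the bytes through `corrupt_block`, and write the result to a new file so the original stays untouched.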
That's such a good point, thanks @danielgjackson!
I am not sure this is an issue, but want to share my observation just in case others identify a problem:
Part 2 summary results
These look good, as they show the expected 7 days of data with almost no non-wear time:
Data quality report
In data_quality_report.csv we see for this recording:
The checksum failed for 4 blocks apparently representing a large time window (see `filehealth_checksumfail_min`), however the actually imputed time (see `filehealth_totimp_min`) is less than a minute.
QClog object
Inspection of the QClog stored in GGIR part1 milestone object M$QClog shows:
which confirms that checksum failed four times.
Interpretation
My interpretation for now:
`filehealth_checksumfail_min` may not be that informative as timestamps for these blocks are not trustworthy.
`filehealth_totimp_min` is still correct and still our best indicator of file health.
Unfortunately, I do not have access to the actual cwa file for this at the moment as a project partner runs the analysis, but I will try to find a way to investigate it. I have not been able to reproduce the issue with any of the cwa test files I have access to myself.