Report on case where checksum failed for .cwa file

vincentvanhees commented 11 months ago

I am not sure this is an issue, but want to share my observation just in case others identify a problem:

Part 2 summary results

These look good: they show the expected 7 days of data with almost no non-wear time:

variable	value
samplefreq	100
device	axivity
clipping_score	0
meas_dur_dys	6.99
complete_24hcycle	1
meas_dur_def_proto_day	6.99
wear_dur_def_proto_day	6.375
calib_err	0.004
calib_status	recalibration done, no problems detected
ENMO_fullRecordingMean	24.442

Data quality report

In data_quality_report.csv we see for this recording:

variable	value
filehealth_totimp_min	0.118365
filehealth_checksumfail_min	56163.11
filehealth_niblockid_min	0.118365
filehealth_fbias0510_min	0
filehealth_fbias1020_min	0
filehealth_fbias2030_min	0
filehealth_fbias30_min	0.118365
filehealth_totimp_N	6
filehealth_checksumfail_N	4
filehealth_niblockid_N	6
filehealth_fbias0510_N	0
filehealth_fbias1020_N	0
filehealth_fbias2030_N	0
filehealth_fbias30_N	6

The checksum failed for 4 blocks, which apparently represent a large time window (see filehealth_checksumfail_min); however, the time actually imputed (see filehealth_totimp_min) is less than a minute.
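To put the two numbers side by side, here is a small arithmetic sketch that uses only the values reported in the tables above. It converts filehealth_checksumfail_min to days and filehealth_totimp_min to seconds and samples. The variable names are illustrative and not part of GGIR.

```python
MIN_PER_DAY = 24 * 60

# Values copied from the Part 2 summary and data_quality_report.csv above
meas_dur_days = 6.99
samplefreq = 100            # Hz
checksumfail_min = 56163.11
totimp_min = 0.118365

# Duration attributed to failed-checksum blocks, expressed in days
checksumfail_days = checksumfail_min / MIN_PER_DAY

# Time actually imputed, expressed in seconds and in samples
totimp_sec = totimp_min * 60
totimp_samples = totimp_sec * samplefreq

print(f"checksum-fail window: {checksumfail_days:.1f} days "
      f"(recording lasted {meas_dur_days} days)")
print(f"imputed: {totimp_sec:.1f} s, about {totimp_samples:.0f} samples")
```

The checksum-fail window works out to roughly 39 days, far longer than the 6.99-day recording, which is consistent with the timestamps of those blocks being unreliable.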

QClog object

Inspection of the QClog stored in GGIR part1 milestone object M$QClog shows:

[Image: check_quality_specific_file_cropped]

which confirms that the checksum failed four times.

Interpretation

My interpretation for now:

The code deals with faulty blocks correctly, as the total recording duration is as expected.
Reporting the total time imputed for blocks with a failed checksum (filehealth_checksumfail_min) may not be that informative, as the timestamps of these blocks are not trustworthy.
filehealth_totimp_min is still correct and still our best indicator of file health.

Unfortunately, I do not have access to the actual cwa file for this at the moment as a project partner runs the analysis, but I will try to find a way to investigate it. I have not been able to reproduce the issue with any of the cwa test files I have access to myself.

l-k- commented 10 months ago

@vincentvanhees I hope it's ok, I looked at this scenario for a bit.

I think you are right, and in this specific scenario the faulty blocks were imputed correctly. As far as I can tell from reading the code, blocks 581-584 actually weren't imputed (or rather, were imputed but with 0 points) because their end points were earlier in time than the last valid time point (the start of block 581). The first block to actually get imputed is 585, and it got imputed over the time period of [start of 581, end of 585], which is exactly what we would like.
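The behaviour described above can be illustrated with a minimal sketch. It is not the GGIRread code: the helper `n_imputed_samples`, the timestamps, and the 1-second block length are all hypothetical and chosen only to mirror the description, with blocks 581-584 ending before the last valid time point and block 585 ending after it.

```python
def n_imputed_samples(last_valid_time, block_end_time, sf):
    """Samples to impute between the last trusted time and a block's end.

    A block whose end lies before the last trusted time yields 0 samples.
    """
    return max(0, round((block_end_time - last_valid_time) * sf))


SF = 100            # Hz, as reported in the Part 2 summary
last_valid = 1000   # hypothetical start of block 581, in seconds

# Hypothetical end times in seconds: blocks 581-584 carry timestamps in
# the past, block 585 carries a plausible one (5 blocks of 1 s each)
block_ends = {581: 500, 582: 501, 583: 502, 584: 503, 585: 1005}

imputed = {block: n_imputed_samples(last_valid, end, SF)
           for block, end in block_ends.items()}
print(imputed)
```

In this toy setup only block 585 contributes imputed samples, and they cover the whole span from the start of block 581 to the end of block 585.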

And I think filehealth_totimp_min also ends up correct here.

But I think that there can be situations with failed checksums and incorrect timestamps where things don't work out smoothly like this. For instance, instead of having a timestamp in the past, a faulty block (with a failed checksum) could have a timestamp several days in the future. In this case, we will end up imputing to this incorrect future timestamp, even though ideally we should have just skipped the faulty block and waited to impute till we encounter a block with a timestamp we can trust.
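A rough sketch of the concern, under stated assumptions: the block layout, the field names, and both policies below are hypothetical and are not taken from GGIRread. The "naive" policy imputes up to whatever end time a faulty block reports, while the "skip" policy ignores faulty blocks and imputes only once a block with a valid checksum is reached.

```python
SF = 100  # Hz

# Hypothetical blocks with times in seconds; block 1 has a failed
# checksum and a timestamp three days in the future
blocks = [
    {"start": 0, "end": 1, "checksum_ok": True},
    {"start": 259200, "end": 259201, "checksum_ok": False},
    {"start": 2, "end": 3, "checksum_ok": True},
]


def total_imputed(blocks, sf, skip_faulty):
    """Total number of imputed samples under one of the two policies."""
    last_valid = blocks[0]["end"]
    total = 0
    for block in blocks[1:]:
        if block["checksum_ok"]:
            # Trusted block: fill the gap up to its start, keep its data
            total += max(0, (block["start"] - last_valid) * sf)
            last_valid = max(last_valid, block["end"])
        elif not skip_faulty:
            # Naive policy: trust the faulty timestamp and impute up to it
            total += max(0, (block["end"] - last_valid) * sf)
            last_valid = max(last_valid, block["end"])
        # Skip policy: do nothing until a trusted block appears
    return total


naive = total_imputed(blocks, SF, skip_faulty=False)
skipped = total_imputed(blocks, SF, skip_faulty=True)
print(naive, skipped)
```

With these made-up numbers the naive policy imputes three days of samples, while the skip policy imputes only the 1-second gap left by the faulty block.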

I'll take a look at this a bit more today and I will send a PR with a possible change.

Do you by any chance have a .cwa file with a failed checksum in it that I could use for testing? It doesn't matter to me whether the timestamps in the faulty blocks are correct or incorrect.

vincentvanhees commented 10 months ago

Thanks for looking into it. Unfortunately, I do not have a file with a failed checksum among the files I shared with you. The file above I cannot access myself. I have a slightly larger pool of cwa files on my PC, and will process them over the weekend to see whether any of them happen to have failed checksums.

vincentvanhees commented 10 months ago

@danielgjackson do you have an example .cwa file that has blocks with a failed checksum? If yes, would you be able to share this with us?

danielgjackson commented 10 months ago

I don't have one, but it would be easy to create by changing a byte in a data block (any 512-byte aligned block, after the initial 1024-byte header).
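A minimal sketch of that suggestion, working on the raw bytes of a file. The 1024-byte header and 512-byte block size come from the comment above. The helper names and the default byte offset are hypothetical. The `word_sum` check assumes that a valid data block has a 16-bit word-wise sum of zero, an assumption that is not stated in this thread and should be verified against the CWA format documentation.

```python
HEADER_BYTES = 1024   # initial header, per the comment above
BLOCK_BYTES = 512     # size of each aligned data block


def word_sum(block: bytes) -> int:
    """16-bit little-endian word-wise sum of a block, modulo 2**16."""
    total = 0
    for i in range(0, len(block), 2):
        total += int.from_bytes(block[i:i + 2], "little")
    return total & 0xFFFF


def corrupt_block(data: bytes, block_index: int, offset: int = 100) -> bytes:
    """Return a copy of `data` with one byte flipped in the chosen data block.

    `block_index` counts data blocks after the header, starting at 0.
    `offset` is the byte position inside the block (hypothetical default).
    """
    if not 0 <= offset < BLOCK_BYTES:
        raise ValueError("offset must lie inside the block")
    start = HEADER_BYTES + block_index * BLOCK_BYTES
    if block_index < 0 or start + BLOCK_BYTES > len(data):
        raise IndexError("block_index is outside the file")
    out = bytearray(data)
    out[start + offset] ^= 0xFF   # flip every bit of one byte
    return bytes(out)
```

To use it on a real recording, one could read the file with `pathlib.Path(...).read_bytes()`, pass the result to `corrupt_block`, and write the returned bytes to a new file, so that the original recording stays untouched.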

l-k- commented 10 months ago

That's such a good point, thanks @danielgjackson!
