Common Data Quality Expectations

iqis commented 5 years ago

Hi @wendtke, as we discussed on the phone, it seems valuable to bring some of the data QA work into the package.

Could you please make a short list of common expectations, starting with those we've talked about?

For example:

HRV Stats : Respiration Peak Frequency is expected to be within the range of Settings : HF/RSA Frequency Band
HRV Stats : Segment Duration is expected to have a consistent value across segments, and that value should match Settings : Segment Time
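To make the two examples concrete, here is a minimal sketch of how such checks could look. It is written in Python purely for illustration and is not psyphr's API. The `Settings` and `Segment` classes, their field names, and the numeric values (including the 0.12 to 0.40 Hz band) are assumptions made up for this example; the real values would come from the Settings and HRV Stats sheets.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Settings:
    # Hypothetical stand-in for the Settings sheet.
    hf_band_hz: Tuple[float, float]  # (low, high) of the HF/RSA frequency band
    segment_time_s: float            # configured segment time in seconds


@dataclass
class Segment:
    # Hypothetical stand-in for one row of the HRV Stats sheet.
    resp_peak_freq_hz: float
    duration_s: float


def check_expectations(segments: List[Segment], settings: Settings,
                       tol: float = 1e-6) -> List[Tuple[int, str]]:
    """Return (segment number, message) for every violated expectation."""
    low, high = settings.hf_band_hz
    problems = []
    for number, seg in enumerate(segments, start=1):
        # Expectation 1: respiration peak frequency lies inside the HF/RSA band.
        if not (low <= seg.resp_peak_freq_hz <= high):
            problems.append((number, "respiration peak frequency outside HF/RSA band"))
        # Expectation 2: segment duration matches the configured segment time.
        if abs(seg.duration_s - settings.segment_time_s) > tol:
            problems.append((number, "segment duration differs from Settings segment time"))
    return problems


# Illustrative values only.
settings = Settings(hf_band_hz=(0.12, 0.40), segment_time_s=60.0)
segments = [
    Segment(resp_peak_freq_hz=0.25, duration_s=60.0),  # passes both checks
    Segment(resp_peak_freq_hz=0.08, duration_s=60.0),  # respiration below the band
    Segment(resp_peak_freq_hz=0.30, duration_s=45.0),  # duration mismatch
]
problems = check_expectations(segments, settings)
for number, message in problems:
    print(f"segment {number}: {message}")
```

Returning a list of problems rather than stopping at the first failure keeps every flagged segment visible, which seems closer to what a QA report would need.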

wendtke commented 5 years ago

Yes, let's work these filters into psyphr. Is it also possible to provide users with the number of segments lost due to each of these criteria for easy reporting?

For HRV, the criteria are quite clear:

Each segment must be at least 30 seconds long in order to derive respiratory sinus arrhythmia (RSA). If a segment is shorter than 30 seconds, RSA will not be available (blank space). Common segment lengths include 30 seconds, 1 minute, and 5 minutes. Technically, a segment can be of any duration; however, RSA cannot be derived from segments shorter than 30 seconds, and researchers should not use segments of varying lengths (e.g., 30 seconds in one task and 1 minute in another) within one study.
If a researcher estimates more than 10 percent of the R-peaks (heart beats) within a given segment, the segment should be excluded from analyses. For example, a typical 30-second segment might have 40 R-peaks; if more than 4 of them are estimated (i.e., the R-peak marker was moved to estimate the inter-beat interval), the segment should not be included in analyses. This 10 percent rule is commonly accepted in the heart rate variability literature. More information here.
Respiration rate and respiratory peak frequency (related measures) must be within the expected range for the individual's age range and study conditions (a wider range if exercise is involved). Respiration rate and respiratory peak frequency on the HRV Stats sheet can be cross-referenced with the frequency bands listed on the Settings sheet. Some researchers look across frequency bands (very low to high/RSA), so we will want to include the full range (3 rows on the Settings sheet). See here for more information.
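As a rough sketch of how the three criteria above, plus the per-criterion loss counts, might fit together: the Python below is only an illustration and not psyphr code. The dictionary keys, the reason labels, and the three frequency bands are hypothetical placeholders; only the 30-second minimum and the 10 percent rule come from the criteria above.

```python
from collections import Counter

MIN_SEGMENT_S = 30  # minimum segment length for deriving RSA


def exclusion_reasons(seg, freq_range_hz):
    """Return the list of criteria that a single segment fails."""
    reasons = []
    # Criterion 1: segment shorter than 30 seconds.
    if seg["duration_s"] < MIN_SEGMENT_S:
        reasons.append("too_short")
    # Criterion 2: more than 10 percent of R-peaks estimated.
    # Integer comparison avoids floating-point surprises at exactly 10 percent.
    if seg["n_estimated"] * 10 > seg["n_r_peaks"]:
        reasons.append("too_many_estimated_r_peaks")
    # Criterion 3: respiration peak frequency outside the expected range.
    low, high = freq_range_hz
    if not (low <= seg["resp_peak_freq_hz"] <= high):
        reasons.append("respiration_out_of_range")
    return reasons


def summarize(segments, freq_range_hz):
    """Split segments into kept ones and count losses per criterion."""
    kept, lost = [], Counter()
    for seg in segments:
        reasons = exclusion_reasons(seg, freq_range_hz)
        if reasons:
            lost.update(reasons)  # a segment can count toward several criteria
        else:
            kept.append(seg)
    return kept, dict(lost)


# Hypothetical frequency bands (Hz), standing in for the 3 rows on the Settings sheet.
bands = [(0.003, 0.04), (0.04, 0.12), (0.12, 0.40)]
full_range = (min(low for low, _ in bands), max(high for _, high in bands))

segments = [
    {"id": "A", "duration_s": 30, "n_r_peaks": 40, "n_estimated": 4, "resp_peak_freq_hz": 0.25},
    {"id": "B", "duration_s": 30, "n_r_peaks": 40, "n_estimated": 5, "resp_peak_freq_hz": 0.25},
    {"id": "C", "duration_s": 20, "n_r_peaks": 25, "n_estimated": 0, "resp_peak_freq_hz": 0.30},
    {"id": "D", "duration_s": 30, "n_r_peaks": 40, "n_estimated": 0, "resp_peak_freq_hz": 0.55},
    {"id": "E", "duration_s": 20, "n_r_peaks": 25, "n_estimated": 3, "resp_peak_freq_hz": 0.30},
]
kept, lost = summarize(segments, full_range)
print("kept:", [seg["id"] for seg in kept])
print("lost per criterion:", lost)
```

In this sketch, segment A has exactly 4 of 40 R-peaks estimated and is kept, since the rule excludes only segments with more than 10 percent. A segment that fails several criteria (like E) is counted once under each, so the per-criterion counts can sum to more than the number of segments dropped; whether that is the right convention for reporting is an open design choice.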

The EDA literature, unfortunately, does not provide clear quality control guidelines. I am still looking into this. I have a few more resources to check.

iqis commented 5 years ago

Thanks for the examples, which are very specific and detailed. I will find an appropriate time to implement them. Currently I'm looking at using the data frame-specific assertr package, the more general assertthat package, or perhaps a combination of the two. Any thoughts?

wendtke commented 5 years ago

I don't think I know enough about assertr and assertthat to provide thoughts. Let's chat more about it.

iqis commented 5 years ago

I'm not an expert on these, either. At least assertthat is positioned by its author as a replacement for base::stopifnot(), so it must be very simple.

On Mon, Jun 24, 2019 at 7:46 PM Kathleen Wendt notifications@github.com wrote:

I don't think I know enough about assertr and assertthat to provide thoughts. Let's chat more about it.


wendtke commented 5 years ago

I am hoping @MalloryJfeldman can help us with suggested scoring/editing approaches for the other data types.

wendtke commented 5 years ago

See the Google Doc for guidelines per data type and ideas for implementation within psyphr.

wendtke / psyphr

Common Data Quality Expectations #13