wadpac / GGIR

Code corresponding to R package GGIR
https://wadpac.github.io/GGIR/
Apache License 2.0

Need non-wear time for user-defined day. #51

Closed: laetulos closed this issue 6 years ago

laetulos commented 7 years ago

My group is working with GGIR to reduce data from GENEActiv accelerometers. We get through parts 1 and 2. The problem is that we define our days as running from 5:00 to 23:00 and need the count of valid wear hours or non-wear hours within that window for each day. We plan to use the weighting method from a recent Xu et al. paper (https://www.ncbi.nlm.nih.gov/pubmed/27405327) to address non-wear. From part 2 we get the number of valid wear minutes per day, but it is per 24-hour day, and using qwindow = c(5,23) does not give us the wear or non-wear time in our desired window. Is there a way to do this?

Thanks, Lee (University of New Mexico)

vincentvanhees commented 7 years ago

Hi Lee, the paper you are referring to is not open access (edit: I just found a copy on ResearchGate).

Please note that GGIR already imputes non-wear time by default, as described in our publications: for each non-wear time point, it takes the average of all valid data (monitor worn) at similar time points on the other days of the measurement. Effectively this is a weighted imputation of average daily acceleration, because days with more valid data have a stronger influence on the overall mean. To me (although I am obviously biased as its developer!) this is a very easy approach to communicate and motivate. It uses all available information in the accelerometer recording, it is computationally very fast (just matrix summation and division), and it is not specific to any particular outcome measure: it works on the underlying metric (signal feature) level data, so the imputation only has to be done once rather than for every outcome.
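
(For intuition, here is a minimal sketch of that idea in R. This is not GGIR's internal code; the matrix layout and function name are made up for illustration.)

    # Toy version of time-of-day imputation (not GGIR's actual implementation).
    # 'acc' is a days x epochs matrix of a metric (e.g. average acceleration);
    # 'worn' is a matching logical matrix: TRUE where the monitor was worn.
    impute_by_timeofday <- function(acc, worn) {
      imp <- acc
      for (e in seq_len(ncol(acc))) {      # loop over time-of-day slots
        valid <- worn[, e]
        if (any(valid) && !all(valid)) {
          # fill non-wear epochs with the mean of valid data at the same
          # time of day on other days; days contributing more valid epochs
          # influence the overall mean more, hence the implicit weighting
          imp[!valid, e] <- mean(acc[valid, e])
        }
      }
      imp
    }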

Note that all intermediate stages of information are directly accessible in the GGIR milestone data stored in the .RData files and documented, including epoch-by-epoch indicators of whether the accelerometer was worn. With these you can design your own imputation scheme if you like.
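
(As a rough illustration, loading one participant's milestone file might look like this; the folder and file names here are hypothetical and the object layout depends on your GGIR version, so inspect what the file actually contains.)

    # hypothetical paths/names: adjust to your own study's output folder
    load("output_mystudy/meta/ms2.out/participant1.RData")
    ls()        # list the objects the milestone file provides (e.g. IMP)
    str(IMP)    # IMP should hold the imputed metrics and wear/non-wear info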

laetulos commented 7 years ago

Thanks for the quick response Vincent. OK, so it looks like I need to mine the .RData files to find this info? Will work on that. Is it also possible to get this information in part 5 if we specify the day as 5:00 to 23:00 in part 4? Also, we really appreciate the work you have done on GGIR; your non-wear scores appear to be especially compelling, way better than what my teams have used before to define non-wear!

My note, as a methodologist who has spent a lot of time dealing with missing data in general and missing accelerometer data in particular, is that simply replacing non-wear points with points from similar days is problematic for all of the reasons that single imputation is problematic. The easiest example is this: if someone just meets your minimum wear criterion (say 3 days), you simply use their values over those 3 days. For someone who wore the accelerometer for the entire week, there is no imputation of missing data. Thus, in your analyses you are treating these two individuals as if their values are equally reliable when in fact you have much more information for the person who wore the device for the entire week. The repeated measures weighting approach treats those two subjects very differently, because parameter estimates at level 2 are adjusted for the precision of the level-1 data (how many data points each person has). An important point here is that the weighted approach doesn't use week-level means; instead it is a repeated measures (multilevel) model based on day-level data. This is why it ends up differing substantially from your imputation approach: the method allows an individual with 3 days of data to be included but gives more weight to the person with 7 days of data. It also allows days with incomplete data to be included, but gives those days less weight. In fact, there is no problem including individuals with 1 day of data; we aren't planning to drop anyone based on minimum wear times.

It sounds semantic, but failing to address the uncertainty caused by non-wear creates an important problem: you have a lot more unreliability for people with lots of non-wear, which should decrease the size of the effects of predictors of PA as well as increase the uncertainty in those effects. Of course, the consequences depend a lot on the particular situation, but in the paper referenced above you can see the impact of the 'naive approach' on the precision of parameter estimates. I guess what I'm arguing is that the naive approach (just using data which meets criteria) and filling in values from similar times of day are very similar from a statistical perspective and less than optimal. We have previously spent a lot of time using multiple imputation and I'm not sure it is worth the time cost, but using a weighted longitudinal approach is quite simple and has the advantage of being a principled statistical approach for addressing non-wear.

Best, Lee

vincentvanhees commented 7 years ago

Thanks for explaining, Lee; it sounds like I will have to carefully read that paper then! I have a bit of experience with multi-level modelling, but don't consider myself an expert. If you think it is worth expanding GGIR with the repeated measures weighting approach, then I am happy to facilitate.

You will need to look out for the object rout in the part 2 output; see also the package documentation: https://cran.r-project.org/web/packages/GGIR/GGIR.pdf. As the documentation says, rout holds binary indicators per timestamp of whether the data can be trusted or not. Let me know if anything is unclear. Those timestamps should then match the timestamps in the part 5 output.

I agree that it may be easier to have time series including non-wear indicators stored as part of the part 5 output, together with all the level labels, or to have some kind of export function that writes all these time series to csv files. I just haven't got round to building something like that yet.

vincentvanhees commented 7 years ago

Correction: rout does not have timestamps, but is aligned with IMP$metalong, the less dense indicator of non-wear time (by default 15 minutes, but adjustable). So, first you would have to match rout to the timestamps of the metalong data frame inside the IMP or M object.
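
(To make that concrete, the matching could look roughly like this. This is only a sketch: it assumes the milestone file provides a metalong data frame with a timestamp column plus rout as described above, and both the indicator column and the timestamp parsing may need adjusting to your GGIR version.)

    # rout is aligned row-by-row with the long (default 15-min) epochs
    long <- M$metalong                            # or IMP$metalong
    long$nonwear <- rout[seq_len(nrow(long)), 1]  # hypothetical: first indicator column
    # parse timestamps (format may differ, e.g. ISO 8601 with timezone)
    tt <- as.POSIXct(long$timestamp, format = "%Y-%m-%dT%H:%M:%S%z")
    # Lee's window: 05:00-23:00; non-wear hours per calendar day in that window
    inwin <- as.integer(format(tt, "%H")) >= 5 & as.integer(format(tt, "%H")) < 23
    nonwear_hours <- tapply(long$nonwear[inwin], as.Date(tt[inwin]),
                            function(x) sum(x) * 15 / 60)  # assumes 15-min epochs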

laetulos commented 7 years ago

Thanks for this, Vincent.

We will work on it and see what we can get working with our data. The nice thing about this approach is that once the data are in the right format the model is quite simple. I have a postdoc working on this in another project and our preliminary results look pretty good; some example code is below. Once we figured out the right way to include the weights it became quite painless to estimate. The challenge is to get the data output in the right format: we need a 'stacked' file with one row per time period of interest (if modeling PA per day, each person needs a row for each day they were observed). We can get this already from GGIR; all we need to do is add the proportion of non-wear time associated with each row. Once we have something I'll try to remember to post code here to help you and anyone else trying to do this. Unfortunately I don't have enough programming time for this project, but what else is new?
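
(In case it helps anyone following along, a stacked file like that could be assembled roughly as follows; the data frame and column names here are hypothetical, not actual GGIR output.)

    # 'epochs': one row per epoch, with columns id, timestamp (POSIXct),
    # nonwear (0/1) and an outcome metric such as mvpa
    epochs$day <- as.Date(epochs$timestamp)
    stacked <- aggregate(cbind(mvpa = epochs$mvpa, C.Prop = epochs$nonwear),
                         by = list(FID = epochs$id, day = epochs$day),
                         FUN = mean)
    # C.Prop is now the proportion of non-wear epochs per person-day,
    # ready for the weighted model below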

By the way, if you're interested, here is a link to a paper by a former grad student with some other, more interesting uses of multilevel analysis of accelerometry data (https://www.researchgate.net/publication/265388428_A_multilevel_approach_to_examining_time-specific_effects_in_accelerometer-assessed_physical_activity). Basically, she showed that our power to detect intervention effects at specific time points was far greater than for intervention effects on average; there are lots of applications beyond RCTs.

Best! Lee

Code for multilevel model (most variable names are self-explanatory; FID is the child ID variable):

    library(nlme)  # lme() and varFixed() come from nlme
    # random-intercept model: repeated observations nested within children (FID)
    MVPA.C <- lme(SQ.C.MVPA ~ income.c + Married + Child_male,
                  random = ~ 1 | FID, data = PA.BL,
                  na.action = na.exclude, method = 'REML')

Updating with the sampling weights (C.Prop is the proportion of missing data for that time period):

    # fixed variance weights: residual variance proportional to 1/(1 - C.Prop),
    # so periods with more missing data get less weight in the fit
    MVPA.C.Weight <- update(MVPA.C, data = PA.BL, method = 'REML',
                            weights = varFixed(~ 1 / (1 - C.Prop)))
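
(To see what the weighting changes, one can then compare the fixed-effect tables of the two fits, e.g.:)

    summary(MVPA.C)$tTable         # unweighted estimates and standard errors
    summary(MVPA.C.Weight)$tTable  # weighted fit for comparison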

vincentvanhees commented 7 years ago

Thanks Lee. It may be worth adding that I did use multi-level modelling to account for person/night-level missing data in one study: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0142533. There I worked with the night-specific sleep estimates (27 thousand nights in 4000 individuals) and used multi-level modelling to account for the variation between individuals and within individuals (between nights). Before running the analyses I omitted all days with insufficient monitor wear time (monitor worn for < 66% of the day). This allowed the multi-level model to put more weight on individuals with more valid days of data. It is better than standard single-level regression, but I agree that this approach does not account well for within-day data availability: in my case a day with 22 hours of data plus 2 imputed hours weighed the same as a day with 24 hours of data. I can see why that is not ideal and I look forward to seeing your results!