paulhibbing / AGread

Read Accelerometer Files from ActiGraph Accelerometers

read_gt3x() fails with Error: vector memory exhausted (limit reached?) for 21.0 MB gt3x file #11

Closed martakarass closed 4 years ago

martakarass commented 4 years ago

I have a gt3x file and the corresponding ActiLife raw data output CSV, which together cover ~49 h of data collection at fs = 100 Hz.

I am trying to use AGread::read_gt3x to read the raw data from the gt3x file, but it fails with the message Error: vector memory exhausted (limit reached?); see the full verbose output below. Can anything be done to make it possible to read data of that size?

Download GT3X and corresponding ActiLife raw data output CSV

rm(list = ls())
library(data.table)
#  devtools::install_github("paulhibbing/AGread")
library(AGread)
packageVersion("AGread")
# [1] ‘1.1.0.9000’

## Personal dropbox sharing links to (1) GT3X file, (2) ActiLife raw data output CSV
gt3x.fpath <- "https://www.dropbox.com/s/w3u1rg5lgx4xjbm/TAS1E23150400%20%282018-03-28%29.gt3x?dl=1"
csv.fpath <- "https://www.dropbox.com/s/435cfcze02a14dj/TAS1E23150400%20%282018-03-28%29RAW.csv?dl=1"

file_directory <- getwd()
gt3x.destfile <- file.path(file_directory, "TAS1E23150400(2018-03-28).gt3x")
csv.destfile <- file.path(file_directory, "TAS1E23150400(2018-03-28).csv")

## Download files to wd
if (!file.exists(gt3x.destfile)) download.file(gt3x.fpath, gt3x.destfile)
# downloaded 21.0 MB

if (!file.exists(csv.destfile)) download.file(csv.fpath, csv.destfile)
# downloaded 335.1 MB

Read ActiLife raw data output CSV

as.character(unlist(read.csv(file = csv.destfile, nrows = 6)))
# [1] "Serial Number: TAS1E23150400"    
# [2] "Start Time 13:15:00"             
# [3] "Start Date 3/28/2018"            
# [4] "Epoch Period (hh:mm:ss) 00:00:00"
# [5] "Download Time 14:18:42"          
# [6] "Download Date 3/30/2018" 

dat_actilife <- as.data.frame(fread(csv.destfile))

## Expected vs actual number of observations in raw data CSV Actilife output 
hz <- 100 
collection_dur_s <- difftime(as.POSIXct("2018-03-30 14:18:42"),
                             as.POSIXct("2018-03-28 13:15:00"), 
                             units = "secs")
collection_dur_s <- as.numeric(collection_dur_s)
nrow_exp <- hz * collection_dur_s
nrow_act <- nrow(dat_actilife)
c(nrow_exp, nrow_act) 
# [1] 17662200 17662200

nrow_act / (hz * 60 * 60) ## Hours of data
# 49.06167
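
For a rough sense of scale, the row count above can be turned into an approximate in-memory size. This is only a back-of-the-envelope sketch: it assumes the three accelerometer axes are held as double-precision columns, which is an assumption about the representation and not something confirmed for AGread internals.

```r
# Approximate footprint of the raw samples, using the row count reported above.
# Assumption: three axis columns stored as doubles (8 bytes each).
n_rows <- 17662200
n_axes <- 3
bytes_per_double <- 8

size_mb <- n_rows * n_axes * bytes_per_double / 1024^2
round(size_mb, 1)
```

If that estimate is in the right range, the final data alone should fit comfortably in memory on most machines, which suggests the failure comes from intermediate objects created while reading, not from the size of the result itself.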

Read GT3X file with AGread::read_gt3x

t1 <- Sys.time()
obj_read_gt3x <- read_gt3x(gt3x.destfile, verbose = TRUE)
# 
# Processing TAS1E23150400 (2018-03-28).gt3x 
# 
# Parsing info.txt  ............. COMPLETE
# 
# Will parse the following packet types, if available:
#   METADATA, PARAMETERS, SENSOR_SCHEMA, BATTERY 
# EVENT, TAG, ACTIVITY, HEART_RATE_BPM 
# HEART_RATE_ANT, HEART_RATE_BLE, LUX, CAPSENSE 
# EPOCH, EPOCH2, EPOCH3, EPOCH4 
# ACTIVITY2, SENSOR_DATA 
# 
# Reading log.bin  ............. COMPLETE
# Getting record headers............... COMPLETE
# Parsing PARAMETERS packet(s)   ............. COMPLETE                      
# Parsing list packet(s)   ............. COMPLETE                      
# Parsing METADATA packet(s)   ............. COMPLETE                      
# Parsing BATTERY packet(s)   ............. COMPLETE                      
# Parsing CAPSENSE packet(s)   ............. COMPLETE                      
# Checking for gaps in the time series. Fixing if found.Error: vector memory exhausted (limit reached?)

t2 <- Sys.time()
t2-t1
# Time difference of 4.157571 mins
Session info

```r
devtools::session_info()
# ─ Session info ───────────────────────────────────────────────────────────────────────
#  setting  value
#  version  R version 3.5.2 (2018-12-20)
#  os       macOS Mojave 10.14.2
#  system   x86_64, darwin15.6.0
#  ui       RStudio
#  language (EN)
#  collate  en_US.UTF-8
#  ctype    en_US.UTF-8
#  tz       America/New_York
#  date     2019-09-25
#
# ─ Packages ───────────────────────────────────────────────────────────────────────────
#  package     * version    date       lib source
#  AGread      * 1.1.0.9000 2019-09-25 [1] Github (paulhibbing/AGread@627b6d2)
#  anytime       0.3.6      2019-08-29 [1] CRAN (R 3.5.2)
#  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.5.2)
#  backports     1.1.4      2019-04-10 [1] CRAN (R 3.5.2)
#  binaryLogic   0.3.9      2017-12-13 [1] CRAN (R 3.5.0)
#  callr         3.2.0      2019-03-15 [1] CRAN (R 3.5.2)
#  cli           1.1.0      2019-03-19 [1] CRAN (R 3.5.2)
#  colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.5.2)
#  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)
#  data.table  * 1.12.2     2019-04-07 [1] CRAN (R 3.5.2)
#  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)
#  devtools      2.0.2      2019-04-08 [1] CRAN (R 3.5.2)
#  digest        0.6.21     2019-09-20 [1] CRAN (R 3.5.2)
#  dplyr         0.8.3      2019-07-04 [1] CRAN (R 3.5.2)
#  fs            1.3.1      2019-05-06 [1] CRAN (R 3.5.2)
#  ggplot2       3.2.1      2019-08-10 [1] CRAN (R 3.5.2)
#  glue          1.3.1      2019-03-12 [1] CRAN (R 3.5.2)
#  gtable        0.3.0      2019-03-25 [1] CRAN (R 3.5.2)
#  lazyeval      0.2.2      2019-03-15 [1] CRAN (R 3.5.2)
#  lubridate     1.7.4      2018-04-11 [1] CRAN (R 3.5.0)
#  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.0)
#  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)
#  munsell       0.5.0      2018-06-12 [1] CRAN (R 3.5.0)
#  PAutilities   0.2.0      2019-07-10 [1] CRAN (R 3.5.2)
#  pillar        1.4.2      2019-06-29 [1] CRAN (R 3.5.2)
#  pkgbuild      1.0.3      2019-03-20 [1] CRAN (R 3.5.2)
#  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.5.2)
#  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.0)
#  plyr          1.8.4      2016-06-08 [1] CRAN (R 3.5.0)
#  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)
#  processx      3.4.1      2019-07-18 [1] CRAN (R 3.5.2)
#  ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.0)
#  purrr         0.3.2      2019-03-15 [1] CRAN (R 3.5.2)
#  R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.2)
#  Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.5.2)
#  remotes       2.0.4      2019-04-10 [1] CRAN (R 3.5.2)
#  reshape2      1.4.3      2017-12-11 [1] CRAN (R 3.5.0)
#  rlang         0.4.0      2019-06-25 [1] CRAN (R 3.5.2)
#  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)
#  rstudioapi    0.10       2019-03-19 [1] CRAN (R 3.5.2)
#  scales        1.0.0      2018-08-09 [1] CRAN (R 3.5.0)
#  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.0)
#  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.5.2)
#  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.2)
#  testthat      2.1.1      2019-04-23 [1] CRAN (R 3.5.2)
#  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.5.2)
#  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.0)
#  usethis       1.5.0      2019-04-07 [1] CRAN (R 3.5.2)
#  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)
#  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.0)
#
# [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
```

FYI: @muschellij2

paulhibbing commented 4 years ago

Thanks for providing such helpful feedback here and elsewhere, @martakarass and @muschellij2. The fix I just pushed works for me on the file you shared (as well as previous files).

My approach to filling in "missing" packets was really clumsy in R (and probably would have been clumsy in C++ too). I've reworked things and re-routed through Rcpp, so it should create less memory havoc and run much more smoothly.

FYI, you may experience more failures while I'm working on #12. My hope is that won't take long to figure out, but I've been wrong before...
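
To illustrate the general idea of filling "missing" packets without heavy memory churn, here is a minimal sketch in plain R. It is not the code used in AGread (the actual fix is routed through Rcpp and is not shown in this thread). The helper name `fill_gaps` and its arguments are hypothetical, and the sketch assumes one record per second with whole-second timestamps.

```r
# Hypothetical helper (not part of AGread): fill missing one-second records
# by preallocating the complete time grid once and copying observed values
# into place, instead of growing an object inside a loop.
fill_gaps <- function(timestamps, values) {
  # Complete sequence of expected one-second timestamps
  full_ts <- seq(min(timestamps), max(timestamps), by = 1)

  # Preallocate the output; seconds with no record stay NA
  filled <- rep(NA_real_, length(full_ts))

  # Vectorized placement of the observed values at their grid positions
  idx <- match(as.numeric(timestamps), as.numeric(full_ts))
  filled[idx] <- values

  data.frame(timestamp = full_ts, value = filled)
}

# Toy input: records at offsets 0, 1, 4 and 5 seconds (offsets 2 and 3 missing)
ts <- as.POSIXct("2018-03-28 13:15:00", tz = "UTC") + c(0, 1, 4, 5)
res <- fill_gaps(ts, c(0.1, 0.2, 0.5, 0.6))
```

The point of the sketch is the allocation pattern: the output is created once at its final length, so memory use scales with the length of the completed series and does not depend on how many gaps are found.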