Closed: aridyckovsky closed this issue 3 years ago
Any other thoughts on this data caching alternative to avoid re-loading the 2 GB dataset of samples each time `knitr` is run? There is always the non-evaluation option: we tell `knitr` to skip evaluation until we're ready to share a more official notebook rendering that includes tables and figures. This is less ideal because it makes our analysis discourse less streamlined.
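For reference, the non-evaluation option can be as simple as a global chunk option in a setup chunk (a minimal sketch, assuming a standard `knitr` setup):

```r
# Minimal sketch of the non-evaluation option: disable evaluation globally
# in a setup chunk, then flip back to TRUE for the official rendering.
knitr::opts_chunk$set(eval = FALSE)
```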
Thanks for the nudge @aridyckovsky. There are no data confidentiality issues here since the data is anonymized to begin with, so no concerns on that front. In the past, my main approach with data of this type (neuroimaging data has similar issues; loading is nontrivial) has been to do it the painful way once, save out what I need, and thereafter only access the derived files (not the originals). So would a solution that goes file by file and extracts the main x, y, and time data, for example, be sufficient here, or is the issue deeper than that?
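Something like this sketch is what I have in mind (paths and column names are hypothetical placeholders, since I don't know the extracted CSV schema):

```r
# Sketch of the one-time "painful" pass: read each raw CSV, keep only the
# columns we need, and save a compact .Rds per file for all later loads.
# Column names (x, y, time) and paths are assumptions, not from the repo.
raw_files <- list.files("data/raw", pattern = "\\.csv$", full.names = TRUE)
for (f in raw_files) {
  d <- read.csv(f)
  keep <- d[, c("x", "y", "time")]
  out <- file.path(
    "data/derived",
    paste0(tools::file_path_sans_ext(basename(f)), ".rds")
  )
  saveRDS(keep, out)
}
```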
That's a great question. I haven't checked what the change in total file size is when using the cleaned data. My assumption is that this issue will persist unless the cleaning decreases the data size by a few orders of magnitude (a few MB instead of GB), but it's worth seeing if the issue can be minimized just by using relatively smaller data. Either way, this is probably an issue worth fixing pre-sharing, i.e., we should provide some functional mechanism that lets people cache loaded data if needed.
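A quick way to check the size change once derived files exist (paths hypothetical):

```r
# Compare raw vs. derived sizes to see how many orders of magnitude we gain.
file.size("data/raw/subject01.csv") / file.size("data/derived/subject01.rds")
```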
Agreed. If we're dropping enough columns (and potentially separating the data, e.g., into the different temporal epochs of calibration, validation, task, and re-validation, as sketched below), then as long as it works at some point, that may be good enough for these purposes, because the needs here will also be affected by many idiosyncrasies, like the computers doing the analysis, the sampling rate of the eye tracker, etc.
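A hypothetical sketch of that epoch separation, assuming the cleaned samples carry an epoch label column:

```r
# Split a cleaned sample table by epoch and save each epoch separately,
# so a rendering session only loads the epoch it needs.
# The path and the `epoch` column are assumptions for illustration.
samples  <- readRDS("data/derived/subject01.rds")
by_epoch <- split(samples, samples$epoch)
for (ep in names(by_epoch)) {
  saveRDS(
    by_epoch[[ep]],
    file.path("data/derived", paste0("subject01_", ep, ".rds"))
  )
}
```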
Note that this may be addressed by #18 when complete
**Is your feature request related to a problem? Please describe.**
With eye tracker sampling data up to 2 GB in extracted CSV format, operations that require loading data in chunks can be lengthy and sometimes crash the session.
**Describe the solution you'd like**
Load data from CSV and save it as a temporary `.RData` or `.Rds` file that can be easily loaded back into a variable during processes like document rendering.

**Describe alternatives you've considered**
None.
**Additional context**
Potentially useful: https://rdrr.io/cran/xfun/man/cache_rds.html. We must also determine whether temporary data caching in a local environment violates any participant data confidentiality requirements. @psokolhessner, we could use your feedback here.
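For context, the linked helper wraps the save-and-reload pattern; a minimal sketch of using it inside a `knitr` chunk (file and path names are placeholders):

```r
# xfun::cache_rds() evaluates the expression once, saves the result to an
# .rds file, and reloads it on subsequent runs instead of re-reading the CSV.
samples <- xfun::cache_rds({
  read.csv("data/raw/eyetracker_samples.csv")  # the slow 2 GB load
}, file = "samples.rds", dir = "cache/")
```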