Include data management in the template

rostools / prodigenr

Project directory generator R package

https://rostools.github.io/prodigenr/


Include data management in the template #79

Closed jcolomb closed 1 month ago

jcolomb commented 6 years ago

Make the data folder non-optional and create these subfolders:

metadata
raw_data
derived_data
figures_and_tables

Add README files inside each subfolder explaining what data should go there, and maybe an explanation about tidying up the data (create derived_data when the raw data is not tidy).

Note that it may sometimes be better to create the figures first and insert them into the manuscript later than to have code chunks in the main manuscript.
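As a rough illustration of the layout proposed above, here is a minimal sketch in Python (not the package's R code). The four folder names come from the list; the function name `create_data_folder`, the `README.md` file name, and the README wording are assumptions made up for this example.

```python
from pathlib import Path

# Folder names are taken from the proposal; the descriptions are
# placeholder wording (an assumption), meant to be edited per project.
DATA_SUBFOLDERS = {
    "metadata": "Descriptions of the data: codebooks, variable definitions, collection notes.",
    "raw_data": "Data exactly as collected or received. Do not edit files here.",
    "derived_data": "Tidied or processed data, created when the raw data is not tidy.",
    "figures_and_tables": "Figures and tables generated from the data.",
}


def create_data_folder(project_root):
    """Create data/ with the four subfolders, each holding a README.md.

    Hypothetical helper for illustration only; not part of prodigenr.
    """
    data_dir = Path(project_root) / "data"
    for name, description in DATA_SUBFOLDERS.items():
        subfolder = data_dir / name
        subfolder.mkdir(parents=True, exist_ok=True)
        readme = subfolder / "README.md"
        if not readme.exists():  # never overwrite a README someone has edited
            readme.write_text(f"# {name}\n\n{description}\n", encoding="utf-8")
    return data_dir
```

Calling `create_data_folder("my_project")` a second time leaves existing README files untouched, so it is safe to re-run on a project that is already set up.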

lwjohnst86 commented 6 years ago

Thanks @jcolomb for the issue and ideas!

Making the data folder non-optional is a good idea, though sometimes a project doesn't need data (for instance, if you are writing an abstract with no or minimal numbers/results). But good point: even just creating a README in the data folder would provide some context/explanation.

For the "derived_data", I tend to prefer devtools::use_data_raw for that purpose, which creates a data-raw folder in the main project folder. It might be a good idea to move that into the data folder, though...

For the figures, you are absolutely correct that it might not be good to always create a figure in a code chunk. I'd prefer the figures with the document though (like doc/figures/), not in the data folder.

Thanks again for the comments!! :D

KristijanArmeni commented 6 years ago

Hi,

Including data + metainfo in READMEs is a crucial component of a reproducible workflow, though it might be very field-specific. Just as a point of interest (and perhaps inspiration), in the human neuroscience community (working with fMRI, EEG, etc.), there is now an ongoing global initiative to prepare a set of recommendations on how to organize neuroimaging data so that it can be easily shared. See: Brain Imaging Data Structure (BIDS)

Now, this is not directly relevant, as BIDS is way too field-specific (and addresses a different issue than this package), but I thought I'd drop this here as it might contain good ideas and give an impression of the level of detail considered (see e.g. the specification, section 3.4, on definitions of raw vs. derived data and what types of metadata are included in folders).

One corollary of BIDS existing is that there is now a need for tools that organize data folders automatically according to the recommendations :)

(I am not familiar with similar initiatives in other fields of the life sciences.)

jcolomb commented 5 years ago

Hello,

I indeed think one could use the core of BIDS for the data folder and file naming. It would also open up some automation when saving data: saving the sourcedata, the corresponding exported raw data, the JSON or TSV metadata, and some derived data in one go would then be possible.

I will look into it and write some code to populate the data folder accordingly (it will take some weeks, I think).
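To make the "save everything in one go" idea a bit more concrete, here is a minimal sketch in Python. It is only loosely inspired by BIDS and is not BIDS-compliant: the `sourcedata` folder name is borrowed from BIDS, `raw_data` and `derived_data` come from the folder list earlier in this thread, and the function names `write_tsv` and `save_all`, the `_tidy` suffix, and the choice to put the JSON file next to the raw TSV are all assumptions for illustration.

```python
import csv
import json
import shutil
from pathlib import Path


def write_tsv(path, rows):
    """Write a non-empty list of dicts as a tab-separated file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def save_all(data_dir, name, source_file, raw_rows, metadata, derived_rows):
    """Save source file, raw export, JSON metadata and derived data in one call.

    Hypothetical helper: the layout and naming are assumptions, not BIDS itself.
    """
    data_dir = Path(data_dir)
    targets = {
        "source": data_dir / "sourcedata" / Path(source_file).name,
        "raw": data_dir / "raw_data" / f"{name}.tsv",
        "metadata": data_dir / "raw_data" / f"{name}.json",
        "derived": data_dir / "derived_data" / f"{name}_tidy.tsv",
    }
    for path in targets.values():
        path.parent.mkdir(parents=True, exist_ok=True)

    shutil.copy2(source_file, targets["source"])  # untouched original file
    write_tsv(targets["raw"], raw_rows)           # exported raw data
    targets["metadata"].write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    write_tsv(targets["derived"], derived_rows)   # tidied data
    return targets
```

Because every file name is derived from the same `name` argument, the raw data, its metadata, and the derived data stay linked by name, which is the part of the BIDS approach that seems most reusable outside neuroimaging.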

lwjohnst86 commented 1 month ago

I'm minimizing maintenance and scope for this package, so I'm closing this. Plus, work on data management tasks has been moved to the Seedcase Project.