What advantages does RDM bring to data analysis?

Status: published
Date: 13.09.2018
Your name: Julien Colomb
Your ORCID: NA
License to apply to your comment: CC0
Your position at the time your life story happened: researcher
Input: RDM for data analysis

Four RDM actions to ease data analysis

1: Make your data computer-readable

Digital data can be transformed and analysed quite easily using programming languages such as R and Python. While you do not have to learn these languages (yet), knowing what they require in terms of readability might save you time and effort.

Tabular data and metadata should be tidy (one variable per column, one observation per row).
Keep your primary data (raw data) untouched: NEVER copy and paste inside raw data files.
If you have many datasets, make sure you are able to automate the file imports. An index of datasets may be a good practical solution.
Separate raw data, derived data, analysis and analysis results into different folders.
Make sure to document each step of your analysis.
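
As an illustration of the index idea above, here is a minimal Python sketch that reads every dataset listed in an index file and stacks them into one tidy table. The column names `file` and `condition`, and the assumption that all files are CSV files stored relative to the index, are hypothetical choices made for this example, not a prescribed layout.

```python
import csv
from pathlib import Path


def load_indexed_datasets(index_path):
    """Read every dataset listed in an index file and return tidy rows.

    Assumption: the index is a CSV with the columns `file` and `condition`
    (hypothetical names), and each data file is a CSV whose path is given
    relative to the folder that holds the index.
    """
    index_path = Path(index_path)
    rows = []
    with index_path.open(newline="") as index_file:
        for entry in csv.DictReader(index_file):
            data_path = index_path.parent / entry["file"]
            with data_path.open(newline="") as data_file:
                for record in csv.DictReader(data_file):
                    # Attach the metadata from the index to each observation,
                    # so the combined table keeps one row per observation.
                    record["condition"] = entry["condition"]
                    record["source_file"] = entry["file"]
                    rows.append(record)
    return rows
```

The raw files are only opened for reading, so they stay untouched, and adding a new dataset only means adding one line to the index.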

2: Fit your data format to its analysis (during data collection)

The analysis you plan to do (the statistics as well as the software you will use) might require your data to be in a certain format. It will probably affect how much data you need to reach a robust conclusion, and it may even affect the number of variables you actually need to record.

This is especially true for metadata: using an existing standard from the start is easier than transforming what you collected into that standard afterwards.
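
One way to stick to a planned format during collection is to check each new record against the list of variables the analysis will need. The sketch below is only an illustration: the variable names (`animal_id`, `weight_g`, `nap_minutes`) and their types are invented for this example and would be replaced by whatever your analysis or metadata standard requires.

```python
# Hypothetical list of the variables the planned analysis needs,
# with the type each value must be convertible to.
REQUIRED_VARIABLES = {"animal_id": str, "weight_g": float, "nap_minutes": float}


def check_record(record, required=REQUIRED_VARIABLES):
    """Return a list of problems found in one recorded observation."""
    problems = []
    for name, kind in required.items():
        if name not in record or record[name] in ("", None):
            problems.append(f"missing value for {name}")
            continue
        try:
            kind(record[name])  # can the value be read as the expected type?
        except (TypeError, ValueError):
            problems.append(
                f"{name} is not a valid {kind.__name__}: {record[name]!r}"
            )
    return problems
```

Running such a check on the day of recording catches a missing or mistyped variable while it can still be fixed, rather than two years later.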

3: Plan for the unexpected

The data you collect today may be analysed in 2 years and published in 5. During that time, a lot can happen. People may prove that the analysis you planned is not suited to your problem, or you may realise that a variable you did not plan to collect is crucial. Maybe a new dataset will appear that you will need to compare your own data to, or new people will help you with your project and need access to your data.

Plan for your data to be reusable. Ideally, ask a colleague to look at your data and see whether they can understand it. The unexpected may also be good: halfway through your tedious manual analysis, you may discover a way to automate it. So keep track of the links between raw and derived data.
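
Keeping track of the link between raw and derived data can be as simple as a small log file. The following Python sketch is one possible approach, not an established tool: the function names and the JSON-lines log format are assumptions made for this example. It stores a checksum of the raw file next to the name of the derived file, so you can later check that the raw data was never modified.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_provenance(raw_path, derived_path, step, log_path):
    """Append one line to a log linking a derived file to its raw source.

    The log is a JSON-lines file (one JSON object per line); this format
    is a choice made for this sketch.
    """
    entry = {
        "raw_file": str(raw_path),
        "raw_sha256": sha256_of(raw_path),
        "derived_file": str(derived_path),
        "step": step,  # free-text description of what was done
    }
    with Path(log_path).open("a") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def raw_is_unchanged(entry):
    """Check that the raw file still matches the checksum in the log entry."""
    return sha256_of(entry["raw_file"]) == entry["raw_sha256"]
```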

4: Be specific: merging is easier than splitting

When recording variables, be as specific as you can. It is very easy to pool two categories into one but very difficult (and sometimes impossible) to separate a group during the analysis.
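
To see why pooling is the easy direction, here is a short Python sketch. The behaviour labels and the mapping are invented for illustration only. Merging specific labels into broader groups takes one lookup table, while no table could recover "sleeping" versus "resting" from a column that only ever recorded "inactive".

```python
from collections import Counter

# Hypothetical mapping from the specific labels recorded during the
# experiment to the broader groups wanted for one particular analysis.
POOLED_LABELS = {
    "sleeping": "inactive",
    "resting": "inactive",
    "walking": "active",
    "running": "active",
}


def pool_categories(labels, mapping):
    """Replace each specific label with its pooled group."""
    return [mapping[label] for label in labels]


def count_by_group(labels, mapping):
    """Count observations per pooled group."""
    return Counter(pool_categories(labels, mapping))
```

A different analysis can later use a different mapping on the same specific labels, which is not possible if only the pooled groups were recorded.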

Similarly, quantitative variables are easier to analyse than qualitative ones. You can always create categories from quantitative measurements, but not the other way around.

As an example, if your question is "do obese mice take longer naps?", record each mouse's weight, not its weight category. Analysing the correlation between weight and nap length is more powerful than comparing two categories.
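
The mouse example can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the 40 g threshold is a made-up value, not a definition of obesity in mice, and the function names are hypothetical. With the weight recorded, you can both compute a correlation and derive the category whenever you need it.

```python
from math import sqrt


def weight_category(weight_g, threshold_g=40.0):
    """Derive a category from a recorded weight.

    The default threshold is a made-up value for this sketch; with the
    weight on record, it can be changed at any time during the analysis.
    """
    return "obese" if weight_g >= threshold_g else "lean"


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)
```

If only "obese" or "lean" had been written down, `pearson_r` could not be applied to the weights at all, and the threshold could never be revisited.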
