ocean-transport / science-together

MIT License

Science Together - How to design and execute a collaborative research project

This repository is meant as an evolving guide/log of the collaborative research undertaken within the Climate Data Science Lab (CDSLab).

We hope that this living document can serve as inspiration to other groups that want to try alternative ways of conducting science, and that a variety of feedback will continuously improve it.

Why?

TBW

How?

To execute a collaborative project, we need to define how to organize the two main components of a research project: Work and Code/Data.

The Work Structure determines when to have meetings, what to define as milestones, and, as a bonus, how to celebrate successful milestones. The Code/Data Structure sets rules on how the data needed for a particular project is generated, processed, and checked.

Work Structure

As an initial experiment, we will try to organize our work around fixed work intervals (often called 'sprints').

A sprint will last 2 weeks and start and end with a synchronous meeting.

Within each sprint, members are expected to commit a certain amount of their weekly hours to the sprint, but are free to decide when to work those hours. This leaves freedom to accommodate different work styles (e.g. an hour every other day vs. a full-day hack).

Open Questions

Code/Data Structure

We believe that most modern science projects consist, at their core, of code that generates and analyzes data.

The basic building blocks of a 'science project' in this context are:

  1. Data
  2. Code
  3. Publications (paper, blogpost, report)

Let's start with a very simplistic project:

drawing

In this case, the code in the repository generates some figures from the data and combines them with some text, and we have a paper 🤗.

The reality of most science projects is not that simple. In many cases, getting to a published result depends on several datasets and requires heavy processing (often creating intermediate data along the way) before the final paper can be written up. Furthermore, these intermediate data might be used in several papers. It is thus useful to separate the concept of a 'paper' repository from a 'project' repository:

drawing

These two types of repositories have very different requirements:

How to build a 'project' repository (WIP)

We assume that all 'raw' data in this project is already ARCO (analysis-ready, cloud-optimized) data. Working with local files on a supercomputer likely leads to different requirements, which we will not consider here.

The key task of the project repo is to represent and execute some sort of pipeline which transforms data in one or more stages (steps).
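To make the idea of a staged pipeline concrete, here is a minimal sketch in plain Python. It is not a recommendation for any particular workflow engine; the names `Stage` and `run_pipeline`, the toy `temperature` data, and the two example stages are all hypothetical and chosen only for illustration. Each stage reads named inputs and writes one named output, so intermediate data stay available for later stages (or for several papers).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Stage:
    """One step of the pipeline (hypothetical helper, not a real library API)."""
    name: str                 # name of the dataset this stage produces
    inputs: List[str]         # names of the datasets this stage consumes
    func: Callable[..., Any]  # transformation applied to the inputs


def run_pipeline(stages: List[Stage], raw: Dict[str, Any]) -> Dict[str, Any]:
    """Run the stages in the given order and keep every intermediate result."""
    store = dict(raw)  # shallow copy so the raw mapping itself is not modified
    for stage in stages:
        missing = [key for key in stage.inputs if key not in store]
        if missing:
            raise KeyError(f"stage {stage.name!r} is missing inputs: {missing}")
        store[stage.name] = stage.func(*(store[key] for key in stage.inputs))
    return store


def anomaly(values: List[float]) -> List[float]:
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]


# Toy values standing in for a 'raw' dataset (assumption for illustration).
raw_data = {"temperature": [14.0, 15.5, 16.0, 15.0]}

stages = [
    Stage(name="anomaly", inputs=["temperature"], func=anomaly),
    Stage(name="max_anomaly", inputs=["anomaly"], func=max),
]

results = run_pipeline(stages, raw_data)
print(results["max_anomaly"])
```

In this sketch the stage order is given explicitly by the list. A real workflow engine would typically also resolve dependencies, cache intermediate outputs, and rerun only the stages whose inputs changed.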

There are a variety of workflow 'engines' available. Let's try to outline our requirements and make an informed decision about which system best suits our needs.

Feature wishlist

Cornerstones

Possible workflow tools