opensafely / documentation

Documentation for the OpenSAFELY platform
https://docs.opensafely.org

[WIP] Keeping Git branches in sync with job-server workspaces #1507

Open iaindillingham opened 2 months ago

iaindillingham commented 2 months ago

In the Bristol-Cambridge-Oxford meeting on May 2, 2024, @venexia asked a question about keeping Git branches in sync with job-server workspaces. The question was prompted by an exchange with tech support.^1

We should recognize that it may not be desirable to keep Git branches in sync with job-server workspaces. Nevertheless, this issue captures the question and the exchange with tech support. Ultimately, the intention is to improve the documentation by making recommendations to researchers.

In the following workflow, the primary branch is GitHub's default branch. It is often called main. The terms primary branch, primary workspace, secondary branch, and secondary workspace have no meaning beyond this issue. HEAD is Git-speak for "the current branch's latest commit".

At this point, the paper is based on primary branch HEAD and files in primary workspace directories.

The paper is reviewed; further analysis is requested, which necessitates modifications to the dataset definition. The researcher doesn't want to overwrite files in primary workspace directories, because modifications to the dataset definition could result in a different dataset.

At this point, the paper is based on primary branch HEAD and files in secondary workspace directories.

The paper is reviewed; further analysis is requested.

iaindillingham commented 2 months ago

The researcher doesn't want to overwrite files in primary workspace directories, because modifications to the dataset definition could result in a different dataset.

I think the user wants to undertake an experiment: that is, to compare the dataset in the primary workspace with the dataset in the secondary workspace. The comparison need not be exact; it may be an approximation. DVC (Data Version Control) provides experiment management, which we could learn from. For more information, see:
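To make the idea of an approximate comparison concrete, here is a minimal sketch in Python. It is not how OpenSAFELY or DVC compares datasets. It assumes each workspace's dataset can be reduced to a few summary statistics (row count and column means), and that a relative tolerance is an acceptable notion of "approximately the same". The function names `summarize` and `compare`, the column names, and the 5% tolerance are all hypothetical.

```python
import csv
import io
import math


def summarize(csv_text):
    """Return the row count and the mean of each fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    means = {}
    for column in (rows[0].keys() if rows else []):
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns that are not fully numeric
        means[column] = sum(values) / len(values)
    return {"rows": len(rows), "means": means}


def compare(primary, secondary, rel_tol=0.05):
    """List the summary statistics that differ by more than rel_tol."""
    differences = []
    if not math.isclose(primary["rows"], secondary["rows"], rel_tol=rel_tol):
        differences.append("rows")
    shared = primary["means"].keys() & secondary["means"].keys()
    for column in sorted(shared):
        a, b = primary["means"][column], secondary["means"][column]
        if not math.isclose(a, b, rel_tol=rel_tol):
            differences.append(column)
    return differences


# Hypothetical extracts standing in for the two workspaces' datasets.
primary_csv = "age,sex,bmi\n40,F,22.0\n60,M,28.0\n"
secondary_csv = "age,sex,bmi\n40,F,22.0\n60,M,28.0\n80,F,25.0\n"

differences = compare(summarize(primary_csv), summarize(secondary_csv))
print(differences)
```

With these made-up extracts, the row count and the mean of `age` fall outside the tolerance, while the mean of `bmi` does not. Comparing summaries rather than rows is a deliberate choice in this sketch: it tolerates small differences and does not require the two datasets to be lined up patient by patient.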