iaindillingham opened this issue 2 months ago.
I think the user wants to undertake an experiment: that is, to compare the dataset in the primary workspace with the dataset in the secondary workspace. The comparison need not be exact; it may be an approximation. DVC (Data Version Control) provides experiment management, which we could learn from. For more information, see:
In the Bristol-Cambridge-Oxford meeting on May 2, 2024, @venexia asked a question about keeping Git branches in sync with job-server workspaces. The question was prompted by an exchange with tech support.^1
We should recognize that it may not be desirable to keep Git branches in sync with job-server workspaces. Nevertheless, this issue captures the question and the exchange with tech support. Ultimately, the intention is to improve the documentation by making recommendations to researchers.
In the following workflow, the primary branch is GitHub's default branch, which is often called `main`. The terms primary branch, primary workspace, secondary branch, and secondary workspace have no meaning beyond this issue. `HEAD` is Git-speak for "the current branch's latest commit".

`HEAD` and files in primary workspace directories

At this point, the paper is based on primary branch `HEAD` and files in primary workspace directories. The paper is reviewed; further analysis is requested, which necessitates modifications to the dataset definition. The researcher doesn't want to overwrite files in primary workspace directories, because modifications to the dataset definition could result in a different dataset.

`HEAD` and files in secondary workspace directories

At this point, the paper is based on primary branch `HEAD` and files in secondary workspace directories. The paper is reviewed; further analysis is requested.
Should the researcher commit to the primary branch? Files in primary workspace directories are behind files in secondary workspace directories, so the researcher would need to run jobs in the primary workspace.

Should the researcher branch from the primary branch, giving a new secondary branch with the same name as the old secondary branch, and commit to the new secondary branch? The researcher would not need to run jobs in the secondary workspace.
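The second option, recreating the secondary branch from the primary branch under the same name, can be sketched in Python by shelling out to Git. This is a minimal sketch, not a recommended procedure: it assumes a local clone in which the primary branch is named `main` and the secondary branch is named `secondary` (both hypothetical names), and the helper names `git` and `recreate_secondary_branch` are illustrative only. It relies on `git switch -C`, which creates the branch at the given start point, or resets it there if the branch already exists, and then checks it out.

```python
import subprocess


def git(*args: str, cwd: str) -> str:
    """Run a git command in `cwd` and return its stripped stdout (hypothetical helper)."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def recreate_secondary_branch(
    repo: str, primary: str = "main", secondary: str = "secondary"
) -> str:
    """Point `secondary` at the primary branch's latest commit and check it out.

    Hypothetical helper; equivalent to `git switch -C <secondary> <primary>`.
    Commits that existed only on the old secondary branch are no longer
    reachable from the recreated branch. Returns the new HEAD commit hash.
    """
    # -C creates the branch, or resets it to <primary> if it already exists.
    git("switch", "-C", secondary, primary, cwd=repo)
    return git("rev-parse", "HEAD", cwd=repo)
```

If the old secondary branch had commits that the primary branch lacks, the recreated branch no longer descends from the copy on GitHub, so pushing it would need `git push --force` (or `--force-with-lease`).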