Closed perrette closed 1 year ago
While there are many ways of implementing backups and git tracking, the git model of a local, self-contained folder is the most elegant in my opinion. It is easy to keep track of and to clean up (in contrast to a centralized repo with various branches for various files -- the number of branches would accumulate over time and be hard to maintain).
To avoid double-tracking and conflicts with an existing, larger git repo, it should be possible to simply add a `.gitignore` file next to the `.papers` directory (or append `.papers` to an existing `.gitignore`). The user should also be able to choose whether to git-track at all. Tracking will initially be opt-in, but could become opt-out if its usefulness outweighs other concerns, which I presume will be the case -- reliability is the number one concern when building a bibliography over time.
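The "add or append to `.gitignore`" step could be made idempotent so that repeated runs do not duplicate the entry. A minimal sketch, assuming a hypothetical helper named `ensure_ignored` (not part of papers):

```python
from pathlib import Path


def ensure_ignored(directory, entry=".papers"):
    """Add `entry` to directory/.gitignore unless it is already listed.

    Hypothetical helper for illustration. Returns True if the file was
    modified, False if the entry was already present.
    """
    gitignore = Path(directory) / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    # Treat ".papers" and ".papers/" as the same entry.
    if any(line.strip().rstrip("/") == entry.rstrip("/") for line in lines):
        return False
    lines.append(entry)
    gitignore.write_text("\n".join(lines) + "\n")
    return True
```

Calling it twice on the same directory leaves a single `.papers` line, and an existing `.gitignore` keeps its other entries.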
To let `papers` handle git-tracking behind the scenes, any change to the bibtex (and optionally to the associated files) has to be mirrored to a dedicated git repo. If file-tracking is activated, the mirrored bibtex cannot be a mere copy: it needs to maintain its own "file" field pointing to local files. Hard links could be used for files to keep disk usage to a minimum -- at the expense of Windows users (workarounds, like a copy, could be found later for Windows users).
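The hard-link idea, with a plain copy as the workaround wherever linking is not possible, might look like the sketch below. The function name `link_or_copy` is an assumption made for illustration, not an existing papers function.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Mirror src at dst with a hard link, falling back to a copy.

    Hypothetical helper. Returns "link" or "copy" to tell which was used.
    """
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    if os.path.exists(dst):
        os.remove(dst)  # replace a stale mirror of the file
    try:
        os.link(src, dst)  # no extra disk space, same inode as src
        return "link"
    except (OSError, NotImplementedError):
        # Hard links can fail, e.g. across devices or on some filesystems.
        shutil.copy2(src, dst)
        return "copy"
```

The caller does not need to know which branch was taken: in both cases `dst` ends up with the same content as `src`.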
For a local install, the resulting file structure would look like:
papers.bib             => that could be anywhere else
files/                 => that could be anywhere else, or be an untidy collection of files
.gitignore             => so that no conflict arises with an already git-tracked repo
.papers/
    config.json
    papers.bib         => copy of bibtex with updated file links
    files/             => a tidy, renamed version of files
        file1.pdf      => could be a hard link toward the actual file, to save disk space
        ...
    .git               => yet another copy of papers.bib and files + history
    .gitattributes     => produced by `git lfs track files`
A global install would be pretty much the same, except that the `.papers` directory would be stored in some global location.
The model outlined above would ensure a solid backup whatever the user configuration. Restoring a previous bibtex would work with this sequence of commands:
cd .papers
git reset --hard HEAD^ # reset the git repo to the previous commit (or any other specific version)
cd ..
rm -f papers.bib
touch papers.bib
papers add .papers/papers.bib --rename --copy
The last line is not a perfect `undo`. It does keep track of the files, but it forces `rename`. This example shows that `rename` may be a must for git-tracking of files.
The sequence of commands above can be used for undos until the beginning of time, but it cannot be used for redo. Here is an alternative sequence for `papers undo`, with a hack to keep track of future states (only the section between `cd .papers` and `cd ..` is written below):
echo $(git rev-parse HEAD) >> futures
git reset --hard HEAD^
and for `papers redo`:
git reset --hard $(tail -1 futures)
head -n -1 futures > futures.tmp && mv -f futures.tmp futures
Any new modification to the bib would empty futures (no redo after branching out).
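To make the bookkeeping explicit, here is a toy in-memory model of the scheme above. It is a sketch only: the list `commits` stands in for the git history and `futures` for the file of the same name, and the class name `History` is made up for illustration.

```python
class History:
    """Toy model of undo/redo with a `futures` stack (no git involved)."""

    def __init__(self, first_state):
        self.commits = [first_state]  # linear history; last item plays HEAD
        self.futures = []             # undone states, available for redo

    @property
    def head(self):
        return self.commits[-1]

    def save(self, state):
        # New modification: commit it and empty futures (redo disabled).
        self.commits.append(state)
        self.futures.clear()

    def undo(self):
        if len(self.commits) < 2:
            raise IndexError("nothing to undo")
        # Mirrors: echo $(git rev-parse HEAD) >> futures; git reset --hard HEAD^
        self.futures.append(self.commits.pop())

    def redo(self):
        if not self.futures:
            raise IndexError("nothing to redo")
        # Mirrors: git reset --hard $(tail -1 futures); drop that last line
        self.commits.append(self.futures.pop())
```

After two saves and two undos, one redo brings back the first of the two saved states. A further save at that point clears `futures`, so the second saved state can no longer be reached by redo.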
Upon saving of the bibliography, the following could work (a more efficient version would be needed to avoid moving around files if not necessary):
rm -rf .papers/papers.bib .papers/files
touch .papers/papers.bib
papers add papers.bib --bibtex .papers/papers.bib --filesdir .papers/files --no-check-duplicate
cd .papers
git add .
git commit -m 'action that triggered the change'
# maybe: git push remote --force
rm -f futures # redo disabled
The model above is some kind of black box that leaves the implementation details to `papers`. Alternatively, a simpler, more transparent implementation would involve git tracking in the same, working directory.
papers.bib
files/
.papersconfig.json
.git
.gitattributes
Here plain `git` commands would work, without the need to move around bibtex and files each time the bibliography is saved.
.papers model
While I am sensitive to the arguments of simplicity and maintenance, the very last point seems the strongest argument in favor of a black-box model, or in favor of dropping the feature altogether. Since this issue is about doing something, let's discuss it further. In the case of an already-tracked project repo (which might be common for a local install), the only benefit of the transparent, same-dir model is to automate the commit / sync. That could also be addressed via some kind of hook on savebib, redo, undo (a set of commands stored in the config file). The black-box model, in contrast, would have a redo/undo system that operates regardless of whether the larger project is handled in git or not.
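The hook idea (a set of commands stored in the config file and run on savebib, redo, undo) could be sketched as follows. The config layout with a `"hooks"` key and the function `run_hooks` are assumptions for illustration; papers does not necessarily expose either.

```python
import subprocess


def run_hooks(config, event, cwd=None):
    """Run the commands registered for `event` and return their exit codes.

    Assumed (hypothetical) config layout:
        {"hooks": {"savebib": [["git", "add", "papers.bib"],
                               ["git", "commit", "-m", "savebib"]]}}
    """
    codes = []
    for command in config.get("hooks", {}).get(event, []):
        # Each command is a list of arguments, run without a shell.
        result = subprocess.run(command, cwd=cwd)
        codes.append(result.returncode)
        if result.returncode != 0:
            break  # stop at the first failing hook
    return codes
```

Stopping at the first failure keeps a failed `git add` from being followed by a commit. An event with no registered hook simply returns an empty list.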
Now included in release 2.4.
Originally, the git-tracking feature was added to make handling a global papers install safer. The implementation is now at odds with local installs. Local installs are often git-tracked themselves, and nested git repos do not play well together. Worse, a git-enabled papers install might trigger commits in a directory where they are not expected (fortunately it is off by default, so it still requires explicit user action to be enabled). In the original implementation, the git directory could also be separate from the bibtex file. If that was the case, the bibtex would be copied to the git directory upon saving, and a commit would be done. That works, but using git commands to revert or reset to a previous commit would then only affect the git repo, and not the original bibtex, making the overall behavior unintuitive. Clearly, some overhaul is needed.
It is not entirely clear to me yet how that feature should evolve, but the basic idea of using git to safeguard the bibtex, and undo unwanted changes, is still relevant IMO. Here are a few options:
- use git as an internal tool in papers, without explicitly asking about it. `papers undo` (and a new command `papers redo`) could be used to navigate git history. The git repo would be saved in a central papers dir, using different branches to handle different bibtex locations (using a slug of the full bibtex path as branch name, for instance). That could work even without a proper installation. Maybe. Issue: a bibtex rename would break the flow by creating a new branch. We could live with that.
- propose hooks upon bibtex save. Here a whole workflow could be fine-tuned by users. Could be used internally to implement higher-level features.
- add options to track files, sync with a remote server etc.
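For the first option, a branch name derived from the full bibtex path could be computed roughly as below. This is a sketch under assumptions: the function name `branch_name` and the hash suffix are illustrative choices, not something papers defines.

```python
import hashlib
import re
from pathlib import Path


def branch_name(bibtex_path):
    """Derive a branch name from the full bibtex path (hypothetical helper)."""
    full = str(Path(bibtex_path).expanduser().resolve())
    # Keep only lowercase letters, digits and dashes.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", full).strip("-").lower()
    # Short hash suffix so paths that slugify identically stay distinct.
    digest = hashlib.sha1(full.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}"
```

The same path always maps to the same branch. A renamed or moved bibtex maps to a different one, which is the "new branch" behavior noted in the list.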
For now I'll just leave this issue open to collect ideas. The current simplistic implementation works OK.