perrette / papers

Command-line tool to manage bibliography (pdfs + bibtex)
MIT License

Backup, Sync and git tracking #51

Closed perrette closed 1 year ago

perrette commented 1 year ago

Originally, the git tracking feature was added to make handling a global papers install safer. Its implementation is now at odds with local installs. Local installs are often git-tracked themselves, and nested git repos do not play well together. Worse, papers' git tracking might trigger commits in a directory where they are not expected (fortunately it is off by default, so it still requires explicit user action to be enabled). In the original implementation, the git directory could also be separate from the bibtex file. In that case, the bibtex would be copied to the git directory upon saving, and a commit would be made. That works, but using git commands to revert or reset to a previous commit would then only affect the git repo, not the original bibtex, making the overall behavior unintuitive. Clearly, some overhaul is needed.

It is not entirely clear to me yet how that feature should evolve, but the basic idea of using git to safeguard the bibtex and undo unwanted changes is still relevant IMO. Here are a few options:

For now I'll just leave this issue open to collect ideas. The current simplistic implementation works OK.

perrette commented 1 year ago

While there are many ways of implementing back-ups and git tracking, the git model of a local, self-contained folder is the most elegant in my opinion. It is easy to keep track of and to clean up (in contrast to a centralized repo with various branches for various files, where the number of branches would accumulate over time and be hard to maintain).

To avoid double-tracking and conflicts with an existing, larger git repo, it should be possible to simply add a .gitignore file next to the .papers directory (or append .papers to an existing .gitignore), and to let the user choose whether to git-track in the first place. It will initially be opt-in, but could become opt-out if usefulness outweighs other concerns, which I presume will be the case: reliability is the number one concern when building a bibliography over time.
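The .gitignore step could be sketched as below. This is only an illustration, not what papers does: the function name `ignore_papers_dir` is hypothetical, and the directory is assumed to be named `.papers` and to sit next to the .gitignore file.

```shell
# Append an ignore rule for the papers directory unless it is already listed.
# Assumption: the directory is called ".papers" and lives next to .gitignore.
ignore_papers_dir() {
    gitignore="${1:-.gitignore}"
    entry=".papers/"
    # Already listed? -x: whole-line match, -F: fixed string, -q: quiet.
    if grep -qxF -- "$entry" "$gitignore" 2>/dev/null; then
        return 0
    fi
    # If the file exists and does not end with a newline, add one first,
    # so the new rule does not get glued to the last existing rule.
    if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
        printf '\n' >> "$gitignore"
    fi
    printf '%s\n' "$entry" >> "$gitignore"
}
```

Calling `ignore_papers_dir` from the project root is idempotent: a second call leaves the file unchanged.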

To let papers handle git-tracking behind the scenes, any change to the bibtex (and optionally to the associated files) has to be mirrored to a dedicated git repo. If file-tracking is activated, the mirrored bibtex cannot be a mere copy, but needs to maintain its own "file" field pointing to local files. Hard links could be used for files to keep disk usage to a minimum, at the expense of Windows users (workarounds, like a copy, could be found later for Windows users).
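A minimal sketch of the hard-link idea with a copy fallback follows. The helper name `mirror_file` is hypothetical, and the fallback to a plain copy when linking fails (different filesystem, or no hard-link support) is an assumption rather than a decided behavior.

```shell
# Mirror one file into the tracked directory, preferring a hard link.
# Assumption: a plain copy is an acceptable fallback when "ln" fails.
mirror_file() {
    src="$1"
    dst="$2"
    mkdir -p "$(dirname "$dst")"   # create e.g. .papers/files/ if missing
    rm -f "$dst"                   # ln refuses to overwrite an existing file
    ln "$src" "$dst" 2>/dev/null || cp -p "$src" "$dst"
}
```

For example, `mirror_file files/file1.pdf .papers/files/file1.pdf` leaves a second name for the same data where hard links are available, and an independent copy otherwise.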

For a local install, the resulting file structure would look like:

papers.bib          => that could be anywhere else
files/              => that could be anywhere else, or be an untidy collection of files
.gitignore          => so that no conflict arises with an already git-tracked repo
.papers/
    config.json
    papers.bib      => copy of bibtex with updated file links
    files/          => a tidy, renamed version of files
        file1.pdf   => could be a hard link toward the actual file, to save disk space
        ...
    .git            => yet another copy of papers.bib and files + history
    .gitattributes  => produced by `git lfs track files`

A global install would be pretty much the same, except that the .papers directory would be stored in some global location.

perrette commented 1 year ago

The model outlined above would ensure a solid backup whatever the user configuration. Restoring a previous bibtex would work with this sequence of commands:

cd .papers
git reset --hard HEAD^   # reset the git repo to the previous commit (or any other specific version)
cd ..
rm -f papers.bib
touch papers.bib
papers add .papers/papers.bib --rename --copy

The last line is not a perfect undo. It does keep track of the files, but it forces a rename. This example shows that renaming may be a must for git-tracking of files.

The sequence of commands above can be used for undos until the beginning of time, but it cannot be used for redo. Here is an alternative sequence for papers undo, with a hack to keep track of future states (only the section between cd .papers and cd .. is written below):

git rev-parse HEAD >> futures
git reset --hard HEAD^

and for papers redo:

git reset --hard "$(tail -n 1 futures)"
head -n -1 futures > futures.tmp && mv -f futures.tmp futures   # head -n -1 requires GNU coreutils

Any new modification to the bib would empty futures (no redo after branching out).
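The undo/redo sequences above could be wrapped into two functions, sketched here. The names `papers_undo` and `papers_redo` are hypothetical; the sketch assumes both run inside the .papers git repository and that `futures` is an untracked file (for instance listed in `.git/info/exclude`), so that `git reset --hard` leaves it alone.

```shell
# Sketch of undo/redo on top of the "futures" stack.
# Assumptions: run from inside the .papers repo; "futures" is untracked.
papers_undo() {
    # Refuse to undo past the first commit.
    git rev-parse --verify --quiet HEAD^ >/dev/null || return 1
    git rev-parse HEAD >> futures        # remember the state we are leaving
    git reset --hard --quiet HEAD^
}

papers_redo() {
    # Nothing to redo if the stack is missing or empty.
    [ -s futures ] || return 1
    target=$(tail -n 1 futures)
    git reset --hard --quiet "$target"
    # Pop the last line (portable alternative to GNU "head -n -1").
    sed '$d' futures > futures.tmp && mv -f futures.tmp futures
}
```

Both functions return a non-zero status when there is nothing to undo or redo, which lets a caller report that to the user instead of silently doing nothing.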

perrette commented 1 year ago

Upon saving of the bibliography, the following could work (a more efficient version would be needed to avoid moving around files if not necessary):

rm -rf .papers/papers.bib .papers/files
touch .papers/papers.bib
papers add papers.bib --bibtex .papers/papers.bib --filesdir .papers/files --no-check-duplicate
cd .papers
rm -f futures   # redo disabled; removed before staging so it is not committed
git add .
git commit -m 'action that triggered the change'
# maybe: git push remote --force
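
As one small step toward the more efficient version mentioned above, the commit could be skipped when the mirrored content did not change. A sketch, assuming it runs inside the .papers repository, that `futures` is ignored via `.git/info/exclude`, and that the hypothetical helper `commit_if_changed` receives the commit message from the caller:

```shell
# Commit the mirrored bibliography only when something actually changed.
# Assumptions: run from inside the .papers repo; "futures" is git-ignored.
commit_if_changed() {
    message="$1"
    git add -A .
    # "git diff --cached --quiet" exits 0 when nothing is staged.
    if git diff --cached --quiet; then
        return 0                         # nothing to record, redo stays valid
    fi
    # A real change invalidates the redo stack.
    git commit --quiet -m "$message" && rm -f futures
}
```

With this, saving an unchanged bibliography neither adds an empty commit nor discards pending redo states.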

perrette commented 1 year ago

The model above is a kind of black box that leaves the implementation details to papers. Alternatively, a simpler, more transparent implementation would involve git tracking in the same working directory:

papers.bib
files/
.papersconfig.json
.git
.gitattributes

Here plain git commands would work, without the need to move around bibtex and files each time the bibliography is saved.

Pros of black-box, .papers model

Pros and cons of transparent, same-dir model

While I am sensitive to the arguments of simplicity and maintenance, the very last point seems the strongest argument in favor of a black-box model, or in favor of dropping the feature altogether. Since this issue is about doing something, let's discuss it further. In the case of an already-tracked project repo (which might be common for a local install), the only benefit of the transparent, same-dir model is to automate the commit / sync. That could also be addressed via some kind of hook on savebib, redo, undo (a set of commands stored in the config file). The black-box model, in contrast, would have a redo/undo system that operates regardless of whether the larger project is handled in git or not.
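
To make the hook idea concrete, here is a rough sketch of a hook runner. Everything in it is hypothetical: the function name `run_hooks`, the default directory `.papers-hooks`, and the storage format (one plain-text file per event, one shell command per line) are assumptions chosen to keep the example free of JSON parsing; nothing here reflects an existing papers option.

```shell
# Run the user-defined commands registered for an event (savebib, undo, redo).
# Assumption: hooks live in a hypothetical file "<hooks_dir>/<event>",
# one shell command per line; blank lines and "#" comments are skipped.
run_hooks() {
    event="$1"
    hooks_dir="${2:-.papers-hooks}"
    hook_file="$hooks_dir/$event"
    [ -f "$hook_file" ] || return 0      # no hook registered: nothing to do
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        case "$cmd" in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the command from consuming the rest of the hook file.
        sh -c "$cmd" </dev/null || return 1   # stop at the first failure
    done < "$hook_file"
}
```

A `savebib` hook file containing `git add papers.bib` and `git commit -m "update bibliography"` would then reproduce the auto-commit benefit of the same-dir model inside an already-tracked project.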

perrette commented 1 year ago

Now included in release 2.4.