joostkremers / ebib

A BibTeX database manager for Emacs.
https://joostkremers.github.io/ebib/
BSD 3-Clause "New" or "Revised" License

Save changes from main database to dependent database #269

Open jsilve24 opened 1 year ago

jsilve24 commented 1 year ago

I am just starting to play around with Ebib and am considering integrating it into my workflow (so I am not an expert by any means).

One issue I came across is how to save changes from a main database back into the dependent database (i.e., update the dependent database after changes to the main database).

As expected, if I change something in the main database file and then open the dependent database, I see the changes from the main database. Yet actually getting those changes written back to the dependent database file (so that citations using the dependent database file are correct) is not simple. I would have expected ebib-save-current-database to do this, but it doesn't. It seems I need to call ebib-write-database.

Am I missing something? It seems like it should be much easier to update the dependent database file, not just show the updates in the Ebib interface. The latter is not used when actually building LaTeX documents.

joostkremers commented 1 year ago

What editing operation exactly are you performing on the main database? The problem is probably that in certain cases, the dependent database doesn't get marked as modified, even though it should. Since the data between the main and dependent database are shared (i.e., each entry only exists once in memory), any change to an entry is reflected in both automatically, but if the dependent isn't marked as modified, Ebib will normally not save it.

As a workaround, you should be able to use a prefix argument on the save command (i.e., type C-u s instead of just s) to force Ebib to save the database, even if it hasn't been modified.

jsilve24 commented 1 year ago

So here is a reproducible example. Note that my entire Ebib configuration is the line (use-package ebib).

Create a new file foo.bib

@Article{george55:_why_boy_georg,
    journal = {Annals of Statistics},
    year = {1655},
    title = {Why is Boy George writing an accademic paper? },
    author = {george, boy}
}

Open this database in Ebib. Then create a new dependent database in the same directory, named foo-dependent.bib. Here is what that file looks like for me:

@Comment{
ebib-main-file: /home/jds6696/Downloads/ebib-example/foo.bib
}

@Article{george55:_why_boy_georg,
    journal = {Annals of Statistics},
    year = {1655},
    title = {Why is Boy George writing an accademic paper? },
    author = {george, boy}
}

Observation:

Sounds like this is a bug?

joostkremers commented 1 year ago

So if I understand correctly, you're opening the main database, making a change that affects a dependent database, and then opening the dependent after making that change.

That's actually a scenario that never crossed my mind... I tend to open the dependent database, which causes Ebib to also load the main one. (I rarely have more than one dependent database at a time.)

To fix this, we need to check if the main database is modified when a dependent database is opened. That basically involves checking if it's marked as modified in Ebib and if not, checking whether the modification time of the file is more recent than the modification time of the dependent database.

Of course, an edit in the main database doesn't necessarily affect the dependent, because the changes may have been done in entries that are not part of the dependent. But since we don't keep track of which entries were modified, we have no way of telling which dependents are affected and which are not. So the safest thing to do is to mark the dependent as modified and offer to save it.
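The check described above can be sketched outside of Ebib as follows. This is only an illustration in Python, not Ebib's implementation: the function name dependent_needs_save and the main_modified_in_session flag are hypothetical stand-ins for "marked as modified in Ebib", and the file names are taken from the example earlier in the thread.

```python
import os
import tempfile
import time


def dependent_needs_save(main_path, dependent_path, main_modified_in_session=False):
    """Return True if the dependent file may be stale relative to the main file.

    Hypothetical helper: the flag stands in for "the main database is marked
    as modified in the current session". If it is not set, fall back to
    comparing file modification times.
    """
    if main_modified_in_session:
        return True
    return os.path.getmtime(main_path) > os.path.getmtime(dependent_path)


def demo():
    """Create two empty files with known timestamps and run the check."""
    with tempfile.TemporaryDirectory() as tmp:
        main = os.path.join(tmp, "foo.bib")
        dep = os.path.join(tmp, "foo-dependent.bib")
        for path in (main, dep):
            with open(path, "w") as handle:
                handle.write("")
        now = time.time()
        os.utime(dep, (now - 60, now - 60))  # dependent saved a minute ago
        os.utime(main, (now, now))           # main saved just now
        stale = dependent_needs_save(main, dep)
        # Swapping the roles: the "main" file is older, so nothing to do.
        fresh = dependent_needs_save(dep, main)
        return stale, fresh


if __name__ == "__main__":
    print(demo())
```

Note that the check is deliberately conservative: it cannot tell whether the newer main file touched any entry that the dependent actually contains, which is why the dependent is simply marked as modified.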

jsilve24 commented 1 year ago

I think that would be a good idea to add. At least personally, here is my use case:

I am an academic with multiple graduate students, each writing various papers. I am tired of constantly having to track down BibTeX entries for papers I have cited repeatedly in prior articles. So basically I want to have my own master BibTeX database, and each paper has its own. Students will likely make updates to individual papers' BibTeX files outside of Ebib (I am the only one using Emacs). I will likely make changes directly to the master file, or have changes from one paper's BibTeX file propagate to my master file, and those changes will sometimes need to propagate back out to other students' papers.

So beyond syncing changes from the main database to the dependent databases, I am also going to have to come up with some solution for merging changes made to dependent databases outside of Ebib into the master file... Not sure if you have any ideas on that (or if there is already functionality). I was thinking of trying to create an ediff-based approach to merge manually when a change to a dependent database is more recent than, and conflicts with, the changes made in my master file. Something like that... Open to ideas.

joostkremers commented 1 year ago

> I think that would be a good idea to add.

I'll add it as soon as I find some time. Can't make any promises on an ETA, though.

> So beyond syncing changes to main to the dependent databases I am also going to have to come up with some solution for merging changes from dependent databases made outside of ebib into the master file.... Not sure if you have any ideas on that (or if there is already functionality).

No, there's no such functionality. In fact, if a dependent database is modified outside of Ebib, those changes are lost once you open the file in Ebib and save it again. The reason is that even though the dependent database has the full BibTeX entries, when you open it in Ebib, Ebib just reads the entry keys and gets all the field values from the main database.
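To make the consequence concrete, here is a small Python sketch of the loading behaviour described above. It is a toy model, not Ebib's code: databases are plain dictionaries, load_dependent is a hypothetical name, and the externally edited year value is invented for the illustration. Silently skipping keys that are missing from the main database is also an assumption of the sketch.

```python
def load_dependent(dependent_file_entries, main_db):
    """Model of opening a dependent database.

    Only the entry keys of the dependent file are used; every field value
    comes from the main database. Keys missing from the main database are
    skipped here, which is an assumption of this sketch.
    """
    return {key: main_db[key] for key in dependent_file_entries if key in main_db}


main_db = {
    "george55:_why_boy_georg": {
        "journal": "Annals of Statistics",
        "year": "1655",
        "author": "george, boy",
    }
}

# The dependent file as found on disk after someone edited it outside Ebib
# (the changed year is a made-up example).
dependent_on_disk = {
    "george55:_why_boy_georg": {
        "journal": "Annals of Statistics",
        "year": "1955",
        "author": "george, boy",
    }
}

loaded = load_dependent(dependent_on_disk, main_db)
# The external edit is not visible after loading, so saving the dependent
# would write the main database's value back to disk.
print(loaded["george55:_why_boy_georg"]["year"])
```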

> I was thinking of trying to create an ediff-based approach to manually merge when there has been a change to a dependent database that conflicts that is more recent than the changes made in my master file. Something like that... Open to ideas.

That's a tough one... Initially I thought of a version control system such as Git, but that wouldn't handle merging changes in a dependent file into the main file.

Ebib has the option to merge one .bib file into another, but it doesn't overwrite existing entries. So if there are changes to an entry in a dependent database, those changes are not merged into the main file.

It probably wouldn't be difficult to add an option to overwrite existing entries when merging a .bib file into another, though. Then, the way to incorporate changes from a dependent database would be:

  1. open the main db;
  2. merge the dependent db, overwriting duplicate entries;
  3. open the dependent db.

When merging a .bib file into the main db, Ebib doesn't check whether it's a dependent or not. So new entries in the dependent would be added to the main db and changed entries would overwrite the entries in the main db. Then, when you open the dependent db, Ebib uses the data in the main db, which has now been updated.

There's one problem, though: if an entry was changed both in the main db and in a dependent db, the changes in the main db would be lost. And since Ebib would replace the entire entry, not just single field values, this would happen even if the changes were made to different fields (and thus wouldn't conflict).

Same thing if two dependent files make changes to the same BibTeX entry: when you merge the second one, the changes made by the first one are lost.
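The data-loss problem can be illustrated with a toy model in Python. None of this is Ebib code: merge_overwrite models the proposed (not yet existing) overwrite option, and merge_entry_fields is a hypothetical field-level alternative that additionally needs the last common version of the entry, which Ebib does not keep. The changed field values are invented for the example.

```python
def merge_overwrite(main_db, incoming_db):
    """Whole-entry merge: an incoming entry replaces the main entry outright."""
    merged = dict(main_db)
    merged.update(incoming_db)
    return merged


def merge_entry_fields(base, main_entry, dep_entry):
    """Field-by-field three-way merge of a single entry.

    `base` is the last version both sides agreed on (an assumption of this
    sketch). Returns the merged entry and the fields changed differently
    on both sides.
    """
    merged = dict(main_entry)
    conflicts = []
    for field in set(base) | set(main_entry) | set(dep_entry):
        b, m, d = base.get(field), main_entry.get(field), dep_entry.get(field)
        if d == b or d == m:
            continue  # dependent left the field alone, or both sides agree
        if m == b:
            # Only the dependent changed this field: take its version.
            if d is None:
                merged.pop(field, None)
            else:
                merged[field] = d
        else:
            conflicts.append(field)  # both sides changed it differently
    return merged, sorted(conflicts)


key = "george55:_why_boy_georg"
base = {"journal": "Annals of Statistics", "year": "1655"}
main_entry = {"journal": "Annals of Statistics", "year": "1955"}  # year fixed in main
dep_entry = {"journal": "Ann. Statist.", "year": "1655"}          # journal changed in dependent

overwritten = merge_overwrite({key: main_entry}, {key: dep_entry})
field_merged, conflicts = merge_entry_fields(base, main_entry, dep_entry)

print(overwritten[key])          # the year change made in main is gone
print(field_merged, conflicts)   # both changes survive, no conflicts
```

With whole-entry overwriting, the dependent's copy wins and the year edited in the main db reverts. The field-level variant keeps both edits, but only because it has a common ancestor to compare against, which is essentially what a version control system provides.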

Perhaps you could use Git to manage this, though. You'd have to keep your main .bib file under version control and every time you merge a dependent database, you use Git to review the changes and accept or reject them. (I would suggest using Magit, because it's a great front-end and because it makes it easy to accept or reject even single lines from a change.) That way you'll be able to tell if merging a dependent file made changes to the main file that you do not want and you can tell Git to throw them away. Afterwards, you'd have to reload the main file in Ebib, but there's a command for that.

Honestly, what you describe sounds like you're really looking for a database server/client model, where multiple clients can connect to a single database. I don't know if something like Zotero offers that, but perhaps it's worth a look. But the Git approach might just work if you make sure to regularly commit changes to the main file.