retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Set IDS field when merging references with different citation keys #1221

Closed probably-england closed 5 years ago

probably-england commented 5 years ago

BibLaTeX can have multiple citation keys refer to the same bibliography entry: set the IDS field in the .bib file to a comma-separated list of the additional keys you want to refer to the entry. It would be great if BBT would automatically remember the non-primary ids of merged items, allow editing them, and write them to the IDS field when exporting BibLaTeX. One use case for this is merging two documents whose .bib files overlap heavily in the papers cited but use different citation keys. Without this, one has to go through one of the documents manually and change all the keys to match the keys used in the other document.
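To make the request concrete, here is a minimal sketch of what an exported entry with an IDS field could look like. The entry, its keys, and the `render_entry` helper are made up for illustration; this is not BBT's exporter, and it assumes the aliases are written as a plain comma-separated list.

```python
def render_entry(entry_type, key, fields, aliases=()):
    """Render a minimal BibLaTeX entry; alias keys go into the ids field."""
    lines = ["@%s{%s," % (entry_type, key)]
    if aliases:
        # Additional keys that should resolve to this same entry.
        lines.append("  ids = {%s}," % ",".join(aliases))
    for name, value in fields.items():
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


# Illustrative data only: both keys are invented.
example = render_entry(
    "article",
    "smith2019",
    {"title": "An Example Title", "year": "2019"},
    aliases=["smith_example_2019"],
)
print(example)
```

With an entry like this, a document citing `smith2019` and another citing `smith_example_2019` could share one .bib file without either being rewritten.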

label-gun[bot] commented 5 years ago

It looks like you did not upload an error report. The error report is important; it gives @retorquere your current BBT settings and a copy of the problematic reference as a test case so he can best replicate your problem. Without it, @retorquere is effectively blind.

Once done, you will see a debug ID in red. Please post that debug id in the issue here.

Thank you!

probably-england commented 5 years ago

This is an enhancement request. There is no error report or log. Is there a way to tag this as an enhancement request instead of a bug report?

retorquere commented 5 years ago

You mean merged as in merged in Zotero? Because you can already specify an ids field by just adding bibtex[ids=key1,key2,key3,...] in the extra field.
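As a rough illustration of that convention, the sketch below pulls the alias keys out of an extra field containing a `bibtex[ids=...]` marker. The `extract_ids` helper and the regular expression are assumptions made for this example; BBT's actual parsing is not shown in this thread and may differ.

```python
import re

# Hypothetical pattern for the "bibtex[ids=...]" marker; not BBT's own parser.
IDS_PATTERN = re.compile(r"bibtex\[ids=([^\]]*)\]")


def extract_ids(extra):
    """Return the alias keys listed in a bibtex[ids=...] marker, or [] if absent."""
    match = IDS_PATTERN.search(extra)
    if match is None:
        return []
    return [key.strip() for key in match.group(1).split(",") if key.strip()]


# Invented extra-field content for illustration.
extra_field = "Some note\nbibtex[ids=smith2019,smith_deep_2019]"
print(extract_ids(extra_field))  # ['smith2019', 'smith_deep_2019']
```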

probably-england commented 5 years ago

I have to merge approximately 2,000 duplicate pairs with potentially different citation keys from several BibTeX files. For each of the 2,000 pairs, my process for merging them manually would be this:

  1. Remember the title of the first publication displayed on the merge tab. (It is not possible to copy from the merge tab.)
  2. Click on the library tab.
  3. Select the search box.
  4. Type the publication name I remembered into the search box.
  5. Click on each paper with the same name to check whether the citation keys are different. If they are all the same, skip to step 14.
  6. Click on the publication I am merging to and type "bibtex[ids=]" into the extra field.
  7. Click on each other publication with the same name in the search results.
  8. Select the citation key with my mouse.
  9. Copy the citation key.
  10. Click back to the original publication.
  11. Click the "extra" field.
  12. Paste the citation key into the extra field.
  13. Click back to the merge tab.
  14. Click merge.

I am slow at this, and this process takes me 30 seconds per duplicate pair. That times 2,000 is 60,000 seconds, or nearly 17 hours. It also requires me to maintain concentration to remember the paper title when searching and the citation key when checking whether the keys are the same.

However, I am not picky about how the rest of the fields get merged. If the citation keys could be merged by just clicking "merge," doing that 2,000 times takes maybe 10 minutes and requires no concentration.
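The estimate can be checked with a few lines of arithmetic. The figures are the ones quoted in this comment; only the variable names are new.

```python
SECONDS_PER_PAIR = 30      # manual process, as estimated above
PAIRS = 2000

total_seconds = SECONDS_PER_PAIR * PAIRS
total_hours = total_seconds / 3600
print(total_seconds, round(total_hours, 1))  # 60000 16.7

# The one-click alternative: about 10 minutes for all pairs.
seconds_per_click = (10 * 60) / PAIRS
print(round(seconds_per_click, 1))  # 0.3
```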

I would be okay with writing a script to do the merge without using Zotero, but I do not know of any libraries that offer high quality and easily accessible duplicate detection heuristics. All the ones I know of mainly detect exact matches between fields. Do you know of any libraries or other software that might make this process easier? I would also be interested if there is a way to get Zotero to export a list of duplicates it has found within a collection with metadata saying which entries are duplicates. While I can export the full contents of the duplicates tab, this does not say which pairs of items should be merged.
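For the script route, one possible starting point is fuzzy title matching with Python's standard-library `difflib`. This is only a sketch under simple assumptions: entries are already parsed into (citation key, title) pairs, titles alone are compared, and the 0.9 threshold is an arbitrary choice. It is not Zotero's duplicate detection, and the sample entries are invented.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and drop punctuation and braces so LaTeX-style titles compare cleanly."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def find_duplicate_pairs(entries, threshold=0.9):
    """Return (key_a, key_b) pairs whose normalized titles are at least `threshold` similar."""
    pairs = []
    for i, (key_a, title_a) in enumerate(entries):
        for key_b, title_b in entries[i + 1:]:
            ratio = SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio()
            if ratio >= threshold and key_a != key_b:
                pairs.append((key_a, key_b))
    return pairs


# Invented sample data: (citation key, title).
entries = [
    ("smith2019", "Deep Learning for {LaTeX} Users"),
    ("smith_deep_2019", "Deep learning for LaTeX users."),
    ("jones2020", "A Survey of Citation Managers"),
]
print(find_duplicate_pairs(entries))  # [('smith2019', 'smith_deep_2019')]
```

The pairwise loop is quadratic in the number of entries, so for several thousand items it may be worth grouping by year or first author before comparing. Each reported pair gives the two keys that would go into an ids list.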

Longer term, the kind of functionality I am looking for is anything that would make it easier to merge in bib/aux files that have large overlap with the existing Zotero library and do incremental updates as coauthors change them. This kind of functionality would make it much easier to use Zotero on multi-author projects with other authors who do not use it.

An example of a workflow for this could be something like the following.

  1. Import a bib/aux file from my coauthors into my Zotero library.
  2. Merge the duplicates in the Zotero or BBT UI.
  3. BBT automatically generates the ids field when I do the merge.
  4. BBT remembers what I previously imported so that if I ever try to import an exact duplicate of a previously imported publication, it gets excluded from the import.

retorquere commented 5 years ago

BBT doesn't have a UI and is unlikely to grow one -- I really hate doing UI work. Everything UI you see is me hijacking existing Zotero UI.

I'm OK with thinking about a way to generate an ids= field from a merge, but it's not easy.

Zotero doesn't inform plugins that a merge has happened. Technically, one item gets updated (the merge target) and one is deleted (the merge "victim"), and all I see is a modification and a deletion as separate events. There is currently an order in which I receive them, but this order is not guaranteed.

This is complicated by the fact that I delete the citation key for the "victim" item of the merge from my internal database, and I can't know up front that I'm supposed to hold on to this key for a little while to see whether a merge has happened. As far as I can tell right now, the merge is registered only after I get the deletion event, and I don't get any notification of it.

Detection and timing are going to be tricky on this thing. But I'll see what I can do.

Point 4 cannot be done realistically. Zotero importers can't look into the DB, so I cannot assess whether an item already exists -- all they can do is take a stream of text, turn it into Zotero objects, and offer them to Zotero. Zotero does the actual saving, and importers are not involved. I can't even see what file is being imported, as importers only get handed a text stream to read from, so I can't even say "I've seen this file name before".

blip-bloop commented 5 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5.1.101.4296 ("fixed #1221")

Install in Zotero by downloading test build 5.1.101.4296, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

retorquere commented 5 years ago

Give 4296 a go. I've forgone the formal notification plumbing of Zotero and just attach to the actual action of merging (aka monkey-patching) -- it seems to work in my tests. It adds the aliases in the extra field with Citation Key Alias: in front -- you can edit this as you like; what's in the extra field is what is actually used on export. Mind that adding a bibtex[ids=...] will override this, not add to it.

retorquere commented 5 years ago

(monkey-patching is a little more fragile because Zotero is not required to keep the internals the same -- if they change, this breaks and I have to fix it)
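For readers unfamiliar with the term, the sketch below shows the general monkey-patching pattern: wrap a host application's merge function so the victim's citation key is recorded on the target before the original function runs. Everything here is a stand-in. `ItemStore`, its `merge` method, and the one-alias-per-line layout are assumptions for illustration, not Zotero's internals or BBT's code; only the "Citation Key Alias:" prefix comes from the comment above.

```python
class ItemStore:
    """Stand-in for a host application's internals (hypothetical, not Zotero's API)."""

    def __init__(self):
        self.items = {}  # item_id -> {"key": str, "extra": str}

    def merge(self, target_id, victim_id):
        # Host behaviour in this sketch: the victim simply disappears.
        del self.items[victim_id]


ALIAS_PREFIX = "Citation Key Alias: "


def install_alias_patch(store_cls):
    """Replace store_cls.merge with a wrapper that records the victim's key first."""
    original_merge = store_cls.merge

    def patched_merge(self, target_id, victim_id):
        victim_key = self.items[victim_id]["key"]
        target = self.items[target_id]
        if victim_key != target["key"]:
            # Assumed layout: one alias line appended to the extra field.
            line = ALIAS_PREFIX + victim_key
            target["extra"] = (target["extra"] + "\n" + line).strip()
        return original_merge(self, target_id, victim_id)

    store_cls.merge = patched_merge


install_alias_patch(ItemStore)
store = ItemStore()
store.items = {
    1: {"key": "smith2019", "extra": ""},
    2: {"key": "smith_deep_2019", "extra": ""},
}
store.merge(1, 2)
print(store.items[1]["extra"])  # Citation Key Alias: smith_deep_2019
```

The fragility mentioned above shows up directly: if the host renames `merge` or changes its arguments, the wrapper no longer attaches to anything useful.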

retorquere commented 5 years ago

> This is an enhancement request. There is no error report or log. Is there a way to tag this as an enhancement request instead of a bug report?

I'd usually prefer a debug log nonetheless because it gives me a copy of your references and it makes it easier to reason about the output you want to see; I also use these to add test cases to my test suite. I've constructed a testcase myself in this instance, but testcases are really important because they prevent regressions.

blip-bloop commented 5 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5.1.101.4297 ("testcase for #1221")

Install in Zotero by downloading test build 5.1.101.4297, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

retorquere commented 5 years ago

I really need to know if this does what you want.

retorquere commented 5 years ago

hello?

blip-bloop commented 5 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5.1.110.4343 ("Merge branch 'master' into gh-1221")

Install in Zotero by downloading test build 5.1.110.4343, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

probably-england commented 5 years ago

Thank you very much for implementing this. It will be a few days to weeks until I do the merge and can verify the new functionality. I did not think you would be this fast with adding something :-)

retorquere commented 5 years ago

Can you do a test anyhow to see if it appears to do what you want? I prefer not to keep issues open for too long.

blip-bloop commented 5 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5.1.110.4352 ("Merge branch 'master' into gh-1221")

Install in Zotero by downloading test build 5.1.110.4352, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

blip-bloop commented 5 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5.1.125.4418 ("Merge branch 'master' into gh-1221")

Install in Zotero by downloading test build 5.1.125.4418, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

github-actions[bot] commented 3 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.