wshanks / lyz

LyZ is a plugin for Zotero, which is intended to make working with LyX/Zotero more pleasant.
GNU General Public License v3.0

Add key format "zoteroExport" for compatibility with Zotero BibTeX export #28

Closed: bewantbe closed this pull request 5 years ago

bewantbe commented 5 years ago

For solving https://github.com/willsALMANJ/lyz/issues/9.

This uses the key from the Zotero BibTeX translator directly, to maximize compatibility.

wshanks commented 5 years ago

Thanks for submitting this. It has been a while since I have thought about the inner workings of LyZ, so it is taking me a while to find time to review it. I think the somewhat convoluted system of generating keys was developed to handle conflicts between similar items. Do you know if this method of using the Zotero key is stable in such cases?

bewantbe commented 5 years ago

The Zotero BibTeX exporter is not absolutely stable. There have been a few minor changes, visible in its history, such as removing markup from title words in the cite key and avoiding blank parts in the BibTeX key.

The Better BibTeX for Zotero documentation also points out that, in some rare cases, the default exporter may not generate deterministic keys.

But I think our goal here is to match whatever Zotero exports to the BibTeX file.

As for conflict handling, my experience using this patch is that a conflict occurs the first time you insert, through LyZ, an item that already exists in the BibTeX file. The duplicate entry is appended to the end of the BibTeX file, and you get an error when you compile the .tex file. You then need to remove the duplicate entry from the BibTeX file manually, after which everything is normal again. At that point LyZ has recorded the item in its SQLite database, so no further duplication happens for that item.

To fully solve the conflict issue, the keys in lyz.sqlite would probably need to be synchronized with the BibTeX file whenever necessary. Maybe by reusing the function updateFromBibtexFile? I'm not sure about this part.

bewantbe commented 5 years ago

Checked the LyZ code more thoroughly.

For keys generated by the Zotero BibTeX exporter, duplication will not happen within one export, but it can happen if you export items one at a time. The collision-handling code is here.
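To make the difference concrete, here is a minimal Python sketch. The function assign_keys is hypothetical, and its -2, -3 suffix rule follows the description later in this thread rather than the translator's actual source. De-duplicating within one batch keeps keys unique, while running each item through on its own (as happens when exporting one by one) yields the same key twice.

```python
from typing import List


def assign_keys(base_keys: List[str]) -> List[str]:
    """De-duplicate cite keys within a single batch.

    Hypothetical illustration: the first occurrence keeps its base key and
    later collisions get -2, -3, ... (the real translator's rule may differ).
    """
    seen = set()
    result = []
    for base in base_keys:
        key, n = base, 1
        # Bump the numeric suffix until the key is unused in this batch.
        while key in seen:
            n += 1
            key = f"{base}-{n}"
        seen.add(key)
        result.append(key)
    return result


# Two different items whose generated base keys happen to be identical.
items = ["smith_big_2019", "smith_big_2019"]

# One export containing both items: the collision is detected.
print(assign_keys(items))  # ['smith_big_2019', 'smith_big_2019-2']

# One export per item: each call starts with an empty "seen" set,
# so the collision goes unnoticed.
print([assign_keys([k])[0] for k in items])  # ['smith_big_2019', 'smith_big_2019']
```

The point is that collision detection needs shared state across all items; any per-item process starts from scratch each time.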

In LyZ, the identifier used to detect duplicates is the zid from Zotero.Items.getLibraryKeyHash. With this patch, the zid works as usual.

However, an exported BibTeX file contains none of the zid information that LyZ expects to find in the first line of the file, so the updateFromBibtexFile function cannot fix the LyZ database in this case.

If a reverse lookup (matching a BibTeX entry to a Zotero item) is really needed, this might be useful: [setup a zotero search].

I don't think there is a perfect solution. People can always add content to the BibTeX file manually, and that is beyond the control of both LyZ and Zotero.

So if the user intends to link a .bib file that is not fully managed by LyZ (i.e., items were not added only through LyZ from the beginning), then it is the user's responsibility to maintain consistency; in this case, that means dealing with the duplicated item.

wshanks commented 5 years ago

Thanks for all the information. I actually had not been thinking about the use case of citing items with LyZ that are already in the .bib file, but that is an interesting case to consider.

I had been thinking about the case of two similar items having a collision in their cite key (for example, when you cite two articles by the same author from the same year with similar titles). The other cite key formats for LyZ have protection against generating duplicate keys: zotero and zoteroShort base the key on the Zotero ID, which is unique, and the custom keys created from the author, year, and title keywords get checked against the database and de-duplicated by appending a number to the end.
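A rough sketch of that database-backed approach, in Python with sqlite3: look the item up by its Zotero ID first, and only when the item is new generate a key, appending a number until it is unused. The table layout, the example zid strings, and the exact suffix format are assumptions for illustration; they are not LyZ's actual schema or code.

```python
import sqlite3


def key_for_item(conn: sqlite3.Connection, zid: str, base: str) -> str:
    """Return a stable, unique cite key for the item identified by zid.

    Hypothetical schema: keys(zid TEXT PRIMARY KEY, key TEXT UNIQUE).
    """
    row = conn.execute("SELECT key FROM keys WHERE zid = ?", (zid,)).fetchone()
    if row:
        # Item was cited before: reuse its recorded key (no duplicate entry).
        return row[0]
    key, n = base, 1
    # New item: append a number until the key is not used by another item.
    while conn.execute("SELECT 1 FROM keys WHERE key = ?", (key,)).fetchone():
        n += 1
        key = f"{base}{n}"
    conn.execute("INSERT INTO keys (zid, key) VALUES (?, ?)", (zid, key))
    return key


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keys (zid TEXT PRIMARY KEY, key TEXT UNIQUE)")

# Made-up zids for two different items that produce the same base key.
print(key_for_item(conn, "1_ABCD1234", "Smith2019big"))  # Smith2019big
print(key_for_item(conn, "1_EFGH5678", "Smith2019big"))  # Smith2019big2
print(key_for_item(conn, "1_ABCD1234", "Smith2019big"))  # Smith2019big
```

Because the lookup is keyed on the zid, the same item always maps back to its first key, while different items with the same base key are kept apart.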

Your implementation does not check for such key collisions because oldkey is obtained by running a single item through the export translator and pulling out the key by parsing the BibTeX output. When Zotero generates a BibTeX file for a set of items, it checks for such collisions and adds -2, -3, etc. to the end of colliding cite keys.

To account for the -2, LyZ would need to process all the cite keys together instead of generating them individually. I am not sure it could reproduce the translator's behavior (iterating through the items in the same order to determine which one gets the -2 suffix). The alternative would be to generate the full bibliography, pull out all of the cite keys and the citation text for each item, and then use the citation text to distinguish any items that appear to have a cite key collision.
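The alternative could look roughly like this Python sketch: export all items together, then find the entry whose content (everything except the key) matches the single-item export. The regular expression assumes Zotero-style output where each entry closes with } on its own line; split_entries and resolve_key are hypothetical names, not LyZ functions, and a real BibTeX parser would be more robust.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed layout: "@type{key," ... fields ... then "}" at the start of a line.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)^\}", re.DOTALL | re.MULTILINE)


def split_entries(bib: str) -> Dict[str, Tuple[str, str]]:
    """Map each cite key to (entry type, entry body without the key)."""
    return {m.group(2): (m.group(1), m.group(3).strip()) for m in ENTRY_RE.finditer(bib)}


def resolve_key(full_export: str, single_export: str) -> Optional[str]:
    """Return the key the batch export assigned to the item in single_export."""
    target = split_entries(single_export)
    if len(target) != 1:
        return None
    (content,) = target.values()
    matches = [k for k, c in split_entries(full_export).items() if c == content]
    # Refuse to guess when the content is missing or ambiguous.
    return matches[0] if len(matches) == 1 else None


full = """@article{smith_big_2019,
  title = {Big theory part-I},
  year = {2019}
}

@article{smith_big_2019-2,
  title = {Big theory part-II},
  year = {2019}
}
"""

# Exported on its own, part-II gets the un-suffixed base key.
single = """@article{smith_big_2019,
  title = {Big theory part-II},
  year = {2019}
}
"""

print(resolve_key(full, single))  # smith_big_2019-2
```

If the content matches more than one entry (or none), the sketch returns None rather than guessing.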

While looking into this, I realized that LyZ already supports using the Zotero BibTeX translator's keys. To use them, you need to uncheck the "Create cite key?" box in LyZ's settings. This can be seen here, where the key is extracted using the same regular expression as in createCiteKey. Just like your patch, generating cite keys this way does not protect against collisions because each item is processed separately.
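For readers unfamiliar with how a key is pulled out of the translator output, here is a minimal Python sketch. LyZ itself is a JavaScript Zotero plugin and its actual regular expression (ckre) is not reproduced in this thread, so the pattern and the name extract_cite_key below are assumptions for illustration.

```python
import re
from typing import Optional

# Assumed pattern: the key is whatever follows "@type{" up to the first comma.
CITE_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def extract_cite_key(bibtex: str) -> Optional[str]:
    """Return the cite key of the first entry in a BibTeX string, or None."""
    match = CITE_KEY_RE.search(bibtex)
    return match.group(1) if match else None


single_item = """@article{smith_big_2019,
  title = {Big theory part-I},
  author = {Smith, Jane},
  year = {2019}
}"""

print(extract_cite_key(single_item))  # smith_big_2019
```

Since the input is one item's export, the extracted key never reflects the other items, which is why this approach cannot see collisions.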

So I don't think we need to merge in this PR, but perhaps we need to improve the documentation for the "Create cite key?" checkbox. At some point it would be nice to address the key collision issue. Let me know what you think.

bewantbe commented 5 years ago

Oops, I didn't realize that the ckre regular expression used for "Create cite key?" and the one used for oldkey are exactly the same. Nice catch.

Still, this patch lets you first work on an existing .bib file (with "Create cite key?" off) and then switch to LyZ (with the new 'zoteroExport' key format and "Create cite key?" on).

My use case: a large document that had been using a .bib file exported from Zotero. At some point I got tired of the loop: (1) add a new citation to the Zotero library, (2) export from Zotero, (3) search for and cite the entry in LyX. So I switched to LyZ, which simplifies steps (2) and (3). Now I need to be able to (a) cite existing keys and (b) add new keys without doing (2) again (which is where this patch comes in).

This patch is far from perfect, in the sense that (I) it cannot automatically distinguish between uses (a) and (b), and (II), as you pointed out, it cannot distinguish similar items (such as "big theory part-I" and "big theory part-II" by the same author in the same year). Still, most of the time it eases the use case described above.

bewantbe commented 5 years ago

Oh, no, ignore what I said above. You are right: with "Create cite key?" off, LyZ does exactly what I described above (both (a) and (b)). So this patch is not necessary (I'm OK with closing it once the details are confirmed). The documentation, and probably the GUI tooltip, for this option really need refinement.

The key collision issue will need another good day of thinking.