retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Rewrite for Zotero 5 #555

Closed ericnchen closed 6 years ago

ericnchen commented 8 years ago

I have installed the latest 5.0 beta for Zotero on a new computer and installed the latest version of Better Bib(La)Tex as well. The versions specifically are 5.0-beta.r39+bdec4b1 and 1.6.72, respectively.

Anyway, after installing Better Bib(La)Tex and restarting Firefox (45.3.0 ESR on CentOS 7) I received a JSON error, though I didn't think to save it. I went to Advanced Settings, enabled Debug Mode for Better Bib(La)Tex, and restarted Firefox. Now on restart, I get the following message:

 Better BibTeX has been disabled because it found Zotero undefined, but requires 4.0.28 or later.

I also cannot edit any of the BBT options anymore because it tells me that BBT has been disabled. The only thing I can do, I presume, is uninstall the extension from Firefox, reinstall it, and hope it works.

Before I turned on debug mode BBT showed up in the Zotero options, at least. I was not able to export anything with BBT though. Selecting "export library" didn't provide BBT as one of my export options.

Is there a working version of BBT that will work with the 5.0 beta?

retorquere commented 8 years ago

There is currently not a version that works with 5.0, and I'm not certain whether there will be a compatible BBT at the release date. Zotero 5.0 is not just an incremental update to Zotero, it is really a wholly new program, and almost all non-trivial extensions will have to make extensive changes for 5.0.

Rewriting BBT to be compatible is going to be a substantial effort, partly because it moves far and wide beyond what Zotero officially allows extensions to do, but mostly because the database paradigm has changed, and BBT has some deep-rooted assumptions in that domain.

I currently lack the time to go heads-down on this and spend the time it would take to do this. I don't know exactly when I will be able to make the time, and there is also the matter that 5.0 is a transitional Zotero in any case -- there are already plans to move away from Firefox entirely to another platform (Electron), which would likely mean another substantial rewrite. Between my lack of time and the dynamic of the developments of Zotero, I'm going to wait until their plans solidify before I move (unless I suddenly get a month of nothing-to-do spare time, which is unlikely).

This is not a complaint about how Zotero moves BTW. I am unhappy with these moves because they impinge on my already cramped planning, but I understand why they are making these moves.

ericnchen commented 8 years ago

Thanks for the response!

retorquere commented 7 years ago

After a few attempts it looks like 5.0 is going to require essentially a full rewrite. With some 40k lines of code, this is going to take a while -- end of October at the very earliest, and very likely later than that.

dbobak commented 7 years ago

That is very bad news. My entire workflow depends on stable BibTeX keys. Do you know of any other way to have them in Zotero 5.0?

retorquere commented 7 years ago

None that I know of that don't require coding. I'm working on 5.0 compatibility, but it will require major changes to BBT. I don't have an ETA for this.

retorquere commented 7 years ago

(I'm juggling a full-time job, studies, a family and the work on BBT. I was doing OK with incremental updates to BBT, but the 5.0 port is not an incremental change. Basically everything is broken right now, and I'll need to fix/change everything before I see even parts of BBT work again, which makes it incredibly hard to judge how long this is going to take)

dbobak commented 7 years ago

Great that you are working on it anyway. I will stick to Zotero 4.x as long as possible.

steko commented 7 years ago

@retorquere thank you for your continued efforts. As a devoted user, may I ask if funding would help you work on fixing BBT for Zotero 5.0?

retorquere commented 7 years ago

Truly appreciate the offer, but it wouldn't help. What I need is time, and my calendar has just flooded the last month. I hope to make progress during the holidays.

adam3smith commented 7 years ago

Hi @retorquere -- just to be able to give better feedback&support to potential users, do you have an update on this?

retorquere commented 7 years ago

I can't offer anything of substance at this point. I've started a full rewrite, but it's incredibly slow going.

Frank-Zappa commented 7 years ago

Was there any discussion to merge zotero-better-bibtex functions to Zotero 5.0 directly? I think, we should urge the Zotero dev team to consider it.

adam3smith commented 7 years ago

Some features of BBT make a lot of sense as an add-on. E.g. Zotero wouldn't want to expose anywhere close to as many preferences for bibtex as BBT does. Some things will eventually happen but need more time -- stable editable citekeys most importantly.

Some things would be great but are probably just not high enough on the core devs' agenda to realistically happen any time soon. Auto export is probably among those.

retorquere commented 7 years ago

Yeah, as long as space for cite keys isn't even present in the references, I wouldn't hold my breath for bbt integration.

BBT does have a baroque number of preferences, but all of them have sensible defaults. There's no need to have them exposed in Zotero - that could be the job of an extension. Just saying.

And there are things that really should have been separate extensions - stuff like auto-export really doesn't even belong inside bbt, but it's mostly there because it needed a caching system (zotero reference serialization is a really big bottleneck), and bbt had one, and I only need auto-export for bbt. Plus the serialization cache as it is now does some minor damage to the serialized objects by simplifying them to just data, no methods, so it's not safe for any and all translators.

I'd say that the serialization cache would be a real boon for zotero users that do frequent exports, and that a damage-free cache is possible inside zotero, but it's quite possible that only bbt users are doing frequent exports - other users just use zotero directly I'd venture to guess.
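
The "just data, no methods" damage mentioned above can be illustrated with a small sketch. This is not BBT's cache; the `Item` class and `cache_roundtrip` helper are hypothetical names, and a JSON round trip stands in for whatever serialization the cache actually uses.

```python
import json


class Item:
    """Hypothetical stand-in for a serialized reference with a helper method."""

    def __init__(self, title, creators):
        self.title = title
        self.creators = creators

    def first_creator(self):
        return self.creators[0]


def cache_roundtrip(item):
    """Store and reload an item the way a data-only cache would.

    Only plain attributes survive; methods such as first_creator() are gone,
    so a translator that calls them would fail on the cached copy.
    """
    return json.loads(json.dumps(vars(item)))
```

A translator that only reads fields works fine with the cached copy; one that calls methods on the object does not, which is why such a cache is not safe for every translator.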

ghost commented 7 years ago

Are there any updates on this since Zotero 5 is out?

retorquere commented 7 years ago

Only that I've not given up on the idea. BBT requires a full rewrite for 5.0, BBT is a pretty complex beast, and the simple ports I tried have all failed so far.

retorquere commented 7 years ago

First thing I'm going to do is get my test framework back up. Without my tests I can't do anything. Next order of business will be adding the translators, but stuff like auto-export won't be there in the beginning.

MarioJose commented 7 years ago

@dbobak, for a while, you can select all your references in Zotero, right-click, and export them to BibLaTeX. Zotero has an option to export to many formats. But when you add a new reference, you have to export all your references again.

retorquere commented 7 years ago

I may spin off the auto export to a separate plugin if feasible. But first things first:

  1. Cite key generation
  2. Basic export (but better, natch)
  3. Everything else.

dbobak commented 7 years ago

@MarioJose, I know that. But it does not solve the problem of stable keys. For me that's the most important feature; I could manually re-export after every update, but I have no guarantee that the keys will not change.

retorquere commented 7 years ago

Stable keys are the first order of business. I have a skeleton implemented that loads, and I'm slowly starting to populate it.

dbobak commented 7 years ago

@retorquere, marvelous news. Thank you very much.

retorquere commented 7 years ago

Stable keys would be tons easier to achieve btw if the zotero crew would include a means for extensions to store their own data. It's been discussed for years at this point.

MarioJose commented 7 years ago

Stable keys are an important issue. I hadn't thought of that, @dbobak. I thought there was a standard for that. Thank you @retorquere!

retorquere commented 7 years ago

Stable keys are challenging in Zotero because the part of the code that generates bibtex is isolated in a way that it can't look at the database to spot duplicates. BBT works around that in a way that Zotero really shouldn't and that I'd rather not.

retorquere commented 7 years ago

For the curious, activity on this happens in https://github.com/retorquere/zotero-better-bibtex/tree/z5. Nothing usable is there right now, but if you're the adventurous kind, please let me know if you're willing to test once I get something running.

If you do:

  1. Thank you so much, and
  2. BACKUP, BACKUP, BACKUP

I have an extensive test suite that I will have running before I would even think of submitting it for test, but BBT 4 Z5 will most likely incorporate some visual changes in the library that I'd like to have checked off. This is your chance to influence those changes. Getting the test suite to pass is likely going to take at least two weeks though (wouldn't you know it, the office and the family and the study also demand some of my time. Crazy stuff).

jhrmnn commented 7 years ago

Signing up as an alpha tester (dev@hermann.in). Have no experience with FF extensions so cannot offer help coding.

retorquere commented 7 years ago

No coding experience required, I just need feedback. I don't expect to release alpha-level stuff in any case -- whatever I release will pass at least a significant chunk of my existing tests. It's mostly that I have some plans to change where the cite keys live, technically, but these will be visible in the UI. My first testing need is that I need to know whether that visual change is acceptable.

dbobak commented 7 years ago

I would also help as an alpha tester ( deni+github(at)lithics.eu ). I am not a programmer in any way, but I could track and report bugs.

hquilo commented 7 years ago

I might be able (and would like) to help coding and refactoring ... check my info at camilorocha (dot) info and please send me an email if you think I can be of any help.

retorquere commented 7 years ago

@hquilo I've added rudimentary dev setup instructions in CONTRIBUTING.md, please check if they're clear enough to get you going.

dbobak commented 7 years ago

@retorquere It is quite complicated to install all the required software on Windows, and I'm on Windows only. Would it be possible to publish an xpi?

retorquere commented 7 years ago

@dbobak Windows is a miserable experience for BBT dev. It should be better now that Bash on Ubuntu on Windows is available, but I haven't tried it myself.

I'm nowhere near a sensible xpi yet, but once I am, builds will be published as usual under the "builds" release on GitHub. But for that to happen I need to have my test & deploy scripts ready, and have at least one passing real test. None of those milestones have been hit yet, although the test & deploy scripts are making good progress.

hquilo commented 7 years ago

@retorquere Thanks for the instructions. I managed to advance almost to the end. I got stuck at this instruction

Run ./features/support/mkprofile. Zotero will start up, import a bunch of references, and shut down.

Where is this folder structure located?

retorquere commented 7 years ago

@hquilo I forgot to add the instruction to check out the z5 branch (which is where that script lives). See the updated CONTRIBUTING.md. You will have to run the bundle update and npm i steps again after changing to the z5 branch.

retorquere commented 7 years ago

But if you wait a few hours I'll rename the z5 branch to master; things may get confusing if you overlap. AAMOF I'd recommend getting a new checkout after I rename the branch.

retorquere commented 7 years ago

OK, the branches have been renamed; I recommend getting a completely fresh clone of the repo.

retorquere commented 7 years ago

OK, feedback time. Warning: large wall of text ahead, which contains a fair bit of technical, possibly Zotero or BBT specific mumbo-jumbo. The TL;DR is at the bottom.

As I'm rebuilding BBT for Z5, I'm trying to minimise, preferably eliminate, the monkey-patches and sandbox-piercing that BBT employed to do its work, among other reasons because many of the monkey patches assumed Zotero would work synchronously, which makes wrapping or replacing code relatively easy. Z5 is all gung-ho for async code, which is both understandable and a huge pain in the ass (and the reason that I'm talking about rewriting rather than porting BBT). Right now I'm at a crossroads with regard to the generation of citation keys. My desiderata for this are:

  1. When generating citation keys, I need to be able to search the whole library the reference lives in, not just the subset being exported, for potential duplicates. Given the translator sandboxing, this means citation keys must be generated outside the sandbox.
  2. This citation key must be attached to the reference so it does not change unless explicitly triggered by the user
  3. There are people with large libraries, so the search for duplicates needs to be efficient
  4. The citation keys must be available within the translator, so the citation key generated outside the sandbox must find a way in.
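
The first three desiderata can be sketched as follows. This is a minimal illustration, not BBT's key generator: `KeyIndex`, `base_key`, the author-plus-year pattern, and the numeric suffix scheme are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


def base_key(author_last: str, year: int) -> str:
    """Hypothetical key pattern: lowercased last name followed by the year."""
    return f"{author_last.lower()}{year}"


@dataclass
class KeyIndex:
    """In-memory index of every citation key in one library."""

    keys_by_item: Dict[str, str] = field(default_factory=dict)
    taken: Set[str] = field(default_factory=set)

    def assign(self, item_id: str, base: str) -> str:
        """Return the item's existing key, or claim a new unique one."""
        # Desideratum 2: once assigned, a key does not change by itself.
        if item_id in self.keys_by_item:
            return self.keys_by_item[item_id]
        # Desiderata 1 and 3: check against the whole library, using a set
        # so that each probe is a hash lookup rather than a scan.
        key, n = base, 1
        while key in self.taken:
            n += 1
            key = f"{base}-{n}"
        self.keys_by_item[item_id] = key
        self.taken.add(key)
        return key
```

Because the index lives outside the translator, the fourth desideratum (getting the key into the sandbox) still has to be solved separately; the sketch does not address it.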

What I did before was:

  1. Pierce the sandbox so key generation can be initiated from within the translator.
  2. This key generation searches a secondary cache that holds the citation keys, and places the new key there, because searching the "extra" field is way too slow.
  3. If the key is meant to be pinned (meaning it doesn't change when e.g. the reference title changes), I also store it in the "extra" field so it will sync.
  4. Return the generated key to the translator and it's off to the races.

This however has a number of drawbacks that I'd love to get rid of while I'm rebuilding BBT.

  1. As the Phil Karlton quote goes, "There are only two hard things in Computer Science: cache invalidation and naming things." I'd dearly love to get away from my secondary cache. I'm seeing cases in the wild where it gets out of sync with the actual library, and recovering requires not just a full table scan but a parse of each result to find the pinned keys in the "extra" field, which gets me to
  2. There is no way to efficiently search for duplicates in the "extra" field, as it requires inspecting each of the extra fields to lift the key out.
  3. I don't want to pierce the translator sandbox anymore. The sandbox has some very decent defences in place that are tricky to work around. It's extra work for no benefit, and potentially risky.
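
To make the second drawback concrete, here is a rough sketch of what a duplicate search over the "extra" field involves. The `bibtex: <key>` line format is an assumption for the example (the thread does not spell out the format), and `pinned_key` and `find_duplicates_by_scan` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional

# Assumed format: a pinned key sits on its own line as "bibtex: <key>".
PINNED = re.compile(r"^[ \t]*bibtex:[ \t]*(\S+)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def pinned_key(extra: str) -> Optional[str]:
    """Lift the pinned key out of one item's free-text extra field."""
    match = PINNED.search(extra)
    return match.group(1) if match else None


def find_duplicates_by_scan(extras: Dict[str, str], key: str) -> List[str]:
    """Return the ids of all items pinned to `key`.

    Every call parses the extra field of every item, so the cost grows
    with the size of the library; there is nothing to index on.
    """
    return [item_id for item_id, extra in extras.items() if pinned_key(extra) == key]
```

The extra field is free text shared with other data, so each lookup is a parse rather than a comparison, which is what makes this approach slow on large libraries.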

The "clean" option from the Zotero pov of course is to store all citation keys in the "extra" field, pinned or not, and retain the secondary cache for searches. I don't like this because cache invalidation remains an issue. The citekeys would get to the translators without sandbox-busting, which is better than it was. I know this is always picked as the preferred option because it doesn't interfere with the regular operations of Zotero, but it sure does interfere with mine because of the efficiency and/or cache invalidation problems.

I've been experimenting with several alternate approaches, including:

Hacky alternatives that violate the spirit of the Zotero API but won't break Zotero, and which would allow for efficient finding of duplicates:

  1. Store the citation keys in a tag. This will make the tag selector instantly useless because it will be flooded with citation key "tags", and I seem to recall there were performance problems when too many tags were present in a Zotero library. Perhaps the performance problem has been solved; I've poked around a bit in Zotero to see if I can just have the tags not show up in the tag picker, but haven't gotten far yet. Such hiding would involve monkey-patching.
  2. Store the citation keys in a specially formatted linked URL (e.g. url="zotero://better-bibtex/citekey", title="@citekey"). Downside is that every reference will show that it has an attachment when potentially that can be a sole "attachment" just holding the citekey. I've likewise looked into hiding those in the UI but haven't gotten far on that either. The hiding again would involve monkey-patching. "zotero://" urls cannot be imported back I've just found, so I may have to settle on something like "https://better-bibtex/citekey"
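
The linked-URL idea could look roughly like this. It is a sketch under assumptions: the `https://better-bibtex/` prefix and the `@` title marker come from the example above, but putting the key itself in the URL path, and the `encode`/`decode` helpers, are illustrative choices rather than a settled format.

```python
from typing import Optional, Tuple
from urllib.parse import quote, unquote

PREFIX = "https://better-bibtex/"  # prefix taken from the example in the text


def encode(citekey: str) -> Tuple[str, str]:
    """Build the (url, title) pair for a linked-URL attachment."""
    return PREFIX + quote(citekey, safe=""), "@" + citekey


def decode(url: str, title: str) -> Optional[str]:
    """Recover the citekey, or None if this is an ordinary attachment."""
    if not url.startswith(PREFIX) or not title.startswith("@"):
        return None
    key = unquote(url[len(PREFIX):])
    # Require URL and title to agree, so a hand-edited attachment is ignored.
    return key if key == title[1:] else None
```

Ordinary attachments such as a PDF or a PubMed link fail the prefix check and are left alone.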

Hacky alternatives that will (likely) break Zotero, so they're not options for BBT:

  1. Store the citation keys in spurious relation records. When I previously tried this some years ago this screwed up my library beyond any salvation (btw, the account "emilianoheyns" that linked to it can be killed AFAIC).
  2. Store the citation keys in the existing tables for custom keys. Will likely not sync, and may or may not break Zotero.

Clean option

I very much would love to see a non-hacky alternative that would just allow me to store the citation key associated to the reference, efficiently searchable, syncable, and available to the translators. Support for custom fields in Zotero has been talked about many times before; I'm hopeful now that the major work on Z5 is done, there would be time to implement custom fields, but this is a request that has in the past only gotten the conceptual "good idea, some time in the future" half-thumbs-up, so it is only a weak hope.

Finally then, my question:

If support for custom fields is not likely to emerge in the foreseeable future, I'm leaning towards the linked URL alternative at the moment. Thoughts?

TL;DR

I want to store the citation keys in a Zotero-native way. Zotero doesn't offer a formal Zotero-native way, so I'm considering stuffing them in attachments. These attachments will show up in your library. Is this OK?

anne-urai commented 7 years ago

I would have no problem with attachments containing citation keys, since I already have many items that contain both a pdf attachment and e.g. a Pubmed entry link, and I don't search for PDFs from within Zotero (Zotfile renames them and Mac OS Spotlight finds the files on disk). Thanks for your work on this!

duncdrum commented 7 years ago
retorquere commented 7 years ago

@duncdrum I wouldn't hold my breath for Juris-M being superseded by Zotero; it does a lot more than just adding a few fields (like multi-lang support, which is not a trivial change against Zotero).

The attachments would have edge cases of their own; if you merge two references the resulting reference would have multiple citation keys. I don't yet know by which algorithm I'm going to resolve that; but then in the current situation the resulting reference would get a random pick from the merged references, so it's not really all that different. I've posted the large wall of text on zotero-dev too in hopes that the Zotero devs chime in with better ideas. But good points on the tags. So the tags option is out.

The tags would have had as side benefit though that they're searchable; citekey search in Zotero came by way of a rather unpleasant monkey-patch.

duncdrum commented 7 years ago

@retorquere I see your point about the searchability of tags. In my own use cases the pros of searchable citekeys wouldn't outweigh the cons. But I guess I'm in the minority.

As for Juris-m, I still call it MLZ and refuse to give up hope 😎

retorquere commented 7 years ago

@duncdrum I think your tags argument is persuasive, so for the time being I'm sticking with either the extra field or with the attachment. Searchability of tags can be done in other ways, or possibly deferred until Zotero finally gets a dedicated field for citation keys.

There must be more to Juris-M (or MLZ), because I feel confident that if MLZ could be merged easily with Zotero, Frank wouldn't mind at all getting that chunk of his life back.

hquilo commented 7 years ago

@retorquere I figured that much last night while following the instructions. I did checkout the z5 branch and worked from there. Once I get back home (tonight) I will try to get rid of the problem.

As for the way of storing a citation key, I lean towards having them as an attachment. If Zotero finally includes a citation key field in future releases, then upgrading to this more stable solution from the temporary attachment one seems OK.

retorquere commented 7 years ago

@hquilo I'm actively testing that out now, but it seems that something I need in the Zotero notifier is broken (the alternative would be a moderately expensive DB call). I'm hoping they'll get back to me soon.

retorquere commented 7 years ago

@hquilo I would suggest doing a clean clone because I don't think the rename will have made it with full history in the repo. I actually have very little on my hands this afternoon so at least for a few more hours I can work towards getting the test harness up and a rudimentary key generator.

retorquere commented 7 years ago

One thing where everyone could help is go here on Trello and add comments for features or functionality not already listed on the board. BBT is a complex beast, and as I'm not porting but rebuilding from scratch I must make sure I re-create everything. The test cases I have cover a lot, but potentially not everything, and they don't do any UI.

troeger commented 7 years ago

I wrote a small Python script for converting a Zotero5 BibLatex export to stable citation keys:

https://gist.github.com/troeger/87848e8485c8f009537a6e085ce16a15

This is only an intermediate emergency solution, in case you need publication work to be done. Some entry types simply don't get their notes exported with the standard mechanisms.

andersjohansson commented 7 years ago

If the question is just to have a working solution until citekeys are added to Zotero (hopefully in a few months' time, after 4 syncing is turned off), wouldn't it be best to choose the simplest solution that would also be the easiest to convert afterwards (and for the time being accept that it might be a little slow)? I don't know whether the Extra field or the linked URLs would be simpler, though.

Would a URL be added for every item (used as a cache?) or only for those with pinned citekeys? For my use case I only have a few pinned citekeys and go with the generated ones (although with a customized format) most of the time. Having (visible) URLs added to all items vs. only the few with pinned keys would be quite different experiences, with the latter not distracting at all and the former a little distracting.

retorquere commented 7 years ago

If indeed the citekeys are coming in a few months, and there are now indications that they might, then yes, storing them in the extra field is easier at the cost of performance. Given that the odds have just increased of this happening, I'm re-considering keeping the cite keys in the extra field.

Having the keys in an attachment would not have been a cache, it would simply be the place where the citekeys would be stored. All of them, not only the pinned ones. I want to move away from a secondary database to keep the non-pinned keys, and I do need to keep them somewhere to have stable citekeys. With the citekeys-in-attachments, all references would automatically get at least one attachment, being the citekey, and within that attachment, an indication of whether it's pinned or not. I don't really like that either, but the secondary database takes upkeep to make sure it doesn't go out of sync with the primary database, expiring entries when they go out of the primary database, etc.
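
As a sketch of what "a citekey plus a pinned indication per attachment" could mean in practice, including the merge edge case raised earlier in the thread: the resolution rule below (pinned wins, then alphabetical order) is purely a hypothetical policy for illustration, since no algorithm has been chosen, and `KeyAttachment` and `resolve_after_merge` are made-up names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KeyAttachment:
    """Hypothetical record held by a citekey attachment."""

    citekey: str
    pinned: bool


def resolve_after_merge(candidates: List[KeyAttachment]) -> KeyAttachment:
    """Pick one key when a merged reference ends up with several.

    Assumed policy: a pinned key beats an unpinned one; ties are broken
    alphabetically so the result is deterministic rather than random.
    """
    if not candidates:
        raise ValueError("merged reference has no citekey attachment")
    # False sorts before True, so negate `pinned` to put pinned keys first.
    return min(candidates, key=lambda a: (not a.pinned, a.citekey))
```

Any deterministic rule would do here; the point is only that the attachment carries enough information to make the choice without consulting a secondary database.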