zotero / zoterobib

ZoteroBib is a free online bibliography generator from the makers of Zotero
https://zbib.org
Other
70 stars 19 forks source link

Slow copy citation dialog #213

Closed flachware closed 6 years ago

flachware commented 6 years ago

@tnajdek the copy citation dialog is extremely slow for me today, it takes about ten seconds and more to show up.

tnajdek commented 6 years ago

hmm... It might be somewhat slow but that's extreme. How many entries in total did you have? Which citation style? Did it resolve itself or is it reproducible?

flachware commented 6 years ago

I have about 60 entries, it happens for all citation styles, and yes, I can still reproduce it. I think the behaviour is tied to the number of entries, if I have just a handful of entries it’s fast again.

tnajdek commented 6 years ago

I actually don't know how to make it any faster. In c63faba7f10f38b3ade019d589a382f1fe28eb58 I've mentioned that this is relatively slow:

Citeproc citations registry is built using appendCitationCluster which in turn calls processCitationCluster which makes it computationally expensive. This needs to be rebuilt every time bibliography is in any way changed.

Instead of doing this expensive processing when bibliography changes, it's done whenever the citation dialog is opened, that's why it gets slow with large number of citations.

It doesn't seem to be possible to build registry once and maintain it, because Citeproc doesn't offer removeCitationCluster or updateCitationCluster. Thus in order to make sure it's up-to-date it needs to be rebuilt every time.

Any thoughts on how to speed this up much appreciated.

flachware commented 6 years ago

I certainly can’t help how to speed that up, but thinking of the UI: would it be possible to handle that similarly to launching the editor, where we show the backdrop with a spinner while the application is busy?

tnajdek commented 6 years ago

Actually, since the only reason we're doing this is disambiguation, we could make it faster by checking if all entries have unique authors (is there any other disambiguation factor?). We would then only build full registry if that's the case so it would still be fast 95% of the time and for those odd cases where you have two works by the same author, it would be slow but perhaps we could include spinner to make it more bearable?

dstillman commented 6 years ago

It's probably worth reviewing with @adomasven how you're doing this, since he knows the most how this all works. (I know very little about it, but this code, e.g., looks suspicious to me.)

adomasven commented 6 years ago

@tnajdek see https://github.com/adomasven/zotero/blob/23224f60935a3348c7aae3d490ada8391e152de7/chrome/content/zotero/xpcom/integration.js#L1629-L1674

It doesn't seem to be possible to build registry once and maintain it, because Citeproc doesn't offer removeCitationCluster or updateCitationCluster. Thus in order to make sure it's up-to-date it needs to be rebuilt every time.

You do not remove citations/items from the processor. With processCitationCluster() calls you pass in all the previous and following citations to the one being processed and citeproc will return the appropriate text for the current citation and all other citations where text may have changed as a result of altered pre/post lists. You probably know this, but citeproc registers item ids (that you use in pre and post arrays) in 2 ways:

  1. From the first argument of processCitationCluster() or appendCitationCluster()
  2. By first calling updateItems(ids) https://github.com/Juris-M/citeproc-js/blob/master/manual/citeproc-doc.rst#updateitems

I believe calling previewCitationCluster() is not very efficient because it rebuilds the citation processor state for that one preview internally. Since it seems that you are ensuring that the processor state "is warm" before getting the citation text, you can call processCitationCluster() there instead of previewCitationCluster() (you will also need to call setOutputFormat() for the different formats).

Do note that you will need to manually trigger the reprocessing of items that receive metadata changes, but otherwise you shouldn't be rebuilding the processor state for every bibl or individual citation generation.

tnajdek commented 6 years ago

@adomasven thank you for the explanation. Unfortunately I wasn't able to apply this to bib-web scenario.

It seems in your case you have a list of of changes and you're able to call processCitationCluster on every single change. I believe it would be a substantial rewrite for bib-web to keep track of changes in this way. I'm also not sure how to remove citation in this case, I believe skipping it in pre/post would do, but citeproc would still keep track, isn't that right? Wouldn't that cause a memory-leak?

Anyway I've ended up using restoreProcessorState call to avoid repetitive calls to processCitationCluster call which seems to build citeproc's internal registry nicely. It still takes ~1 sec per 100 entries but I believe that's acceptable?

tnajdek commented 6 years ago

Closing, please re-open in case of any issues!