Closed mclearc closed 7 years ago
This can definitely be considered a bug. My bib has ~1200 entries and parsing takes a fraction of a second. Perhaps it's related to finding PDFs. Could you please post a representative entry from your bib file?
Sure, here are three fairly representative entries. Helm-bibtex does eventually parse the file, and after the parsing things are fine until I have to restart emacs.
@article{fricker2016,
title = {What's the {{Point}} of {{Blame}}? {{A Paradigm Based Explanation}}: {{What}}'s the {{Point}} of {{Blame}}},
volume = {50},
timestamp = {2016-10-14T18:44:40Z},
number = {1},
journaltitle = {Noûs},
author = {Fricker, Miranda},
date = {2016},
pages = {165--183},
file = {fricker2016_what's_the_point_of_blame.pdf:/MasterLib/fricker2016_what's_the_point_of_blame.pdf:application/pdf}
}
@article{valaris2016,
title = {What {{Reasoning Might Be}}},
abstract = {The philosophical literature on reasoning is dominated by the assumption that reasoning is essentially a matter of following rules. This paper challenges this view, by arguing that the rule-following model of reasoning, by arguing that it misrepresents the nature of reasoning as a personal-level activity. Reasoning must reflect the reasoner’s take on her evidence. The rule-following model seems ill-suited to accommodate this fact. Accordingly, this paper suggests replacing the rule-following model with a different, semantic approach to reasoning.},
timestamp = {2016-11-02T03:58:19Z},
author = {Valaris, Markos},
date = {2016},
keywords = {reason,reasoning},
file = {valaris2016_what_reasoning_might_be.pdf:/MasterLib/valaris2016_what_reasoning_might_be.pdf:application/pdf}
}
@article{weinberg2016,
title = {What Is the a Priori, That Thou Art Mindful of It?: {{A}} Comment on {{Albert Casullo}}, {{Essays}} on a Priori Justification and Knowledge},
volume = {173},
timestamp = {2016-10-14T18:44:40Z},
number = {6},
journaltitle = {Philos Stud},
author = {Weinberg, Jonathan M.},
date = {2016},
pages = {1695--1703},
keywords = {a priori,empiricism,epistemology},
file = {weinberg2016_what_is_the_a_priori,_that_thou_art_mindful_of_it.pdf:/MasterLib/weinberg2016_what_is_the_a_priori,_that_thou_art_mindful_of_it.pdf:application/pdf}
}
Could you please set bibtex-completion-pdf-field to nil and test again?
Thanks for the suggestion. Looks like that speeds things up. It still takes about 30 seconds the first time I load helm-bibtex, but it is definitely faster than with bibtex-completion-pdf-field set.
Interesting, I expected no effect or that parsing would be around 1 or 2 seconds. Would you mind giving me access to your complete bib file? My email address is: malsburg@uni-potsdam.de
Thanks. I sent you the bib file. I should also say, in case it matters, that I use use-package to lazy load helm-bibtex. But I wouldn't think that would affect load time to the degree that it has.
Ok, I can reproduce this problem. But I get extremely variable load times ranging from 8s to 110s and I couldn't pin down what's causing this variability. One thing that I noticed is that loading is really fast (1.3s) when I use the following settings:
(setq helm-bibtex-bibliography "/tmp/test.bib")
(setq helm-bibtex-notes-path nil)
(setq helm-bibtex-library-path nil)
(setq helm-bibtex-pdf-field nil)
Could you please show me how you set these variables?
I think the culprit is the code that searches for notes. Do you have one notes file? And if yes, how large is it?
The other issue likely is that you're referencing PDFs via the file field, which is known to be slow. I store all PDFs in one directory and name them <bibtex-key>.pdf, which makes it much easier to find them. I would recommend that you change to that way of linking PDFs, but it appears that you're also using JabRef and that doesn't understand this.
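For readers who want to try the key-based naming scheme described above, a minimal configuration sketch could look like the following. The directory is a placeholder and not taken from this thread; only the two variable names come from the discussion.

```elisp
;; Sketch only: "~/papers/" is a placeholder directory, adjust to your setup.
;; PDFs in this directory are expected to be named <bibtex-key>.pdf.
(setq bibtex-completion-library-path '("~/papers/"))

;; Do not look up PDFs through the BibTeX "file" field.
(setq bibtex-completion-pdf-field nil)
```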
Here's what I have under :config in my use-package set-up:
(setq bibtex-completion-bibliography "/Users/roambot/Dropbox/Work/Master.bib"
      bibtex-completion-library-path "/Users/roambot/Dropbox/Work/MasterLib/"
      bibtex-completion-pdf-field nil
      bibtex-completion-notes-path "/Users/Roambot/projects/notebook/content/org_notes"
      bibtex-completion-additional-search-fields '(keywords)
      bibtex-completion-notes-extension ".org"
      helm-bibtex-full-frame nil)
;; Set insert citekey with markdown citekeys for org-mode
(setq bibtex-completion-format-citation-functions
      '((org-mode . bibtex-completion-format-citation-pandoc-citeproc)
        (latex-mode . bibtex-completion-format-citation-cite)
        (markdown-mode . bibtex-completion-format-citation-pandoc-citeproc)
        (default . bibtex-completion-format-citation-default)))
;; Set default action for helm-bibtex as inserting citation
(helm-delete-action-from-source "Insert citation" helm-source-bibtex)
(helm-add-action-to-source "Insert citation" 'helm-bibtex-insert-citation helm-source-bibtex 0)
(setq bibtex-completion-pdf-symbol "⌘")
(setq bibtex-completion-notes-symbol "✎")
)
I don't use a single notes file, but rather one notes file per entry, all saved in one directory, "org_notes". I also store all my PDFs in one directory. I don't use <bibtex-key>.pdf but rather <bibtex-key><title>.pdf. But I would be surprised if this generates significant time lag. I have noticed a significant decrease in opening time, however, now that I have set the PDF file completion path to nil.
Hi, I've just found this open issue, and I'd like to point out that I'm having the same problem, and that I am also using a Zotero/BetterBibTex-generated .bib. I guess it is something related to the way Zotero exports the .bib, but I couldn't pinpoint it. I've found that it sometimes helps to delete the comments Zotero leaves at the end of the .bib.
I'm not sure if it is the same issue, but I've been having a problem that I'm not sure how to approach debugging. After any change in the .bib file, M-x helm-bibtex fires up the interface, but after any keyboard input it doesn't show anything and just echoes Parsing bibliography file <filepath> into the minibuffer forever. However, (bibtex-completion-candidates) is quite fast, and after evaluating that, M-x helm-bibtex works fine until the next change in the .bib file. This behaviour happens even if I just run emacs -q.
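The observation above (evaluating (bibtex-completion-candidates) first makes M-x helm-bibtex behave) can be turned into a small wrapper command. This is only a workaround sketch under that assumption: the command name my/helm-bibtex-parse-first is hypothetical and not part of helm-bibtex, and it does not address whatever causes the hang.

```elisp
(require 'helm-bibtex)

(defun my/helm-bibtex-parse-first ()
  "Parse the bibliography, then start `helm-bibtex'.
Hypothetical helper, not part of helm-bibtex."
  (interactive)
  ;; Fill the in-memory cache before the helm interface opens.
  (bibtex-completion-candidates)
  (helm-bibtex))
```

Calling M-x my/helm-bibtex-parse-first instead of M-x helm-bibtex should then give the same ordering as the manual steps described above.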
I had the same problem that anghyflawn described and never could figure out why that happened. In the end I switched to ivy-bibtex, which works with no issues.
Thanks @jmburgos, just to confirm that in my setup ivy-bibtex works with no issues, so it seems to be a problem on the helm side of things?
@anghyflawn I think this is a separate issue. Strange though because ivy-bibtex and helm-bibtex are sharing most of their code. I never experienced this problem. A reproducible example would be helpful.
Thanks! I have tried with a minimal setup (emacs -q, (require 'helm-bibtex)) and a short .bib file and it works, so I'm assuming it's something in my actual .bib file. The file has just over 2,800 entries; I've tried bisecting it but not yet found a consistent pattern of when it works and when it doesn't. I'll keep trying. (The file is here)
@anghyflawn, parsing your bib file takes less than a second in my emacs, so it's probably not about the file but your configuration. Perhaps start with a minimal helm-bibtex configuration and then add your customizations step-by-step to see at which point it slows down.
Interesting bibliography, by the way.
OK, so I have been playing around with it and it seems to me there's some intermittent miscommunication between the cache and helm. I have managed to build up from emacs -q to my full helm-bibtex configuration (I used helm defaults, even though I don't in real life) without running into this issue. However, having got there I then made some edits to the .bib file without changing the configuration, and the problem recurred (and ivy-bibtex still works fine, i.e. it does reread the bibliography). It looks to me specifically like calling helm-bibtex triggers a rereading of the bibliography but for some reason that doesn't feed back through to helm.
Interesting bibliography, by the way.
Heh, I wonder how many other helm-bibtex-using linguists there are :)
Thank you, @anghyflawn, for reporting back. It's certainly useful to know that the problem does not occur with ivy-bibtex, but overall the issue seems even more mysterious now. The reason is that in my config reading your bibliography (with helm-bibtex) is really fast even the first time, which should be the worst-case scenario. I always suspected that parsing the file field was the issue, but the fact that ivy-bibtex is fast speaks against that. Hm ... Could you please try setting bibtex-completion-pdf-field to nil and then rereading the bibliography via C-u M-x helm-bibtex (C-u clears the cache)? If this takes a long time, we can rule the file field out as a potential source of this problem.
Heh, I wonder how many other helm-bibtex-using linguists there are :)
Quite a few actually. Linguists have quite a strong presence in the Emacs community.
bibtex-completion-pdf-field being nil doesn't seem to help, I'm afraid. However, your idea to call it with a prefix argument has allowed me to isolate what I think is the problem. If I call it with C-u it works as expected, i.e. rebuilds the cache and then launches helm, which works normally. Weirdly, every once in a while, I do get the same behaviour if I call helm-bibtex without the prefix argument but with an invalid cache (i.e. after an edit). The problem appears if the helm interface pops up before the parsing of the bibliography. It looks like there's some sort of weird race condition to me: if the helm interface gets ahead of the parsing, then some kind of blocking occurs, but if the parsing either does get ahead of helm, or you force it to do so via the prefix argument, everything works. Does that make sense? (This being a helm issue would also explain why ivy-bibtex doesn't have the same problem).
For the record this is my emacs version (this is from the Arch repos): GNU Emacs 25.2.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.22.10) of 2017-04-22
Interesting. Does Emacs freeze when the race condition strikes? I experienced this with a couple of helm sources recently but not with helm-bibtex.
No, emacs remains completely responsive.
I'm following this issue, since I am experiencing the same problem (freezing of the helm-bibtex interface, not emacs) when the bibliography has to be parsed initially (subsequently it works fine; the problem recurs when I restart emacs or when the .bib file is changed and needs reparsing). Forcing a re-parse with C-u seems to solve this, as @anghyflawn reports (reparsing takes about 7 seconds for 900+ items).
@tgrigera thanks for reporting. 7 seconds is excessive for ~900 items, it should take around 1s on a recent computer. Unfortunately, I still can't reproduce this problem which makes it very hard for me to pin down what's causing this. A minimal reproducible example would be great.
@tmalsburg I know. I've tried producing a minimal .bib with the problem but have failed. Several times it happened that deleting a particular entry seemed to solve the problem, but then the entry in its own .bib worked perfectly. I'll report any news. In the meantime, 7s is bearable and allows me to do my work. I was afraid I would lose helm-bibtex due to this issue, but with this temporary solution I can go on, which is great because I find this package so useful.
@tgrigera, have you tried switching to ivy-bibtex? The functionality is the same, and I experience no lags.
@jmburgos I've never used ivy, and I'm not quite ready to try a new completion package (I haven't even mastered helm yet)
I have similarly been trying and failing to construct a reproducible example from my bibliography, but the recurrence of the problem has been essentially random. One additional generalization that I seem to be able to make is that the reparsing takes more and more time the longer I run emacs (I run it as a daemon and rarely switch my laptop off, so I can have pretty long sessions). I do wonder if it's something we should ask about over at helm, since all the bibliography code seems to work fine with ivy?
@tgrigera, the "minimal" in minimal reproducible example is not referring to a minimal bib-file but to a minimal emacs configuration that exhibits the problem. If it is triggered by a race condition, you likely need a larger bibliography in order to trigger the problem.
Re ivy-bibtex, my vague memory is that it is missing some features. More generally, I really like the helm framework because it is so powerful. Ivy in my view is basically reinventing the wheel. Nothing wrong with that, but I prefer the more mature framework.
Just to report that in the latest versions (currently 20170929.1253) my problem seems to have gone away. Helm version 20170928.2056, Emacs version GNU Emacs 25.3.1 (x86_64-pc-linux-gnu, GTK+ Version 3.22.19) of 2017-09-16 on Arch Linux.
@anghyflawn, that's awesome. Thanks for reporting. We didn't make any relevant changes (or did we?), so I assume that the actual problem was somewhere outside helm-bibtex.
@mclearc can you confirm that the problem is solved?
@tmalsburg there is still a slight delay (6-10 secs) on the first startup of helm-bibtex. And I can't use bibtex-completion-pdf-field. But it seems to work satisfactorily, so I think I can close the issue if others no longer have any problems.
Hm, 6-10s still seems too slow assuming your bibliography still has about 4200 entries. However, I tried it again with the bibliography that you sent me a while ago and on my system loading it takes about 3 seconds which is reasonable for a bibliography that size.
This bib file is about 25 MB.
What is a reasonable loading time for a file of this size?
It took me a minute or more.
I'm on Emacs' native-comp branch and my 2MB bibliography loads in less than a second. Based on that I'd expect 10s or so for 25MB but I haven't tried it. Also note that helm-bibtex uses caching, so the load should only happen once, and subsequent searches should be much faster (until you change the bibliography).
Parsing is done in Elisp by the package parsebib. The biggest room for improvements might be there. On my side, time is primarily spent for finding PDFs and notes. Do you have a lot of PDFs? And if yes, how are they linked? I'm using the naming scheme BibTeX-key.pdf (not the file field), which is computationally lighter.
Sorry, forget the link to the bib. There are many crossrefs which I didn't include in my bib searching path.
One possible improvement (which probably already exists) is to cache via a permanent binary file, which would speed up loading.
I don't have a lot of PDFs (<30) now. I use the BibTeX-key.pdf scheme as well.
Just tested with crypto.bib and it took 15s to parse it. But resolving crossreferences took just 1-2s, so it's the raw parsing that consumes most of the time.
In my experience, native-comp Emacs is approximately 3 times faster than ordinary Emacs. So something around a minute seems plausible.
In the second run (with caching) it takes less than 0.5s to load.
Caching on disk is not implemented yet but might make sense for users of really large bibliographies. PR welcome.
In the meantime persistent caching for helm is possible with e.g. psession
Wow, didn't know about psession. My computer just crashed and I wish I would have been able to restore my session. :)
Thierry is doing such amazing work for the Emacs ecosystem! Last week I decided to make a donation to support him and his work.
I wanted to add another data point from the slow gang. Like what's been said already, I've noticed that parse times can be quite variable, from 10s to over 100s (sometimes even longer, but I get frustrated and end up restarting wsl2, which I hope helps but probably doesn't actually). I don't know how to tell how many entries my .bib file has, but it does have 34k lines (is that long?). I'm using ivy-bibtex.
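On the question of how to tell how many entries a .bib file has: one rough way is to count the lines that open an entry. The helper below is a sketch, not part of ivy-bibtex or helm-bibtex; the name my/count-bib-entries is hypothetical, and it assumes every entry starts on its own line with "@type{" (or "@type(").

```elisp
(defun my/count-bib-entries (file)
  "Return a rough count of the entries in the BibTeX FILE.
Hypothetical helper; it counts lines that open an entry with \"@type{\"
and skips @comment, @string and @preamble blocks."
  (with-temp-buffer
    (insert-file-contents file)
    (let ((case-fold-search t)
          (count 0))
      (goto-char (point-min))
      (while (re-search-forward "^[ \t]*@\\([[:alpha:]]+\\)[ \t]*[{(]" nil t)
        (unless (member (downcase (match-string 1))
                        '("comment" "string" "preamble"))
          (setq count (1+ count))))
      count)))

;; Usage, with the file name mentioned in this thread:
;; (my/count-bib-entries "zotLib.bib")
```

If the bibliography is already loaded, (length (bibtex-completion-candidates)) should give a comparable number, since that function returns one candidate per entry.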
There is no reason why the parse times should be variable. 10s to 100s also seems excessively long. For instance, crypto.bib (35k+ entries) loads in 7s on my system. And after the first read it takes just 0.1s (thanks to our caching mechanism). My own bibliography is about the same size as yours and takes less than 0.5s to parse. I suspect that there is another problem in your setup. If you share a minimal reproducible example (for emacs -Q), I can investigate.
Question: What is wsl2?
Here is code for testing:
(require 'benchmark)
(bibtex-completion-clear-cache)
(benchmark-elapse (bibtex-completion-candidates))
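A variant of the test above that also reports garbage-collection time may help when load times vary between runs. This is a sketch: it assumes bibtex-completion is installed and bibtex-completion-bibliography is already set, and whether GC plays any role in the variability reported in this thread is a guess, not an established cause.

```elisp
(require 'benchmark)
(require 'bibtex-completion)

;; `benchmark-run' returns a list (ELAPSED GC-COUNT GC-ELAPSED).
(let ((result (benchmark-run 1
                (bibtex-completion-clear-cache)
                (bibtex-completion-candidates))))
  (message "Parse: %.2fs total, %d GCs taking %.2fs"
           (nth 0 result) (nth 1 result) (nth 2 result)))
```

Evaluating the buffer with M-x eval-buffer prints the summary in the echo area and in the *Messages* buffer.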
Hi @tmalsburg. Thanks for agreeing to help!
wsl2 is Windows Subsystem for Linux 2.
I'm an emacs noob, so I'm not sure if I'm doing this correctly. I tried to run your code in the *scratch* buffer and could only find an output in the *Messages* buffer.
For my existing config with doomemacs:
Parsing bibliography file zotLib.bib ...
Resolving cross-references ...
Done (re)loading bibliography.
24.3117471
This is one of the faster runs, probably because I just restarted my PC. However, this number is nowhere near your record of 7 seconds for 35k entries, suggesting that it could be improved.
For the fresh install, I launched emacs using the command you specified, emacs -Q, and ran the following commands in the *scratch* buffer:
(require 'package)
;; Any add to list for package-archives (to add marmalade or melpa) goes here
(add-to-list 'package-archives
'("MELPA" .
"http://melpa.org/packages/"))
(package-initialize)
I then manually installed the package:
M-x package-refresh-contents
M-x package-install RET ivy-bibtex
After which, I ran the same test.
(setq bibtex-completion-bibliography "zotLib.bib")
(require 'ivy-bibtex)
(require 'benchmark)
(bibtex-completion-clear-cache)
(benchmark-elapse (bibtex-completion-candidates))
For this "fresh install"
Parsing bibliography file zotLib.bib ...
Resolving cross-references ...
Done (re)loading bibliography.
2.1353558
Looks decent to me! So it does look like my current config slows it down. That's interesting, but I also don't know how to improve the timing. Any ideas?
Also, I wanted to ask a question about the caching. Does it survive across sessions (i.e. after I M-x kill-emacs and run it again)? Using ivy-bibtex subsequent times after the biblio was parsed takes no time at all, but parsing happens every time I restart emacs.
You didn't say how many entries your bibliography has, but 2s doesn't look unexpected. Difficult to tell what's causing the slowdown in your full setup. You'll have to debug it, i.e. incrementally commenting out parts of your config and see when it slows down.
Re caching: Caching is in memory and therefore needs to be redone in every new session. Storing the cache on disk is likely not worth the effort given that typical bibliographies should load in 1-3 seconds.
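Besides commenting out parts of the config, Emacs's built-in profiler can show where the time goes during a load in the full setup. The following is a sketch that assumes bibtex-completion is installed and the bibliography variable is already set; the report is requested before stopping the profiler so that it also works on Emacs versions that only report while profiling is active.

```elisp
(require 'profiler)
(require 'bibtex-completion)

(profiler-start 'cpu)
;; Force a full reload so the profile covers parsing and PDF/notes lookup.
(bibtex-completion-clear-cache)
(bibtex-completion-candidates)
;; Open the report buffer, then stop sampling.
(profiler-report)
(profiler-stop)
```

In the report buffer, entries can be expanded with TAB to see which functions account for most of the samples.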
Okay, understood. Thanks!
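Since the cache lives in memory and has to be rebuilt in every session, one possible workaround is to trigger the first parse while Emacs is idle, so that the first real invocation finds a warm cache. This is a sketch, not a feature of helm-bibtex or ivy-bibtex: the 10-second delay is arbitrary, bibtex-completion-bibliography is assumed to be set beforehand, and Emacs is still blocked while the parse runs.

```elisp
;; Sketch: warm the cache once Emacs has been idle for 10 seconds
;; (the delay is an arbitrary choice).
(run-with-idle-timer
 10 nil
 (lambda ()
   ;; Load the package quietly; skip the warm-up if it is not installed.
   (when (require 'bibtex-completion nil t)
     (bibtex-completion-candidates))))
```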
In case it helps other people: I stumbled upon this issue, which in my case was caused by what was noted earlier: "The other issue likely is that you're referencing PDFs via the file field which is known to be slow" https://github.com/tmalsburg/helm-bibtex/issues/159#issuecomment-259631194
I have my references in Zotero and export them using Better BibTeX. To try to minimize load time, I now create a modified bib file (which is regenerated, using inotifywait, whenever the bib from Zotero changes). This modified bib file does not use the file field for PDFs; instead, I create dummy PDF file names that conform to bibtexkey.pdf (or bibtexkey-1.pdf, bibtexkey-2.pdf, ...), that live in a scratch directory, and these dummy files symlink to the original PDFs. This scratch directory is also the bibtex-completion-library-path. This way, I can avoid having helm-bibtex reference PDFs via the file field (i.e., I can (setq bibtex-completion-pdf-field nil), which is the default). ~This is not perfect, since I lose all the additional name context (e.g., someone2000-suppl-mat.pdf), but n~ Now helm-bibtex loads a lot faster (6 seconds vs. the original 20).
The code uses R. I only use Linux and make use of symlinks, and the whole setup works because I set up a watch (with inotifywait) on the original bib file; I have no idea how to modify it to run on macOS or Windows.
Please, make sure to read the comments before trying the code: I use at least one potentially destructive operation.
Link to the code: https://gist.github.com/rdiaz02/21253f2bf00500146c307612d57254c3
Edit 1: the name of the file is now bibtex key + filename, as per suggestion of @tmalsburg (see next comment); I have stricken through the original sentence that no longer applies.
This is not perfect, since I lose all the additional name context (e.g., someone2000-suppl-mat.pdf), but now helm-bibtex loads a lot faster (6 seconds vs. the original 20).
You can keep the additional name context. Just prepend the bibtex key. helm-bibtex will find all PDFs whose names start with the bibtex key.
Thanks a lot for the suggestion!
My naming of files is very variable and inconsistent ("suppl-mat-someone-2020.pdf", "somethingSupplMat.pdf", etc, etc) but if I can just prepend the bibtex key to the file name, then it will be solved. I'll try to change the code and report here.
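The "prepend the bibtex key" idea could be sketched in Emacs Lisp as follows (the linked gist uses R instead). The function name my/link-pdf-under-key is hypothetical, the key and file names in the usage comment are illustrative, and symlinks are assumed to be available, as on Linux. As far as I know, picking up files that merely start with the key also depends on the option bibtex-completion-find-additional-pdfs; treat that as an assumption to verify against the README.

```elisp
(defun my/link-pdf-under-key (key pdf library-dir)
  "Create a symlink to PDF in LIBRARY-DIR, named KEY-<original name>.
Hypothetical helper; returns the path of the new link."
  (let ((link (expand-file-name
               (concat key "-" (file-name-nondirectory pdf))
               library-dir)))
    ;; The final t overwrites an existing link of the same name.
    (make-symbolic-link (expand-file-name pdf) link t)
    link))

;; Usage with illustrative names:
;; (my/link-pdf-under-key "someone2020" "~/docs/suppl-mat.pdf" "~/scratch-lib/")
```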
This may not be a bug, but I have a moderately long bib file (about 4200 entries), which takes several minutes to parse on initial start-up of helm-bibtex. Any suggestions about how this might get sped up? It may be relevant that I'm using Zotero and Better Bib(La)TeX to generate the bib file. Thanks.