jkitchin / org-ref

org-mode modules for citations, cross-references, bibliographies in org-mode and useful bibtex tools to go with it.
GNU General Public License v3.0

Org-ref seems to not find entries in my bib file #1118

Closed by 00krishna 1 month ago

00krishna commented 1 month ago

Hello. I have a relatively large bib file, about 9 MB and 7252 entries. I have noticed that when I use ivy to search for an older entry (something that I added a year ago), neither ivy nor helm seems to find that entry in the bib file.

Now I am certain that the entry is in the bib file, because I used KBibTeX to check the actual bibtex entries. I have an example of one of the problem bibtex entries below. Is there something that I can do to force org-ref or ivy to search the entire file, instead of just the most recent entries? Sorry if I am wrong that it only finds the most recent entries; that is just my experience. It might be that ivy only looks at the first 1000 entries or so, but I am not sure.

Is there a good workaround for this? I imagine many people have large bib files. I could create smaller bib files for each org file, but I feel that would get messy and fragile.

Thanks for any suggestions.

Here is an example of a bibtex entry that was not found. The formatting seems okay. This is the AlphaFold 2 paper. I tried searching for "Jumper" or "Highly accurate" but no luck.

@article{Jumper2021,
    abstract = {
 Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort 1–4 , the structures of around 100,000 unique proteins have been determined 5 , but this represents a small fraction of the billions of known protein sequences 6,7 . Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’ 8 —has been an important open research problem for more than 50 years 9 . Despite recent progress 10–14 , existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) 15 , demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. 
},
    author = {John Jumper and Richard Evans and Alexander Pritzel and Tim Green and Michael Figurnov and Olaf Ronneberger and Kathryn Tunyasuvunakool and Russ Bates and Augustin Žídek and Anna Potapenko and Alex Bridgland and Clemens Meyer and Simon A. A. Kohl and Andrew J. Ballard and Andrew Cowie and Bernardino Romera-Paredes and Stanislav Nikolov and Rishub Jain and Jonas Adler and Trevor Back and Stig Petersen and David Reiman and Ellen Clancy and Michal Zielinski and Martin Steinegger and Michalina Pacholska and Tamas Berghammer and Sebastian Bodenstein and David Silver and Oriol Vinyals and Andrew W. Senior and Koray Kavukcuoglu and Pushmeet Kohli and Demis Hassabis},
    doi = {10.1038/s41586-021-03819-2},
    issn = {0028-0836},
    issue = {7873},
    journal = {Nature},
    month = {8},
    pages = {583–589},
    title = {Highly accurate protein structure prediction with AlphaFold},
    url = {https://www.nature.com/articles/s41586-021-03819-2},
    volume = {596},
    year = {2021}
}

Thanks again.

jkitchin commented 1 month ago

I haven't had that problem before. I have close to 5000 entries I think. If you run M-x bibtex-validate on the file does it come out clean?

00krishna commented 1 month ago

I tried what you suggested, but the issue persists.

So I used the full file first and ran bibtex-validate. The message returned was "Buffer is syntactically correct". But when I tried to search for the reference mentioned above, no luck.

Then I created a separate, smaller bibtex file that contained the reference above. I ran bibtex-validate on it and again received the message "Buffer is syntactically correct". This time, I was able to find the reference with no problem and I could insert it.

Is there another tool you can recommend to revalidate the bibtex file?

I created the bibtex file in Mendeley and then ran it through KBibTeX to set the citation keys. I tested this a bit and found the following. I have a Mendeley group for biology. If I created an entry in 2024, that entry gets found with no issue. However, if I created the entry in an earlier year, like 2022, then it sometimes gets found and sometimes does not. There is no clear pattern that I can tell.

Is there a variable that counts the number of entries in the bibtex file? KBibTeX tells me I have 7,287 entries.

jkitchin commented 1 month ago

M-x bibtex-count-entries will count the entries.
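As a cross-check outside Emacs, the entry count can also be approximated with a short script. This is only a minimal sketch: the function name `count_entries` is hypothetical, it is not part of org-ref or bibtex.el, and it assumes every entry starts on its own line with `@type{` or `@type(`, as in typical exported .bib files.

```python
import re

# Start of an entry such as "@article{" or "@misc(" at the beginning of a line.
# Assumption: entries always begin on their own line.
ENTRY_START = re.compile(r"^\s*@(\w+)\s*[{(]", re.MULTILINE)

# BibTeX pseudo-entries that are not bibliography records.
NON_ENTRIES = {"comment", "string", "preamble"}


def count_entries(bib_text: str) -> int:
    """Count bibliography entries, ignoring @comment, @string and @preamble."""
    return sum(
        1
        for match in ENTRY_START.finditer(bib_text)
        if match.group(1).lower() not in NON_ENTRIES
    )


if __name__ == "__main__":
    import sys

    # Usage: python count_entries.py library.bib
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            print(count_entries(handle.read()))
```

If this number and the one reported by another tool disagree by a lot, that is a hint that one of them stops parsing partway through the file.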

Are you using ivy-bibtex to search these?

00krishna commented 1 month ago

I tried bibtex-count-entries but I got the message buffer contains zero entries. I am not sure why I get that message.

I vaguely remember some message a while ago--when using emacs--about opening my bibtex file "literally" or something to improve performance. I am not sure if that is relevant. I could not find mention of that in the *Messages* buffer.

Yep, I am using ivy-bibtex. Here is a picture of my search. I am looking for that Jumper et al. article, but it does not show up in the results. I included the picture because you might see something in it that solves the problem.

[Screenshot: Selection_002]

I use Spacemacs. So now I am starting to wonder if I should delete the org-ref folders and let Spacemacs reinstall them or something?

I also checked the *Messages* buffer for any errors. Things seem okay, but I am not familiar with these messages. Here is what I am seeing. There is a mention of unbalanced parentheses, but the file apparently parsed okay.

Checking syntactical structure (done)
Checking for duplicate keys (done)
Buffer is syntactically correct
Done (re)loading bibliography.
Quit
editorconfig--advice-insert-file-contents: Opening input file: No such file or directory, /home/krishnab/Dropbox/backup/bibtex/library.bib
Parsing bibliography file ~/Dropbox/backup/bibtex/library.bib ...
forward-sexp: Scan error: "Unbalanced parentheses", 2529705, 2674258
Parsing bibliography file ~/Dropbox/backup/bibtex/library.bib ...
Resolving cross-references ...
Done (re)loading bibliography. [3 times]

Sorry, I don't want to take up too much of your time. I am sure you have better things to do :).

jkitchin commented 1 month ago

Thanks for sharing the image. That is not ivy-bibtex (try searching with M-x ivy-bibtex). It is an org-ref function, but I guess it is the vanilla one that uses built-in completion.

Try running M-x load-library org-ref-ivy.

That should change the insert key to an ivy function, and you will see something like this.

[image]

If you want to send me your bibtex file, I can try it here and see if I see any issues.

These messages:

editorconfig--advice-insert-file-contents: Opening input file: No such file or directory, /home/krishnab/Dropbox/backup/bibtex/library.bib
Parsing bibliography file ~/Dropbox/backup/bibtex/library.bib ...
forward-sexp: Scan error: "Unbalanced parentheses", 2529705, 2674258
Parsing bibliography file ~/Dropbox/backup/bibtex/library.bib ...

suggest something is not quite right.

00krishna commented 1 month ago

Here is a copy of my bibtex file. library.zip

I will keep playing with emacs to see if I can use ivy, as you indicated.

For testing purposes, I also had trouble trying to load this reference besides the Jumper paper.

Torrisi, M., Pollastri, G., & Le, Q. (2020). Deep learning methods in protein structure prediction. Computational and Structural Biotechnology Journal, 18(), 1301–1310. http://dx.doi.org/10.1016/J.CSBJ.2019.12.011

jkitchin commented 1 month ago

When I run bibtex-validate on your file I get many errors.

[image]

Most of them seem to be entries with missing keys (500+ of them), like this one:

@misc{,
   author = {Ronald Lai},
   title = {disambiguation_of_uspto},
}
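Entries like this one can be located with a short script. The following is a minimal sketch, not part of org-ref or bibtex.el; the function name `find_empty_keys` is hypothetical, and it assumes the entry header (`@type{,`) sits on a single line.

```python
import re

# "@type{" followed directly by a comma means the citation key is empty,
# as in "@misc{,". Assumption: the header is on one line.
EMPTY_KEY = re.compile(r"^[ \t]*@(\w+)[ \t]*\{[ \t]*,", re.MULTILINE)


def find_empty_keys(bib_text: str) -> list[tuple[int, str]]:
    """Return (line_number, entry_type) for every entry whose key is empty."""
    hits = []
    line_number, previous = 1, 0
    for match in EMPTY_KEY.finditer(bib_text):
        # Count newlines incrementally so large files are scanned only once.
        line_number += bib_text.count("\n", previous, match.start())
        previous = match.start()
        hits.append((line_number, match.group(1)))
    return hits
```

Calling `find_empty_keys(open("library.bib", encoding="utf-8").read())` would list the line numbers to jump to in an editor.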

There also seem to be a lot of issues with unmatched parentheses in the abstracts.
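A rough way to narrow down where the delimiters stop pairing up is to count them per entry. This is a sketch under stated assumptions, not what bibtex-validate does internally: the helper names are hypothetical, entries are assumed to start on their own line, and only the net count of open versus close characters is compared (so a reversed pair such as `}{` would not be flagged).

```python
import re

# Assumption: each entry starts on its own line with "@type{" (or "@type(").
ENTRY_START = re.compile(r"^[ \t]*@\w+[ \t]*[{(]", re.MULTILINE)


def split_entries(bib_text):
    """Yield (line_number, chunk) for each chunk beginning at an entry start."""
    starts = [m.start() for m in ENTRY_START.finditer(bib_text)]
    line_number, previous = 1, 0
    for begin, end in zip(starts, starts[1:] + [len(bib_text)]):
        line_number += bib_text.count("\n", previous, begin)
        previous = begin
        yield line_number, bib_text[begin:end]


def imbalance(chunk, open_char="{", close_char="}"):
    """Return opens minus closes, skipping backslash-escaped characters."""
    depth = 0
    escaped = False
    for ch in chunk:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == open_char:
            depth += 1
        elif ch == close_char:
            depth -= 1
    return depth


def unbalanced_entries(bib_text, open_char="{", close_char="}"):
    """Return (line_number, imbalance) for entries whose counts do not match."""
    report = []
    for line_number, chunk in split_entries(bib_text):
        delta = imbalance(chunk, open_char, close_char)
        if delta != 0:
            report.append((line_number, delta))
    return report
```

By default this checks curly braces; passing `"("` and `")"` checks round parentheses instead. Because the text is split at entry starts, one broken entry does not hide the state of the entries after it.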

I deleted all the abstracts and all the empty-key entries, and after that I was able to find the Jumper entry you are looking for:

[image]

This probably means you have some bibtex maintenance to do to get it working, I think.

00krishna commented 1 month ago

Oh this is great. Okay, at least I know what the problem is now. I wonder why my bibtex-validate was showing no errors?

Is there good software for fixing these issues? I have tried KBibTeX, but did not see a way to validate entries there. I am currently trying to import my library into Zotero, but it is taking quite a while. If you have a suggestion, that would be great, but otherwise I can keep trying different tools. Now I know I have to get the keys fixed and the parentheses fixed.

Wow, thanks so much for your help with this.

jkitchin commented 1 month ago

I don't have great suggestions. M-x bibtex-validate is what I used first, and it gave me a long list of errors. It is tricky though to automate fixing these.
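One piece that might be scripted is giving the empty-key entries a placeholder key so they at least parse. This is only a sketch: the function name `fill_empty_keys` and the `missingkey` prefix are hypothetical, the prefix should be one that does not already occur in the file, and the placeholder keys would still need to be replaced with meaningful ones by hand. Run it on a copy of the file.

```python
import re

# Capture the "@type{" header of an entry whose key is empty, e.g. "@misc{,".
EMPTY_KEY = re.compile(r"^([ \t]*@\w+[ \t]*\{)[ \t]*,", re.MULTILINE)


def fill_empty_keys(bib_text, prefix="missingkey"):
    """Insert numbered placeholder keys; return (new_text, number_filled)."""
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        # Keep the original header and add e.g. "missingkey0001,".
        return f"{match.group(1)}{prefix}{counter:04d},"

    new_text = EMPTY_KEY.sub(replace, bib_text)
    return new_text, counter
```

Re-running M-x bibtex-validate on the result should then show whether the remaining errors are limited to the unbalanced abstracts.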

00krishna commented 1 month ago

Okay, sounds good. I will see if Zotero can help to resolve these. But I can mark the issue closed since we know the problem now. I really really appreciate your help and attention to this issue. Please keep up the excellent work in building these tools and recording the videos.