Closed narayanibarve closed 1 year ago
In some of the .bib files I have encountered, the error was caused by a single long field containing more than 10,000 characters. Also see #14.
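A quick way to check whether a file is affected might look like the sketch below. It is not part of bibtex; `find_long_lines` is a hypothetical helper, and it assumes the over-long field sits on a single physical line (which is how many exporters write abstracts and author lists). A field wrapped over several lines would not be caught.

```r
# Hypothetical helper (not from the bibtex package): report lines of a .bib
# file that exceed a character threshold.
# Assumption: a problematic field is written on one physical line.
find_long_lines <- function(lines, threshold = 10000) {
  n <- nchar(lines, type = "chars", allowNA = TRUE)
  idx <- which(!is.na(n) & n > threshold)
  data.frame(line = idx, chars = n[idx])
}

# Self-made example input, not one of the files attached to this thread
demo <- c(
  "@article{key1,",
  paste0("  abstract = {", strrep("x", 12000), "},"),
  "  year = {2017}",
  "}"
)

find_long_lines(demo)

# For a real file one would pass readLines("some_file.bib") instead of demo.
```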
Anything happening here? I have the error as well and would really like to read the references into R.
Or are there any alternatives? I can use scan to read the file in:
x <- scan(file = bibfile, multi.line = TRUE, sep = "\n", what = "character")
followed by
x <- trimws(x)
but what then? How could I parse this object?
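One possible starting point, sketched under simplifying assumptions: every entry begins on its own line with `@type{key,`, and nothing else in the file starts a line with `@`. The function name `split_bib_entries` is made up for illustration, and this only splits the vector into entries; it does not parse the individual fields.

```r
# Hypothetical helper: group the lines of a .bib file into entries and pull
# out the entry type and citation key from each header line.
# Assumption: each entry starts on a new line with "@type{key,".
split_bib_entries <- function(x) {
  x <- trimws(x)
  x <- x[nzchar(x)]                       # drop empty lines
  group <- cumsum(grepl("^@", x))         # a new group starts at every "@"
  keep <- group > 0                       # ignore text before the first entry
  entries <- split(x[keep], group[keep])
  header <- unname(vapply(entries, `[`, character(1), 1))
  data.frame(
    type = tolower(sub("^@([A-Za-z]+)\\s*\\{.*$", "\\1", header)),
    key  = sub("^@[A-Za-z]+\\s*\\{\\s*([^,]*),?.*$", "\\1", header),
    text = unname(vapply(entries, paste, character(1), collapse = "\n")),
    stringsAsFactors = FALSE
  )
}

# Self-made example input standing in for the result of scan()
x <- c(
  "@article{key1,", "  title = {First},", "  year = {2017}", "}",
  "",
  "@Book{key2,", "title = {Second}", "}"
)

entries <- split_bib_entries(x)
entries[, c("type", "key")]
```

Note that `@comment` or `@string` blocks would show up as entries of that type here, so they can be filtered out by the `type` column before further processing.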
Can you prepare a reprex?
I am using Python for the task now. I had to adapt the workflow a bit, but now it works, and I am learning some Python in parallel.
@narayanibarve do you still have this problem? If so, can you prepare a reproducible example using the reprex package?
Here's a reprex for a case of a long field causing flex to break:
bibtex::read.bib("long_field.txt")
#> Error: lex fatal error:
#> input buffer overflow, can't enlarge buffer because scanner uses REJECT
I used the current development version of bibtex from this repository.
Similarly, some reference managers (in this case Zotero) add a JabRef comment to the bottom of the file, which causes the same error.
bibtex::read.bib("jabref_comment.txt")
#> Error: lex fatal error:
#> input buffer overflow, can't enlarge buffer because scanner uses REJECT
Thanks. I'll have a look for the next version
Just wanted to add to this that I'm having a similar problem reading in the attached .bib file from WoS.
This cleans the BibTeX comments, for anybody else dealing with this:
### First read file to remove the JabRef comment
cleanFile <- readLines(file.path(queryHitsPath, queryHitsFiles));
### Paste all strings together
cleanFile <- paste(cleanFile, collapse="\n");
### Remove jabref comments
cleanFile <- gsub("(?s)@[Cc]omment\\{jabref-meta:[^\\}]*\\}", "", cleanFile, perl=TRUE);
### Write clean file to disk
writeLines(cleanFile, con=file.path(queryHitsPath, "tmp-clean-file.bib"));
### Import references
queryHits[['1and2']] <- ReadBib(file.path(queryHitsPath, "tmp-clean-file.bib"));
However, for some reason it still fails to import, even though no field comes anywhere close to 10K characters. So there seem to be other errors as well. Perhaps simply allowing one to specify a string to parse, and thereby letting people read in the files on their own, would be a simple, relatively quick fix? It would also add functionality that is more generically useful, so it wouldn't be wasted effort once this bug has been resolved (if it ever is :-)).
I'm no closer to solving this, but I remembered I'd actually written 'my own' function to import BibTeX files for a package I'm working on ('metabefor'). It's at https://github.com/Matherion/metabefor/blob/master/R/importBibtex.r, in case anybody's struggling with the same problem.
Any news on this?
Anything new on this? I had the same error using both the bibtex and RefManageR packages, and using the citr addin.
My try:
download.file(url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = "library.bib")
bibtex::read.bib(file = "library.bib")
RefManageR::ReadBib(file = "library.bib")
My session info:
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=pt_BR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] shiny_1.0.5.9000 Cite_0.1.0 rcrossref_0.8.1.9429
[4] wordcountaddin_0.2.0 citr_0.2.0.9055 pacman_0.4.6
[7] knitr_1.20 picante_1.6-2 nlme_3.1-131
[10] brranching_0.2.0 phytools_0.6-44 maps_3.2.0
[13] data.table_1.10.4-3 flora_0.3.0 readxl_1.0.0
[16] ape_5.0 betapart_1.5.0 forcats_0.3.0
[19] stringr_1.3.0 dplyr_0.7.4 purrr_0.2.4
[22] readr_1.1.1 tidyr_0.8.0 tibble_1.4.2
[25] ggplot2_2.2.1 tidyverse_1.2.1 vegan_2.4-6
[28] lattice_0.20-35 permute_0.9-4 bibtex_0.4.2
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 rprojroot_1.3-2 rstudioapi_0.7
[4] urltools_1.7.0 DT_0.4 mvtnorm_1.0-7
[7] lubridate_1.7.3 RefManageR_0.14.20 xml2_1.2.0
[10] codetools_0.2-15 splines_3.4.3 mnormt_1.5-5
[13] bold_0.5.0 jsonlite_1.5 broom_0.4.3
[16] cluster_2.0.6 compiler_3.4.3 httr_1.3.1
[19] backports_1.1.2 assertthat_0.2.0 Matrix_1.2-12
[22] lazyeval_0.2.1 cli_1.0.0 later_0.7.1
[25] htmltools_0.3.6 tools_3.4.3 bindrcpp_0.2
[28] igraph_1.1.2 coda_0.19-1 gtable_0.2.0
[31] glue_1.2.0 taxize_0.9.3 reshape2_1.4.3
[34] clusterGeneration_1.3.4 fastmatch_1.1-0 Rcpp_0.12.16
[37] msm_1.6.6 cellranger_1.1.0 crul_0.5.2
[40] debugme_1.1.0 iterators_1.0.9 psych_1.7.8
[43] rvest_0.3.2 mime_0.5 miniUI_0.1.1
[46] phangorn_2.4.0 devtools_1.13.5 stringdist_0.9.4.7
[49] MASS_7.3-49 zoo_1.8-1 scales_0.5.0
[52] rcdd_1.2 hms_0.4.2 promises_1.0
[55] parallel_3.4.3 expm_0.999-2 animation_2.5
[58] yaml_2.1.18 curl_3.2 memoise_1.1.0
[61] triebeard_0.3.0 reshape_0.8.7 stringi_1.1.7
[64] foreach_1.4.4 plotrix_3.7 geometry_0.3-6
[67] rlang_0.2.0 pkgconfig_2.0.1 evaluate_0.10.1
[70] bindr_0.1.1 htmlwidgets_1.0 plyr_1.8.4
[73] magrittr_1.5 R6_2.2.2 combinat_0.0-8
[76] whisker_0.3-2 pillar_1.2.1 haven_1.1.1
[79] foreign_0.8-69 withr_2.1.2 mgcv_1.8-23
[82] survival_2.41-3 scatterplot3d_0.3-41 abind_1.4-5
[85] modelr_0.1.1 crayon_1.3.4 rmarkdown_1.9
[88] koRpus_0.10-2 grid_3.4.3 callr_2.0.2
[91] reprex_0.1.2 digest_0.6.15 xtable_1.8-2
[94] httpuv_1.3.6.9007 numDeriv_2016.8-1 munsell_0.4.3
[97] shinyjs_1.0 magic_1.5-8 quadprog_1.5-5
The funny thing is that the code works using the reprex addin.
download.file(url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = "library.bib")
bibtex::read.bib(file = "library.bib")
#> Vellend M (2001). "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?" _Journal of Vegetation Science_, *12*,
#> pp. 545-552.
#>
#> López-Mart\'inez JO, Sanaphre-Villanueva L, Dupuy JM,
#> Hernández-Stefanoni JL, Meave JA and Gallardo-Cruz JA (2013).
#> "$\beta$-Diversity of functional groups of woody plants in a
#> tropical dry forest in Yucatan." _PloS one_, *8*(9), pp. e73660.
#> ISSN 1932-6203, doi: 10.1371/journal.pone.0073660 (URL:
#> http://doi.org/10.1371/journal.pone.0073660), <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> Swenson NG, Stegen JC, Davies SJ, Erickson DL, Forero-Montaña J,
#> Hurlbert AH, Kress WJ, Thompson J, Uriarte M, Wright SJ and
#> Zimmerman JK (2012). "Temporal turnover in the composition of
#> tropical tree communities: functional determinism and phylogenetic
#> stochasticity." _Ecology_, *93*(3), pp. 490-499. ISSN 0012-9658,
#> doi: 10.1890/11-1180.1 (URL: http://doi.org/10.1890/11-1180.1),
#> <URL: http://doi.wiley.com/10.1890/11-1180.1>.
RefManageR::ReadBib(file = "library.bib")
#> Warning in parse_Rd(Rd, encoding = encoding, fragment = fragment, ...):
#> <connection>:3: unknown macro '\beta'
#> Warning in parse_Rd(Rd, encoding = encoding, fragment = fragment, ...):
#> <connection>:3: unknown macro '\beta'
#> [1] J. O. López-Mart\'inez, L. Sanaphre-Villanueva, J. M. Dupuy,
#> et al. "$\beta$-Diversity of functional groups of woody plants in
#> a tropical dry forest in Yucatan.". In: _PloS one_ 8.9 (Jan.
#> 2013), p. e73660. ISSN: 1932-6203. DOI:
#> 10.1371/journal.pone.0073660. <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> [2] N. G. Swenson, J. C. Stegen, S. J. Davies, et al. "Temporal
#> turnover in the composition of tropical tree communities:
#> functional determinism and phylogenetic stochasticity". In:
#> _Ecology_ 93.3 (Mar. 2012), pp. 490-499. ISSN: 0012-9658. DOI:
#> 10.1890/11-1180.1. <URL: http://doi.wiley.com/10.1890/11-1180.1>.
#>
#> [3] M. Vellend. "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?". In: _Journal of Vegetation Science_
#> 12 (2001), pp. 545-552.
I've been reading bib files with readFiles in the bibliometrix package.
Hi,
I am using citr and R Markdown with Zotero. I partially got around this problem with crsh's suggestion of omitting the abstract, but some BibTeX entries have 500/1000+ author names, which reproduces the problem.
Any suggestions? Has anyone come up with a solution to this?
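For the abstract workaround mentioned above, a pre-processing step along the following lines may help. This is only a sketch, not something bibtex or citr provides: `drop_bib_field` is a hypothetical helper, it assumes field values are delimited by braces (not double quotes), and it relies on a recursive PCRE pattern to match nested braces. It could in principle also hit text like `abstract = {` inside another field, which should be rare. Dropping `author` the same way would work mechanically, but then entries that require an author will be rejected.

```r
# Hypothetical helper: remove one brace-delimited field from BibTeX text.
# Assumptions: values use {...} delimiters; nested braces are balanced.
drop_bib_field <- function(text, field = "abstract") {
  pattern <- paste0(
    "(?i)\\b", field, "\\s*=\\s*",
    "(\\{(?:[^{}]++|(?1))*\\})",   # group 1: balanced braces via recursion
    "\\s*,?[ \\t]*\\n?"            # optional trailing comma and line break
  )
  gsub(pattern, "", text, perl = TRUE)
}

# Self-made example entry, not taken from the files in this thread
bib <- paste(
  "@article{k,",
  "  title = {A {Nested} title},",
  "  abstract = {Long {nested} text},",
  "  year = {2017}",
  "}",
  sep = "\n"
)

cleaned <- drop_bib_field(bib, "abstract")
cat(cleaned, "\n")

# Write the cleaned text to a temporary file for the reader of your choice
tmp <- tempfile(fileext = ".bib")
writeLines(cleaned, tmp)
# bibtex::read.bib(tmp)  # requires the bibtex package to be installed
```

For a real file, the input would be built with `paste(readLines(path), collapse = "\n")` so that fields spanning several lines are matched as a whole.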
I have the same problem with R Markdown and citr. Any suggested solution for this, please?
I am having this issue for parsing a long list of authors too. Any progress?
Hi, I think this issue may be closed after #47.
I parsed all your example files with the upcoming version of bibtex, where the C code is replaced by R code, and the described issue is not observed anymore. The files are read accordingly:
# PR 47 https://github.com/ropensci/bibtex/pull/47
library(bibtex)
# File 1 ----
f1 <- tempfile("file1", fileext = ".txt")
download.file(
  "https://github.com/romainfrancois/bibtex/files/1120203/long_field.txt",
  f1
)
ex1 <- read.bib(f1)
ex1
#> Batzill M (2012). "The Surface Science of Graphene: Metal Interfaces,
#> CVD Synthesis, Nanoribbons, Chemical Modifications, and Defects."
#> _SURFACE SCIENCE REPORTS_, *67*(3-4), 83-115. ISSN 0167-5729, doi:
#> 10.1016/j.surfrep.2011.12.001 (URL:
#> https://doi.org/10.1016/j.surfrep.2011.12.001).
# File 2 ----
f2 <- tempfile("file2", fileext = ".txt")
download.file(
  "https://github.com/romainfrancois/bibtex/files/1120229/jabref_comment.txt",
  f2
)
ex2 <- read.bib(f2)
ex2
#> Gómez RL (2002). "Variability and Detection of Invariant Structure."
#> _Psychological Science_, *13*(5), 431-436. ISSN 0956-7976, 1467-9280,
#> doi: 10.1111/1467-9280.00476 (URL:
#> https://doi.org/10.1111/1467-9280.00476), <URL: 2015-01-20>.
# File 3 -----
f3 <- tempfile("file3", fileext = ".zip")
download.file(
  "https://github.com/romainfrancois/bibtex/files/1229495/soil.health_healthy.soil_1to500.bib.zip",
  f3
)
unzip(f3, junkpaths = TRUE, exdir = tempdir())
ex3 <- read.bib(
  file.path(
    tempdir(),
    "soil.health_healthy.soil_1to500.bib"
  )
)
#> ignoring entry 'ISI:000268383100002' (line 34779) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100003' (line 34853) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100004' (line 34928) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100005' (line 34999) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100006' (line 35080) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100008' (line 35134) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100010' (line 35192) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
length(ex3)
#> [1] 493
# Small sample of entries, since the file has 500 (493 read)
ex3[1:5]
#> FORMAN J (1951). "SOIL, HEALTH, AND THE DENTAL PROFESSION." _JOURNAL OF
#> PROSTHETIC DENTISTRY_, *1*(5), 508-522. ISSN 0022-3913, doi:
#> 10.1016/0022-3913(51)90037-6 (URL:
#> https://doi.org/10.1016/0022-3913(51)90037-6).
#>
#> SHARMA N, MADAN M (1983). "EARTHWORMS FOR SOIL HEALTH AND
#> POLLUTION-CONTROL." _JOURNAL OF SCIENTIFIC \& INDUSTRIAL RESEARCH_,
#> *42*(10), 575-583. ISSN 0022-4456.
#>
#> HABERERN J (1992). "A SOIL HEALTH INDEX." _JOURNAL OF SOIL AND WATER
#> CONSERVATION_, *47*(1), 6. ISSN 0022-4561.
#>
#> [Anonymous] (1993). "THE BREAD CORNER - NO BREAD WITHOUT HEALTHY SOIL."
#> _ALIMENTA_, *32*(3), 45. ISSN 0002-5402.
#>
#> Watts M (1994). "Pesticide residues in food: The views of the Soil \&
#> Health Association of New Zealand." In Savage, GP (ed.), _PROCEEDINGS
#> OF THE NUTRITION SOCIETY OF NEW ZEALAND, VOL 19_, volume 19 number 0
#> series PROCEEDINGS OF THE NUTRITION SOCIETY OF NEW ZEALAND, 58-63. Nutr
#> Soc New Zealand, ANIMAL \& VETERINARY SCI GROUP, LINCOLN UNIVERSITY, PO
#> BOX 84, CANTERBURY, NEW ZEALAND. 29th Annual Conference of the
#> Nutrition-Society-of-New-Zealand, CHRISTCHURCH, NEW ZEALAND, AUG, 1994.
# From gist ----
gist <- tempfile(fileext = ".bib")
download.file(
  url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
  destfile = gist
)
bibtex::read.bib(file = gist)
#> Vellend M (2001). "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?" _Journal of Vegetation Science_, *12*,
#> 545-552.
#>
#> López-Mart\'inez JO, Sanaphre-Villanueva L, Dupuy JM,
#> Hernández-Stefanoni JL, Meave JA, Gallardo-Cruz JA (2013).
#> "$\beta$-Diversity of functional groups of woody plants in a tropical
#> dry forest in Yucatan." _PloS one_, *8*(9), e73660. ISSN 1932-6203,
#> doi: 10.1371/journal.pone.0073660 (URL:
#> https://doi.org/10.1371/journal.pone.0073660), <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> Swenson NG, Stegen JC, Davies SJ, Erickson DL, Forero-Montaña J,
#> Hurlbert AH, Kress WJ, Thompson J, Uriarte M, Wright SJ, Zimmerman JK
#> (2012). "Temporal turnover in the composition of tropical tree
#> communities: functional determinism and phylogenetic stochasticity."
#> _Ecology_, *93*(3), 490-499. ISSN 0012-9658, doi: 10.1890/11-1180.1
#> (URL: https://doi.org/10.1890/11-1180.1), <URL:
#> http://doi.wiley.com/10.1890/11-1180.1>.
Created on 2022-01-17 by the reprex package (v2.0.1)
I get this error when I read a .bib file. At first I thought it happened because the file is huge, with something like 5000 citations, so I exported only 4 citations from this set to a .bib file in BibTeX format. But even this 4-citation file does not work; I get the same error.