Open cwverhey opened 2 years ago
I've found a related issue where the parsing fails if there is no newline.
bib <- "@Article{Binmore2008, Title = {Do Conventions Need to Be Common Knowledge?}, Author = {Binmore, Ken}, Journal = {Topoi}, Year = {2008}, Number = {1}, Pages = {17--27}, Volume = {27}}"
t <- tempfile()
writeLines(bib, t)
df <- bib2df::bib2df(t)
# A tibble: 1 × 27
CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF EDITION EDITOR HOWPUBLISHED INSTITUTION JOURNAL
<chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <list> <chr> <chr> <chr>
1 ARTICLE Rangel_2023 NA NA <chr> NA NA NA NA <chr> NA NA NA
# ℹ 14 more variables: KEY <chr>, MONTH <chr>, NOTE <chr>, NUMBER <chr>, ORGANIZATION <chr>, PAGES <chr>,
# PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <chr>, ARTICLE <chr>
It definitely did not do this previously. I'm returning to an old project and none of the code is working correctly :(
I think this can be fixed with
bib <- "@Article{Binmore2008, Title = {Do Conventions Need to Be Common Knowledge?}, Author = {Binmore, Ken}, Journal = {Topoi}, Year = {2008}, Number = {1}, Pages = {17--27}, Volume = {27}}"
t <- tempfile()
writeLines(gsub(",", ",\n", bib), t)
df <- bib2df::bib2df(t)
which gives
df
# A tibble: 1 × 26
CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF EDITION EDITOR HOWPUBLISHED INSTITUTION JOURNAL KEY MONTH NOTE NUMBER ORGANIZATION PAGES PUBLISHER
<chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <list> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 ARTICLE Binmore20… NA NA <chr> NA NA NA NA <chr> NA NA Topoi NA NA NA 1 NA 17--… NA
# ℹ 6 more variables: SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <dbl>
Does this help?
No, that does not work because then the entry can look like this:
@article{Broadman_2020,
title={Coupled impacts of sea ice variability and North Pacific atmospheric circulation on Holocene hydroclimate in Arctic Alaska},
volume={117},
ISSN={1091-6490},
url={http://dx.doi.org/10.1073/PNAS.2016544117},
DOI={10.1073/pnas.2016544117},
number={52},
journal={Proceedings of the National Academy of Sciences},
publisher={Proceedings of the National Academy of Sciences},
author={Broadman,
Ellie and Kaufman,
Darrell S. and Henderson,
Andrew C. G. and Malmierca-Vallet,
Irene and Leng,
Melanie J. and Lacey,
Jack H.},
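One possible way around this, as a rough sketch only: break the line only at commas that separate fields, meaning commas at brace depth 1 (inside the entry's outer braces but outside any {value}). The helper name `break_at_field_commas` is made up for illustration and is not part of bib2df. It assumes values are delimited with braces rather than double quotes, and I have not checked how bib2df handles the resulting leading spaces.

```r
# Hypothetical helper (not part of bib2df): add a newline only after commas
# that separate fields, i.e. commas at brace depth 1 inside the entry.
# Assumes field values are wrapped in {...}, not "...".
break_at_field_commas <- function(entry) {
  chars <- strsplit(entry, "", fixed = TRUE)[[1]]
  depth <- 0L
  out <- character(length(chars))
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (ch == "{") depth <- depth + 1L
    if (ch == "}") depth <- depth - 1L
    # Depth 1 means "inside @type{...} but outside any {value}".
    out[i] <- if (ch == "," && depth == 1L) ",\n" else ch
  }
  paste(out, collapse = "")
}

bib <- "@Article{Key, Author = {Binmore, Ken}, Year = {2008}}"
cat(break_at_field_commas(bib))
```

With this, the comma inside {Binmore, Ken} is left alone, so author lists like the one above should stay on one line.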
I've put in a lot of time basically refactoring the entire parsing process, and I do think it is more robust, but I am unfortunately stuck without a good workaround on one of the tests: the one allowing = in field values. I could potentially move forward if we were to decide on a list of allowed field names, however.
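To make the allow-list idea concrete, here is a minimal sketch of what such a check might look like. The vector `allowed_fields` and the function `starts_field` are placeholders I made up, not anything in the package, and the actual list of names would still need to be agreed on.

```r
# Assumed allow-list for illustration only; the real list is undecided.
allowed_fields <- c("AUTHOR", "TITLE", "JOURNAL", "YEAR", "URL", "NOTE")

# Hypothetical helper: TRUE if a line looks like "name =" AND the name is
# on the allow-list. Any other "=" is treated as part of a field value.
starts_field <- function(line, allowed = allowed_fields) {
  m <- regmatches(line, regexec("^[[:space:]]*([A-Za-z]+)[[:space:]]*=", line))[[1]]
  length(m) == 2 && toupper(m[2]) %in% allowed
}

starts_field("  title = {A = B},")  # recognised field name
starts_field("x = y + z},")         # "x" is not on the list
```

The trade-off is that any field name missing from the list would be treated as a continuation of the previous value, so the list would have to be fairly generous.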
bib2df::bib2df() fails to load fields when the field separator (",") is preceded by a newline, as in the following example:
reprex:
With version 1.1.1 it loads in new columns "X.≪fieldname≫":
With version 1.1.2 it doesn't load at all (all values are either NA, character(0) or an empty string):
I am not sure how common this is (probably not at all), but this did happen on the first example .bib I found online and it seems like a basic parsing error.
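As a stopgap, the file could be normalised before it reaches bib2df::bib2df(). The sketch below assumes the problematic style puts the comma at the start of the next line (my reading of "preceded by a newline") and moves it to the end of the previous line. `normalize_leading_commas` is a made-up name, not a bib2df function, and it would also move a comma that happens to start a continuation line inside a multi-line value.

```r
# Hypothetical pre-processing step (not part of bib2df): move a comma that
# starts a line to the end of the previous line, keeping the indentation.
normalize_leading_commas <- function(lines) {
  txt <- paste(lines, collapse = "\n")
  txt <- gsub("\n([ \t]*),[ \t]*", ",\n\\1", txt)
  strsplit(txt, "\n", fixed = TRUE)[[1]]
}

# Made-up entry in the assumed leading-comma style.
lines <- c("@article{key", "  , title = {A}", "  , year = {2020}", "}")
fixed <- normalize_leading_commas(lines)

t <- tempfile()
writeLines(fixed, t)
# df <- bib2df::bib2df(t)  # not run here; whether this parses is untested
```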