ropensci / bib2df

Parse a BibTeX file to a tibble
https://docs.ropensci.org/bib2df

Parsing .bib fails when field separator is on the next line #56

Open cwverhey opened 1 year ago


bib2df::bib2df() fails to load fields when the field separator (",") is placed at the start of the next line instead of at the end of the preceding one, as in the following example:

@article{SHBP
,title = "Efficient DC Analysis of RVJ Circuits for Moment and Derivative Commutations of Interconnect Networks"
,author = " S. H. Batterywala and H. Narayanan "
,journal = "12th International Conference on VLSI Design"
,pages = "169-174"
,year = 1999
}

reprex:

f <- tempfile()
download.file('https://www.ee.iitb.ac.in/~trivedi/LatexHelp/Docs/ref.bib', f)
bib2df::bib2df(f)

With version 1.1.1, the fields end up in new columns named "X.≪fieldname≫":

# A tibble: 9 × 41
  CATEGORY    BIBTE…¹ ADDRESS ANNOTE AUTHOR BOOKT…² CHAPTER CROSS…³ EDITION EDITOR HOWPU…⁴ INSTI…⁵ JOURNAL KEY   MONTH NOTE  NUMBER ORGAN…⁶
  <chr>       <chr>   <chr>   <chr>  <list> <chr>   <chr>   <chr>   <chr>   <list> <chr>   <chr>   <chr>   <chr> <chr> <chr> <chr>  <chr>  
1 ARTICLE     SHBP    NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
2 ARTICLE     SIE     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
3 BOOK        HN      NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
4 BOOK        DON     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
5 MASTERSTHE… GAK     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
6 MASTERSTHE… GT      NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
7 MASTERSTHE… NJB     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
8 MANUAL      PVM     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
9 MISC        PVMS    NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
# … with 23 more variables: PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <chr>,
#   X.TITLE <chr>, X.AUTHOR <chr>, X.JOURNAL <chr>, X.PAGES <chr>, X.YEAR <chr>, X.VOLUME <chr>, X.NUMBER <chr>, X.PUBLISHER <chr>,
#   X.MONTH <chr>, X.SCHOOL <chr>, X.ORGANIZATION <chr>, X.ADDRESS <chr>, X.NOTE <chr>, X.KEY <chr>, X.HOWPUBLISHED <chr>, and abbreviated
#   variable names ¹​BIBTEXKEY, ²​BOOKTITLE, ³​CROSSREF, ⁴​HOWPUBLISHED, ⁵​INSTITUTION, ⁶​ORGANIZATION

With version 1.1.2, no field values are loaded at all (every value is NA, character(0), or an empty string):

# A tibble: 9 × 26
  CATEGORY    BIBTE…¹ ADDRESS ANNOTE AUTHOR BOOKT…² CHAPTER CROSS…³ EDITION EDITOR HOWPU…⁴ INSTI…⁵ JOURNAL KEY   MONTH NOTE  NUMBER ORGAN…⁶
  <chr>       <chr>   <chr>   <chr>  <list> <chr>   <chr>   <chr>   <chr>   <list> <chr>   <chr>   <chr>   <chr> <chr> <chr> <chr>  <chr>  
1 ARTICLE     SHBP    NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      ""      NA    NA    NA    NA     NA     
2 ARTICLE     SIE     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      ""      NA    NA    NA    ""     NA     
3 BOOK        HN      NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
4 BOOK        DON     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    NA    NA    NA     NA     
5 MASTERSTHE… GAK     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    ""    NA    NA     NA     
6 MASTERSTHE… GT      NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    ""    NA    NA     NA     
7 MASTERSTHE… NJB     NA      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    ""    NA    NA     NA     
8 MANUAL      PVM     ""      NA     <chr>  NA      NA      NA      NA      <chr>  NA      NA      NA      NA    ""    ""    NA     ""     
9 MISC        PVMS    NA      NA     <chr>  NA      NA      NA      NA      <chr>  ""      NA      NA      ""    NA    NA    NA     NA     
# … with 8 more variables: PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <chr>,
#   and abbreviated variable names ¹​BIBTEXKEY, ²​BOOKTITLE, ³​CROSSREF, ⁴​HOWPUBLISHED, ⁵​INSTITUTION, ⁶​ORGANIZATION

I am not sure how common this layout is (probably not very), but it appeared in the first example .bib file I found online, and it seems like a basic parsing error.
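
Until the parser handles this layout, one possible workaround is to rewrite the file so that each leading "," moves to the end of the previous non-blank line before passing it to bib2df::bib2df(). The following is a minimal base-R sketch of that idea. The helper name normalize_leading_commas() is hypothetical and not part of bib2df, and the sketch assumes that no multi-line field value legitimately starts a line with a comma. It has not been tested against bib2df itself.

```r
# Hypothetical helper (not part of bib2df): move a "," that starts a line
# to the end of the previous non-blank line.
# Assumption: no multi-line field value starts a continuation line with ",".
normalize_leading_commas <- function(lines) {
  for (i in seq_along(lines)) {
    if (i > 1 && grepl("^\\s*,", lines[i])) {
      # Drop the leading comma (and surrounding whitespace) from this line.
      lines[i] <- sub("^\\s*,\\s*", "", lines[i])
      # Walk back to the closest non-blank line.
      j <- i - 1
      while (j > 1 && !nzchar(trimws(lines[j]))) j <- j - 1
      # Append the comma there, after trimming trailing whitespace.
      lines[j] <- paste0(sub("\\s+$", "", lines[j]), ",")
    }
  }
  lines
}

# The entry from this issue, one element per line.
entry <- c(
  '@article{SHBP',
  ',title = "Efficient DC Analysis of RVJ Circuits for Moment and Derivative Commutations of Interconnect Networks"',
  ',author = " S. H. Batterywala and H. Narayanan "',
  ',journal = "12th International Conference on VLSI Design"',
  ',pages = "169-174"',
  ',year = 1999',
  '}'
)

fixed <- normalize_leading_commas(entry)

# Write the normalized entry to a temporary file for parsing.
g <- tempfile(fileext = ".bib")
writeLines(fixed, g)

# Then, with bib2df installed:
# bib2df::bib2df(g)
```

For an existing file f, the same helper can be applied as writeLines(normalize_leading_commas(readLines(f)), g). This only changes where the separators sit, so the entries themselves are left unchanged.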