ropensci / bib2df

Parse a BibTeX file to a tibble
https://docs.ropensci.org/bib2df

problem with whitespaces around = #53

Closed HedvigS closed 3 months ago

HedvigS commented 2 years ago

I've discovered that when I have an entry like this:

@book{fassberg2019modern,
  title      = {Languages of the Eastern Section: Great Lakes to Indian Ocean},
author={Fassberg, Steven E},
  lgcode={west2763},
  hhtype={overview},
  pages={632652},
  year={2019},
  publisher={Routledge}
}

I get a table that looks like this from bib2df::bib2df()

CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF EDITION EDITOR HOWPUBLISHED INSTITUTION JOURNAL KEY MONTH NOTE NUMBER ORGANIZATION PAGES PUBLISHER SCHOOL SERIES TITLE TYPE VOLUME YEAR AUTHOR..FASSBERG. LGCODE..WEST2763.. HHTYPE..OVERVIEW.. PAGES..632652.. YEAR..2019.. PUBLISHER..ROUTLEDGE.
BOOK fassberg2019modern                                         Languages of the Eastern Section: Great Lakes to Indian Ocean       Fassberg, Steven E west2763 overview 632652 2019 Routledge

I've narrowed the problem down to the lack of whitespace before and after the equals sign in the field assignment. Note that title, the only field written with spaces around the equals sign, is the only one that lands in its proper column. It's an easy fix: I just inserted whitespace before and after every equals sign that precedes a curly bracket. But it was a bit frustrating to debug. Can this be mentioned in the documentation, or fixed?
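To check whether a file is affected before parsing it, something like the following might help. It is a minimal base-R sketch, not part of bib2df: the helper name find_tight_fields is hypothetical, and it assumes one field per line with the field name at the start of the line.

```r
# Hypothetical helper (not part of bib2df): return the indices of field lines
# whose "=" is not surrounded by whitespace on both sides.
find_tight_fields <- function(lines) {
  # A field line: optional indentation, a field name, then "=".
  is_field <- grepl("^\\s*[[:alnum:]_-]+\\s*=", lines, perl = TRUE)
  # A safely spaced field line has whitespace on both sides of that "=".
  is_spaced <- grepl("^\\s*[[:alnum:]_-]+\\s+=\\s", lines, perl = TRUE)
  which(is_field & !is_spaced)
}

entry <- c(
  "@book{fassberg2019modern,",
  "  title      = {Languages of the Eastern Section: Great Lakes to Indian Ocean},",
  "author={Fassberg, Steven E},",
  "  year={2019},",
  "}"
)

find_tight_fields(entry)  # lines 3 and 4 (author and year) are flagged
```

For a real file, the input would come from readLines("your_bib_file.bib"), and the returned indices point at the lines to fix.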

agricolamz commented 2 years ago

I spent half an hour figuring out that it was the spaces, not the uppercase categories...

HedvigS commented 2 years ago

I spent half an hour figuring out that it was the spaces, not the uppercase categories...

Haha oh no! I'm sorry!

nucleic-acid commented 2 years ago

Hi, could this be fixed by refining the regular expressions in bib2df_gather.R? Would you accept a pull request on this?
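For illustration, a whitespace-tolerant pattern for splitting a field line into key and value could look like the sketch below. This is not the code in bib2df_gather.R: the function split_field is hypothetical, and it only shows the idea of making the whitespace around the equals sign optional.

```r
# Illustrative only: this is NOT the code in bib2df_gather.R.
split_field <- function(line) {
  # Optional indentation, field name, optional whitespace around "=", then the value.
  pattern <- "^\\s*([^=\\s]+)\\s*=\\s*(.*)$"
  m <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]
  if (length(m) == 0) return(NULL)  # not a field line
  c(key = toupper(m[2]), value = m[3])
}

# Both spellings give the same key, with or without spaces around "=".
split_field("author={Fassberg, Steven E},")
split_field("author = {Fassberg, Steven E},")
```

Because both \\s* groups may match nothing, the two calls return the same key and value.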

HedvigS commented 1 year ago

Here's a hacky solution for desperate folks in the meantime ^^

https://hedvigsr.tumblr.com/post/702901773084524544/bib2df-bug-hacky-solution

nguyentruonglt commented 11 months ago

I have the same problem, but my BibTeX file has 3000 citations. It's extremely exhausting to add spaces before and after every equals sign (=) manually. Do you know of any way to do it automatically? Can R or any other tool do this?

agricolamz commented 11 months ago

Bib files are plain text, so you can do whatever you want with them. If I were you, I'd do something like this:

library(tidyverse)

read_lines("your_bib_file.bib") |> 
  str_replace_all("=", " = ") |> # add desired spaces
  str_replace_all("\\s{2,}", " ") |>  # remove double spaces in case you have it
  write_lines("your_bib_file.bib")

I haven't tested this code on real files, but I'm pretty confident that it should work.
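One caveat with replacing every "=": it also pads equals signs inside field values (for example in URLs), and collapsing runs of whitespace removes indentation. A more conservative variant, sketched here in base R, pads only the equals sign that directly follows a field name at the start of a line. The helper name pad_field_equals and the example URL are made up for illustration, and the sketch assumes one field per line.

```r
# Hypothetical helper: pad only the "=" that follows a field name at line start.
pad_field_equals <- function(lines) {
  # sub() replaces the first match only, so "=" inside the value is left alone.
  sub("^(\\s*[[:alnum:]_-]+)\\s*=\\s*", "\\1 = ", lines, perl = TRUE)
}

pad_field_equals(c(
  "author={Fassberg, Steven E},",
  "  url={https://example.org/?id=1},",  # illustrative URL, not from the thread
  "  title      = {Languages of the Eastern Section: Great Lakes to Indian Ocean},"
))
```

To apply it to a file, writeLines(pad_field_equals(readLines("your_bib_file.bib")), "your_bib_file_spaced.bib") writes the result to a new file instead of overwriting the original. A continuation line of a multi-line value that happens to start with a word followed by "=" would also be padded, so it is worth checking the output.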

HedvigS commented 11 months ago

@nguyentruonglt here's my scripted solution:

Here's a hacky solution for desperate folks in the meantime ^^ https://hedvigsr.tumblr.com/post/702901773084524544/bib2df-bug-hacky-solution

This is the function I used:

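library(readr)
library(stringr)

# Write a copy of the .bib file with spaces around every "=" that precedes "{".
add_spaces_for_bib2df <- function(bib_fn) {
  # "refs.bib" -> "refs_sep.bib"; the dot is escaped so it is not a wildcard
  new_fn <- paste0(str_remove(bib_fn, "\\.bib$"), "_sep.bib")

  read_lines(bib_fn) %>%
    str_replace_all(fixed("={"), " = {") %>%
    write_lines(new_fn)
}

HedvigS commented 3 months ago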

@giabaio I'd like to help by adjusting one of the regexes in bib2df_gather and making a PR, as @nucleic-acid suggests. But I'm struggling a bit to parse the function, and I'm concerned I'd cause problems unknowingly. I've made a suggestion in PR #59