sciunto-org / python-bibtexparser

Bibtex parser for Python 3
https://bibtexparser.readthedocs.io
MIT License

Issue with month field #291

Closed by thomashirtz 2 years ago

thomashirtz commented 2 years ago

Some users of my repository use Paperpile to export their .bib files. However, Paperpile writes the month field without quotes, which breaks the bibtexparser import. Example:

@ARTICLE{Elosua-Bayes2021-hv,
  title    = "{SPOTlight: seeded NMF regression to deconvolute spatial
              transcriptomics spots with single-cell transcriptomes}",
  author   = "Elosua-Bayes, Marc and Nieto, Paula and Mereu, Elisabetta and
              Gut, Ivo and Heyn, Holger",
  abstract = "Spatially resolved gene expression profiles are key to understand
              tissue organization and function. However, spatial
              transcriptomics (ST) profiling techniques lack single-cell
              resolution and require a combination with single-cell RNA
              sequencing (scRNA-seq) information to deconvolute the spatially
              indexed datasets. Leveraging the strengths of both data types, we
              developed SPOTlight, a computational tool that enables the
              integration of ST with scRNA-seq data to infer the location of
              cell types and states within a complex tissue. SPOTlight is
              centered around a seeded non-negative matrix factorization (NMF)
              regression, initialized using cell-type marker genes and
              non-negative least squares (NNLS) to subsequently deconvolute ST
              capture locations (spots). Simulating varying reference
              quantities and qualities, we confirmed high prediction accuracy
              also with shallowly sequenced or small-sized scRNA-seq reference
              datasets. SPOTlight deconvolution of the mouse brain correctly
              mapped subtle neuronal cell states of the cortical layers and the
              defined architecture of the hippocampus. In human pancreatic
              cancer, we successfully segmented patient sections and further
              fine-mapped normal and neoplastic cell states. Trained on an
              external single-cell pancreatic tumor references, we further
              charted the localization of clinical-relevant and tumor-specific
              immune cell states, an illustrative example of its flexible
              application spectrum and future potential in digital pathology.",
  journal  = "Nucleic Acids Res.",
  volume   =  49,
  number   =  9,
  pages    = "e50",
  month    =  may,
  year     =  2021,
  language = "en",
  issn     = "0305-1048, 1362-4962",
  pmid     = "33544846",
  doi      = "10.1093/nar/gkab043",
  pmc      = "PMC8136778"
}

error:

  File "/mnt/d/Thomas/GitHub/notion-scholar/notion_scholar/run.py", line 37, in run
    bib_database: BibDatabase = get_bib_database_from_file(file_path=bib_file_path)
  File "/mnt/d/Thomas/GitHub/notion-scholar/notion_scholar/bibtex.py", line 14, in get_bib_database_from_file
    return load(bibtex_file)
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/__init__.py", line 69, in load
    return parser.parse_file(bibtex_file)
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bparser.py", line 169, in parse_file
    return self.parse(file.read(), partial=partial)
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bparser.py", line 147, in parse
    self._expr.parseFile(bibtex_file_obj)
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bibtexexpression.py", line 278, in parseFile
    return self.main_expression.parseFile(file_obj, parseAll=True)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 1893, in parse_file
    return self.parse_string(file_contents, parseAll)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 1117, in parse_string
    loc, tokens = self._parse(instring, 0)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 807, in _parseNoCache
    loc, tokens = self.parseImpl(instring, pre_loc, doActions)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 4851, in parseImpl
    return super().parseImpl(instring, loc, doActions)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 4760, in parseImpl
    loc, tmptokens = self_expr_parse(instring, preloc, doActions)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 807, in _parseNoCache
    loc, tokens = self.parseImpl(instring, pre_loc, doActions)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 4074, in parseImpl
    return e._parse(
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 844, in _parseNoCache
    tokens = fn(instring, tokens_start, ret_tokens)
  File "/home/thomas/.local/lib/python3.8/site-packages/pyparsing/core.py", line 283, in wrapper
    ret = func(*args[limit:])
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bparser.py", line 187, in <lambda>
    lambda s, l, t: self._add_entry(
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bparser.py", line 277, in _add_entry
    d[self._clean_field_key(key)] = self._clean_val(fields[key])
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bparser.py", line 228, in _clean_val
    return as_text(val)
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bibdatabase.py", line 270, in as_text
    return text_string_or_expression.get_value()
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bibdatabase.py", line 231, in get_value
    return ''.join([BibDataString.expand_string(s) for s in self.expr])
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bibdatabase.py", line 231, in <listcomp>
    return ''.join([BibDataString.expand_string(s) for s in self.expr])
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bibdatabase.py", line 197, in expand_string
    return string_or_bibdatastring.get_value()
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bibdatabase.py", line 178, in get_value
    return self._bibdatabase.expand_string(self.name)
  File "/home/thomas/.local/lib/python3.8/site-packages/bibtexparser/bibdatabase.py", line 109, in expand_string
    raise(UndefinedString(name))
bibtexparser.bibdatabase.UndefinedString: 'may'

Code:

from bibtexparser import load
from bibtexparser.bibdatabase import BibDatabase

def get_bib_database_from_file(file_path: str) -> BibDatabase:
    with open(file_path) as bibtex_file:
        return load(bibtex_file)

Is there a way to fix or sanitize this on the parser side? Or do I need to report it to Paperpile and deal with the errors manually?

agruber commented 2 years ago

Andreas from Paperpile here. We export "month" as BibTeX macro names and not as strings, since that will expand to the right string for current language settings when compiling BibTeX. I am afraid there is nothing we can do on our side, since this is valid BibTeX.

thomashirtz commented 2 years ago

Sorry about singling out Paperpile, I didn't know this was standard BibTeX before your message 🙈

thomashirtz commented 2 years ago

I just saw that it is the same issue as https://github.com/sciunto-org/python-bibtexparser/issues/280
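
For anyone who lands here with the same traceback: since the bare `may` is a valid BibTeX macro, one possible workaround is to sanitize the file before parsing. The sketch below uses only the standard library and rewrites bare month macros as quoted strings. The helper name `quote_month_macros` and the English expansions are my own assumptions, not part of bibtexparser or Paperpile, and the regex only handles the simple one-line `month = may,` form shown above (not concatenations such as `may # "~1"`).

```python
import re

# Assumed English expansions of the twelve three-letter month macros.
MONTHS = {
    "jan": "January", "feb": "February", "mar": "March",
    "apr": "April", "may": "May", "jun": "June",
    "jul": "July", "aug": "August", "sep": "September",
    "oct": "October", "nov": "November", "dec": "December",
}

# Matches a whole field line such as `  month    =  may,` whose value is a
# bare three-letter word (no quotes, no braces).
_BARE_MONTH = re.compile(
    r"^(?P<head>[ \t]*month[ \t]*=[ \t]*)"
    r"(?P<macro>[A-Za-z]{3})"
    r"(?P<tail>[ \t]*,?[ \t]*)$",
    re.IGNORECASE | re.MULTILINE,
)


def quote_month_macros(bibtex_source: str) -> str:
    """Return the source with bare month macros replaced by quoted names.

    Hypothetical helper: lines that are already quoted, braced, or that use
    an unknown macro are left untouched.
    """
    def _replace(match: re.Match) -> str:
        name = MONTHS.get(match.group("macro").lower())
        if name is None:
            # Not one of the twelve month macros; keep the line as-is.
            return match.group(0)
        return f'{match.group("head")}"{name}"{match.group("tail")}'

    return _BARE_MONTH.sub(_replace, bibtex_source)
```

The sanitized text could then be passed to `bibtexparser.loads(...)` instead of calling `load(...)` on the file directly. As far as I can tell, bibtexparser 1.x also accepts a `common_strings=True` argument on `BibTexParser`, which is meant to predefine the month macros so that no rewriting is needed; please check the documentation for your installed version before relying on it.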