ropensci / bib2df

Parse a BibTeX file to a tibble
https://docs.ropensci.org/bib2df

Support for parsing .bib from scopus #32

Closed: benjaminschwetz closed this issue 4 years ago

benjaminschwetz commented 5 years ago

I tried to parse .bib files exported from Scopus today but ended up with a total mess of column names (see below).

bib_string <- "@ARTICLE{Brulc20091948,
author={Brulc, J.M. and Antonopoulos, D.A. and Berg Miller, M.E. and Wilson, M.K. and Yannarell, A.C. and Dinsdale, E.A. and Edwards, R.E. and Frank, E.D. and Emerson, J.B. and Wacklin, P. and Coutinho, P.M. and Henrissat, B. and Nelson, K.E. and White, B.A.},
title={Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases},
journal={Proceedings of the National Academy of Sciences of the United States of America},
year={2009},
doi={10.1073/pnas.0806191105},
url={https://www.scopus.com/inward/record.uri?eid=2-s2.0-60549114321&doi=10.1073%2fpnas.0806191105&partnerID=40&md5=8d70a27545328d4cbb538bdb4757335b},
affiliation={Department of Animal Sciences, University of Illinois, Urbana, IL 61801, United States; Institute for Genomics and Systems Biology, Argonne National Laboratory, Argonne, IL 60439, United States; Department of Biology, San Diego State University, San Diego, CA 92813, United States; School of Biological Sciences, Flinders University, Adelaide, SA 5001, Australia; Center for Microbial Sciences, San Diego State University, San Diego, CA 92813, United States; Department of Computer Sciences, San Diego State University, San Diego, CA 92813, United States; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, United States; J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, MD 20850, United States; Architecture et Fonction des Macromolecules Biologiques, Unité Mixte de Recherche 6098, Universites Aix-Marseille I and II, Case 932, 163 Avenue de Luminy, 13288 Marseille, France; Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, United States},
abstract={The complex microbiome of the rumen functions as an effective system for the conversion of plant cell wall biomass to microbial protein, short chain fatty acids, and gases. As such, it provides a unique genetic resource for plant cell wall degrading microbial enzymes that could be used in the production of biofuels. The rumen and gastrointestinal tract harbor a dense and complex microbiome. To gain a greater understanding of the ecology and metabolic potential of this microbiome, we used comparative metagenomics (phylotype analysis and SEED subsystems-based annotations) to examine randomly sampled pyrosequence data from 3 fiber-adherent microbiomes and 1 pooled liquid sample (a mixture of the liquid microbiome fractions from the same bovine rumens). Even though the 3 animals were fed the same diet, the community structure, predicted phylotype, and metabolic potentials in the rumen were markedly different with respect to nutrient utilization. A comparison of the glycoside hydrolase and cellulosome functional genes revealed that in the rumen microbiome, initial colonization of fiber appears to be by organisms possessing enzymes that attack the easily available side chains of complex plant polysaccharides and not the more recalcitrant main chains, especially cellulose. Furthermore, when compared with the termite hindgut microbiome, there are fundamental differences in the glycoside hydrolase content that appear to be diet driven for either the bovine rumen (forages and legumes) or the termite hindgut (wood). © 2009 by The National Academy of Sciences of the USA.},
author_keywords={CAZymes;  Cellulases;  Plant cell wall;  Pyrosequencing},
Isoptera},
document_type={Article},
source={Scopus},
}"
fil <- tempfile("data")
write(bib_string, fil)
bib2df::bib2df(fil)
#> Column `YEAR` contains character strings.
#>               No coercion to numeric applied.
#> # A tibble: 1 x 37
#>   CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF
#>   <chr>    <chr>     <chr>   <chr>  <list> <chr>     <chr>   <chr>   
#> 1 ARTICLE  Brulc200~ <NA>    <NA>   <chr ~ <NA>      <NA>    <NA>    
#> # ... with 29 more variables: EDITION <chr>, EDITOR <list>,
#> #   HOWPUBLISHED <chr>, INSTITUTION <chr>, JOURNAL <chr>, KEY <chr>,
#> #   MONTH <chr>, NOTE <chr>, NUMBER <chr>, ORGANIZATION <chr>,
#> #   PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>,
#> #   TYPE <chr>, VOLUME <chr>, YEAR <chr>, AUTHOR..BRULC. <chr>,
#> #   TITLE..GENE.CENTRIC <chr>, JOURNAL..PROCEEDINGS <chr>,
#> #   YEAR..2009.. <chr>, DOI..10.1073.PNAS.0806191105.. <chr>,
#> #   URL..HTTPS...WWW.SCOPUS.COM.INWARD.RECORD.URI.EID.2.S2.0.60549114321.DOI.10.1073.2FPNAS.0806191105.PARTNERID.40.MD5.8D70A27545328D4CBB538BDB4757335B.. <chr>,
#> #   AFFILIATION..DEPARTMENT <chr>, ABSTRACT..THE <chr>,
#> #   AUTHOR_KEYWORDS..CAZYMES. <chr>, DOCUMENT_TYPE..ARTICLE.. <chr>,
#> #   SOURCE..SCOPUS.. <chr>

Created on 2019-08-06 by the reprex package (v0.3.0)
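
The garbled names above (for example `AUTHOR..BRULC.` and `YEAR..2009..`) suggest that the field name and the start of its value were not split apart. One possible cause, which is an assumption and not confirmed in this thread, is that Scopus writes `author={...}` with no spaces around `=`. Under that assumption, a minimal workaround sketch is to pad the first `=` on each field line before parsing. The helper name `normalize_bib_fields` is hypothetical and not part of bib2df.

```r
# Hypothetical helper (not part of bib2df): rewrite "key={value}" as
# "key = {value}" on lines that start with a field name.
# Only the first "=" on a line is touched, so "=" inside URLs is left alone.
normalize_bib_fields <- function(lines) {
  sub("^(\\s*[A-Za-z_][A-Za-z0-9_-]*)\\s*=\\s*", "\\1 = ", lines, perl = TRUE)
}

# A shortened stand-in for the Scopus export shown above.
scopus_lines <- c(
  "@ARTICLE{Brulc20091948,",
  "author={Brulc, J.M. and Antonopoulos, D.A.},",
  "year={2009},",
  "url={https://www.scopus.com/inward/record.uri?eid=2-s2.0-60549114321&partnerID=40},",
  "}"
)

normalized <- normalize_bib_fields(scopus_lines)

# Write the normalized lines to a temporary file for parsing.
fil <- tempfile(fileext = ".bib")
writeLines(normalized, fil)
# bib2df::bib2df(fil)  # assumption: untested against any specific bib2df version
```

Entry headers such as `@ARTICLE{...` and the closing `}` do not start with a field name, so they pass through unchanged. Whether this is enough to produce clean columns depends on the bib2df version in use.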

ottlngr commented 5 years ago

On hold. Probably fixed by #34.

ottlngr commented 4 years ago

Hi @benjaminschwetz,

Sorry for the delay. Yesterday I merged your changes into master. The test cases you added run successfully, so I'm going to close this issue.