mjwestgate / revtools

Tools to support research synthesis in R
https://revtools.net

Medline abstracts not read in correctly #30

Open ellereve opened 4 years ago

ellereve commented 4 years ago

When reading in an RIS (.txt) file, abstracts for MEDLINE records are not always read in properly. This happens when a journal divides its abstract into explicit subsections (for example Background, Objectives, Methods, Results, Conclusions). Instead of all of the text ending up in the single abstract column, a separate column is created for each subsection. The abstract column then contains only the first subsection (e.g., Background), and each subsequent subsection gets its own column (e.g., CONC, whose non-NA entries always start with LUSIONS followed by the conclusions text from the abstract).
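One plausible explanation, offered as an assumption rather than a diagnosis of revtools' actual parser, is that a line beginning with a word such as "CONCLUSIONS:" is mistaken for a new tagged field, so its first four characters ("CONC") become a column name and the rest ("LUSIONS...") becomes the value. A minimal R sketch of a stricter rule: treat a line as a new field only if it matches the MEDLINE tag layout (a tag padded to four characters, followed by "- "), and fold every other line into the preceding field. The helpers `is_tag_line()` and `collapse_fields()` are hypothetical and not part of revtools.

```r
# Hypothetical helper (not part of revtools): TRUE for lines that start a new
# tagged field. Assumes MEDLINE/RIS-style tags: an uppercase letter, then three
# more characters (letters, digits or padding spaces), then "- ".
is_tag_line <- function(lines) {
  grepl("^[A-Z][A-Z0-9 ]{3}- ", lines)
}

# Hypothetical helper: append untagged lines (e.g. "CONCLUSIONS: ...") to the
# preceding field instead of treating them as new fields.
collapse_fields <- function(lines) {
  starts <- is_tag_line(lines)
  if (!starts[1]) stop("the first line must be a tagged field")
  field_id <- cumsum(starts)                    # one id per tagged field
  tags <- trimws(substr(lines[starts], 1, 4))   # e.g. "AB", "PMID"
  text <- vapply(split(lines, field_id), function(x) {
    x[1] <- substring(x[1], 7)                  # drop the 6-character "TAG - " prefix
    paste(trimws(x), collapse = " ")
  }, character(1))
  stats::setNames(unname(text), tags)
}

lines <- c(
  "PMID- 21061207",
  "AB  - BACKGROUND: Some background text.",
  "CONCLUSIONS: Some conclusion text.",
  "TI  - A title"
)
collapse_fields(lines)
```

With this rule the "CONCLUSIONS:" line stays inside the AB field rather than producing a CONC column. Real exports may use other tag layouts, so the pattern would need checking against the actual file.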

ellereve commented 4 years ago

It seems like quite a task to fix this, since the abstract subsection headings aren't very regular. Maybe just a note about the issue in the function documentation would help others. I've worked around the chopped-up abstracts by converting my RIS file to BibTeX in Zotero, reading both the RIS and BIB files into R, and merging the columns I trust into a single data frame (not so elegant...).

mtnbikerjoshua commented 2 years ago

Hi Kelly, I'm having trouble reproducing this issue. I tried reading a MEDLINE file like this one: pubmed-21061207.txt and had no issues. Could you upload an example file?