rdmpage / biostor

Open access articles extracted from the Biodiversity Heritage Library
http://biostor.org

Extracting references cited lists whole article #69

Open rdmpage opened 7 years ago

rdmpage commented 7 years ago

If an article starts part way down a page rather than on a new page, and the reference list of the previous article appears on that same page, then the entire text of the article can be extracted as its reference list. Need to add a test for where the literature cited tag appears in the text (e.g., ignore it on page 1 if the article has more than one page). For example, see http://biostor.org/reference/228351
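
A rough sketch of the kind of test meant here, in Python rather than BioStor's own code. It assumes the article's OCR text is available as a list of page strings; the heading pattern and the function names `find_reference_start` and `extract_references` are hypothetical, not taken from the BioStor codebase.

```python
import re

# Hypothetical heading pattern; the actual tags BioStor looks for are not
# listed in this issue.
HEADING = re.compile(
    r"^[ \t]*(literature cited|references(?: cited)?|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_reference_start(pages):
    """Return (page_index, char_offset) of the heading that most likely
    starts the article's own reference list, or None if none is found."""
    multi_page = len(pages) > 1
    for index, text in enumerate(pages):
        if index == 0 and multi_page:
            # A heading on the first page of a multi-page article probably
            # belongs to the previous article, so skip that page.
            continue
        match = HEADING.search(text)
        if match:
            return index, match.start()
    return None


def extract_references(pages):
    """Return the text from the detected heading to the end of the article,
    or an empty string if no heading is found."""
    start = find_reference_start(pages)
    if start is None:
        return ""
    index, offset = start
    return "\n".join([pages[index][offset:]] + pages[index + 1:])
```

With this rule, a "Literature Cited" heading left over from the previous article on page 1 is skipped, and a single-page article is still searched normally. It would miss a short article whose own reference list also starts on page 1 of several, so it is a heuristic, not a complete fix.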
