wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

Mozilla Open Science sprint #143

Open Daniel-Mietchen opened 9 years ago

Daniel-Mietchen commented 9 years ago

We are taking part in the Mozilla Open Science sprint (overview) and welcome contributions to any of the software projects here at WikiProject Open Access, in particular to the YouTube exporter (#82) and the Open Access signalling project.

If you are interested in getting involved, please leave a note here, and we will take things from there.

Daniel-Mietchen commented 9 years ago

Here is what I plan to do: get an overview of all the <license> statements within the Open Access subset of the articles on PubMed Central.

I will update this comment as I move forward.

Day 1

Day 2

Klortho commented 9 years ago

grep -ohPR --include="*.nxml" "<license(.*)</license>" .

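If an article has more than one <license> element on the same line, the greedy .* captures everything from the first opening tag to the last closing tag, including whatever sits between the elements. Use the non-greedy matcher instead: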

grep -ohPR --include="*.nxml" "<license(.*?)</license>" .
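
To see the difference on a small scale, here is a rough sketch that runs both patterns against a made-up single-line sample instead of real PMC files. The sample string and the variable names are hypothetical, and it assumes GNU grep built with PCRE support (the -P flag used above).

```shell
# Hypothetical sample: two <license> elements on one line, as can happen in
# unindented XML. This is not taken from a real PMC article.
sample='<license license-type="open-access"><license-p>CC BY</license-p></license><x/><license><license-p>CC0</license-p></license>'

# Greedy: .* runs to the LAST </license> on the line, so both elements
# (and the <x/> between them) come back as a single match.
greedy=$(printf '%s\n' "$sample" | grep -oP '<license(.*)</license>')

# Non-greedy: .*? stops at the FIRST </license>, giving one match per element.
lazy=$(printf '%s\n' "$sample" | grep -oP '<license(.*?)</license>')

# Count the matches each pattern produced (grep -o prints one match per line).
printf 'greedy matches: %s\n' "$(printf '%s\n' "$greedy" | wc -l)"
printf 'non-greedy matches: %s\n' "$(printf '%s\n' "$lazy" | wc -l)"
```

With this sample the greedy pattern should report one match and the non-greedy pattern two. Both patterns only see one line at a time, so a <license> element that spans several lines would be missed by either. On the real corpus, appending `| sort | uniq -c | sort -rn` to the non-greedy command is one way to tally identical statements for an overview.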

Daniel-Mietchen commented 9 years ago

Cool, thanks!