petermr commented 5 years ago

SimonW asked: Is it possible you could contribute a small post this week to GenR on the 'OA Climate Change' question. Just as a suggestion, part based on last week's #eLifeSprint - it would be great if people could know what 'Content Mine' software is, how it can be used in article analysis for climate change, and essentially what the next step of mining CrossRef would involve and the kind of stats, types of information, data that could result. The reason I say this is that then I can use this to talk with various tech teams, librarians etc., and try and muster some volunteer help.

Greetings to GenR!

I'm Peter Murray-Rust, a retired chemistry academic from Cambridge University, and I feel that the most important thing in our lives now is climate change.

But what can I do that's most effective for me and the world?

However we solve climate change, one thing seems certain - we need global collaboration based on facts. Emotions will keep us going, but facts will decide what we do.

I don't know all the facts that I should. If I lectured to a first year university course I couldn't give an accurate picture of the facts and what actions they dictate. So I'm going to try to learn what is common knowledge. But my special contribution comes from a technology and philosophy that allows us to get huge numbers of facts from reliable sources - the scientific literature.

There are literally hundreds of thousands of articles (papers) that are about climate change (CC) to some degree. Here is a search of the biomedical literature (explanation later, and you'll find it's very simple):

getpapers -q "climate change" -n -a
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
info: Found 135931 results

This uses Rik Smith-Unna's getpapers to search Europe PubMed Central (Europe PMC) for papers with "climate change" in the text. About half of them (65516) are "open access" and can be downloaded (but this share is likely to be lower for the non-biomedical literature). They are about everything:

species extinction
sea level rise
spread of parasite vectors
weather changes
engineering responses
response by society

and they're about everywhere on the planet.

So if you want to find out about crops and West Africa...

getpapers -q "((climate change) AND (west africa) AND (crops))" -n -a
info: Found 1628 results
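The query above is three phrases, each in parentheses, joined with AND. As a minimal sketch (the helper names `build_query` and `getpapers_dry_run_command` are hypothetical and not part of getpapers), a query like this could be assembled in Python, together with the argument list for the same dry-run call shown above (`-q`, `-n`, `-a`):

```python
from __future__ import annotations


def build_query(*phrases: str) -> str:
    """Join search phrases into a parenthesised AND query (hypothetical helper)."""
    cleaned = [p.strip() for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty phrase is required")
    # Each phrase gets its own parentheses, then the whole query is wrapped once.
    return "(" + " AND ".join(f"({p})" for p in cleaned) + ")"


def getpapers_dry_run_command(query: str) -> list[str]:
    """Argument list for the dry-run search shown in this post (-n = no download, -a = all results)."""
    return ["getpapers", "-q", query, "-n", "-a"]


if __name__ == "__main__":
    query = build_query("climate change", "west africa", "crops")
    print(" ".join(getpapers_dry_run_command(query)))
```

Swapping "crops" for another topic, or "west africa" for another region, gives a new search without retyping the brackets.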

That's a lot of papers! But if you have enough disk space and a reasonably good connection you can download them in 5 minutes.

Are they useful? That's where our AMI comes in. AMI searches these papers on your disk within a minute or two for things you might be interested in:

species
vectors
tropical diseases
chemicals
countries
funders
international organizations and lots more
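To give a feel for what a dictionary-based search does, here is a toy sketch in Python. It is not AMI's code or its algorithm. The function names, the sample text and the three-term "countries" dictionary are invented for illustration. It simply counts whole-word occurrences of each dictionary term in each paper's text:

```python
from __future__ import annotations

import re
from collections import Counter


def count_dictionary_terms(text: str, dictionary: list[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each dictionary term."""
    counts: Counter = Counter()
    lowered = text.lower()
    for term in dictionary:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[term] = hits
    return counts


def search_corpus(papers: dict[str, str], dictionary: list[str]) -> dict[str, Counter]:
    """Apply the dictionary to every paper; keys are paper identifiers."""
    return {paper_id: count_dictionary_terms(text, dictionary)
            for paper_id, text in papers.items()}


if __name__ == "__main__":
    # Invented sample text and dictionary, for illustration only.
    papers = {"paper1": "Maize yields in Ghana and Nigeria fell. Ghana was hit hardest."}
    countries = ["Ghana", "Nigeria", "Mali"]
    for paper_id, counts in search_corpus(papers, countries).items():
        print(paper_id, dict(counts))
```

A real dictionary would hold thousands of terms (species, chemicals, funders and so on), but the idea of matching a curated term list against each paper is the same.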

The great thing is that anyone who can run a program can do this! Lars, in the Netherlands, 15 years old, learned how to do this and developed more software. If you love computers (and have one), or data, or tackling scientific problems or combatting CC, that's all you need.

This makes a great citizen science project. Anyone anywhere with a Net connection can do it. The software, data and dictionaries are all Open (no restrictions on use, no fee, and you can change them without permission). We'll share the data we find (probably on GitHub) as soon as we capture it. This is "Open Notebook Science" (no insider knowledge), promoted by Jean-Claude Bradley.

Don't think that because you aren't a "scientist" you can't understand scientific papers. You won't understand all of them (I can't either), or every part of them, but there are many whose key bits you can understand. If you like maps, graphs, and similar data then you'll feel right at home.

We've set up a project here, on GitHub. The technology is used in several projects (most notably plants and their medicinal products) so that means that bugs get reported and hopefully fixed. No matter what your interests and skills, you're welcome.

There's a lot of useful stuff on the sister project, essential oils. Also the data we extract is open and well organized so we can use a wide range of other software to analyse it.

If you are a techie, there's a tutorial (rather XML-heavy!) I'm giving next week at the XML Summer School (Oxford) at https://github.com/petermr/CEVOpen/blob/master/docs/2019_raw_petermr.potx - you'll have to download it. We're also starting a communal article for Beilstein J. Org. Chem. at https://github.com/petermr/CEVOpen/blob/master/BJOC

mrchristian commented 5 years ago

I need to check a few things about the software mentioned above:

I want to make sure I'm getting the correct address for the software, installation manuals, and use manuals.

getpapers - Is this the software repository and install instructions used https://github.com/contentmine/getpapers and is this the tutorial for use you would recommend https://github.com/petermr/tigr2ess/blob/master/getpapers/TUTORIAL.md
AMI - software and installation guide - https://github.com/ContentMine/ami - the tutorial for using AMI https://github.com/petermr/tigr2ess/blob/master/search/TUTORIAL.md

Thanks

Simon

petermr commented 5 years ago

Yes, although I am creating a new repo, ami3, instead of normami. I want to extract the key components out of TIGR2ESS into openNotebook.

On Wed, Sep 11, 2019 at 2:18 PM Simon Worthington notifications@github.com wrote:

I need to check a few things about the software mentioned above:

I want to make sure I'm getting the correct address for the software, installation manuals, and use manuals.

getpapers - Is this the software repository and install instructions used https://github.com/contentmine/getpapers and is this the tutorial for use you would recommend https://github.com/petermr/tigr2ess/blob/master/getpapers/TUTORIAL.md

AMI - software and installation guide - https://github.com/ContentMine/ami - the tutorial for using AMI https://github.com/petermr/tigr2ess/blob/master/search/TUTORIAL.md

Thanks

Simon

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/petermr/climate/issues/14?email_source=notifications&email_token=AAFTCS6UOJA2NR4RKP56SFLQJDV2DA5CNFSM4IU4XBAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6OOLFQ#issuecomment-530376086, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFTCSZAXCC3ZXURFNIW3ILQJDV2DANCNFSM4IU4XBAA .

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

mrchristian commented 5 years ago

That's great, I can go with these addresses for the moment. Thank you.

Another question: why use getpapers + AMI and not just search on, say, europepmc https://europepmc.org/search?query=climate%20change ?

I can obviously string together a bunch of reasons grouped around 'data science', 'having a collection', 'putting to use in another context' (e.g., literature, media for a class, a citizen science project, etc.), with actions like: download all the papers, keep the papers, carry out further searches as and when you want, use the content in another context for your community, in a project with a group. There is also the post-processing that has not been touched on yet.

But it's better you give me the 'OK' that I'm on the right track, or you have another or a complementary vision.

mrchristian commented 5 years ago

NB: I will make three info boxes:

eLifeSprint
Content Mine software: getpapers + AMI
Open Climate Knowledge projects - which combine 'openNotebook' with a mission to build an actionable plan for '100% OA Climate Change'.

I'll pass them by you when done and move the whole package to a collaborative doc to finish it off.

petermr commented 5 years ago

On Wed, Sep 11, 2019 at 2:55 PM Simon Worthington notifications@github.com wrote:

That's great, I can go with these addresses for the moment. Thank you.

Another question: why use getpapers + AMI and not just search on, say, europepmc https://europepmc.org/search?query=climate%20change ?

It's a wrapper but a nice substantial wrapper. It builds the directory structure (CProject) and also allows cutoffs and download of PDFs. It can be done with "curl" but you have to work out the cursor and write a script.
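For readers wondering what "working out the cursor" involves, here is a hedged sketch in Python. It is not getpapers' code. The endpoint, the parameters (`query`, `format`, `pageSize`, `cursorMark`) and the response fields (`resultList`, `nextCursorMark`) are given as I understand the Europe PMC REST search API and should be checked against its documentation. `fetch_page` and `iter_results` are hypothetical names, and `limit` stands in for the cutoff mentioned above.

```python
from __future__ import annotations

import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Europe PMC REST search service.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def fetch_page(query: str, cursor: str, page_size: int) -> dict:
    """Fetch one page of search results as parsed JSON (makes a network call)."""
    params = urlencode({"query": query, "format": "json",
                        "pageSize": page_size, "cursorMark": cursor})
    with urlopen(f"{SEARCH_URL}?{params}") as response:
        return json.load(response)


def iter_results(query: str, limit: int, page_size: int = 100,
                 fetch: Callable[[str, str, int], dict] = fetch_page) -> Iterator[dict]:
    """Yield up to `limit` records, following the cursor from page to page."""
    cursor = "*"  # assumed starting cursor value
    yielded = 0
    while yielded < limit:
        page = fetch(query, cursor, page_size)
        results = page.get("resultList", {}).get("result", [])
        if not results:
            return
        for record in results:
            yield record
            yielded += 1
            if yielded >= limit:
                return
        next_cursor = page.get("nextCursorMark")
        if not next_cursor or next_cursor == cursor:
            return  # no further pages
        cursor = next_cursor
```

Passing `fetch` as a parameter means the paging logic can be tried with canned pages before it touches the network. Downloading full text and building the per-paper directories is additional work that the wrapper does for you.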

I can obviously string together a bunch of reasons grouped around 'data science', 'having a collection', 'putting to use in another context' (e.g., literature, media for a class, a citizen science project, etc.), with actions like: download all the papers, keep the papers, carry out further searches as and when you want, use the content in another context for your community, in a project with a group. There is also the post-processing that has not been touched on yet.

But it's better you give me the 'OK' that I'm on the right track, or you have another or a complementary vision.

You're on the right track completely. The main problem is that libraries, including to a lesser extent EPMC, all think people want to download a few papers and read them. There is little support for automatic mining. And there was even less in 2015 when getpapers was written. I expect getpapers to be overtaken but it hasn't happened yet.

P.


mrchristian commented 5 years ago

Great, thank you. I'll add a line or two to the article to point out that it's more than search.

Info: block 1. about the eLife Sprint

InstruMinetal team: eLifeSprint 2019

Project: SaWaMine (working title)

#eLifeSprint2019 4–5 September 2019, Cambridge UK and online.

The ContentMine software was used by a sprint group of seven (Sabine Weber, Michael Owonibi, Tiago Lubiana, Peter Murray-Rust, Sophia K. Cheng, Wambui Karuga, and Leonie Mueck) to prototype a UI for users to identify scientific instruments from candidate search results, made using ContentMine's software getpapers and AMI and extracted from a corpus of papers about phytochemistry called CEVOpen.

Goals of the Project:

Create a way of automatically extracting candidates for scientific equipment terms from scientific papers.
Create a GUI that displays the paragraph in which each candidate for scientific equipment appears, allowing the user to select the ones that are actually instruments.
Find out what kind of scientific equipment the papers in the CEVOpen corpus used and add the terms to Wikidata.
Long term goal: Connect tool and the UI.

NB: Content is partly based on https://github.com/caffiendFrog/elife2019 from Sophia Cheng @caffiendFrog

mrchristian commented 5 years ago

Please edit the above infobox here on this pad, will be easier https://cryptpad.fr/code/#/2/code/edit/2nIMow-uTuQpNJv2RzZwdark/

petermr commented 5 years ago

Tomorrow...

On Wed, Sep 11, 2019 at 4:32 PM Simon Worthington notifications@github.com wrote:

Please edit the above infobox here on this pad, will be easier https://cryptpad.fr/code/#/2/code/edit/2nIMow-uTuQpNJv2RzZwdark/


mrchristian commented 5 years ago

I've turned the 1. eLifeSprint 'infobox' around, my version yesterday was mixed up and a bit rubbish. I think I was running on empty. Much better now.

I will complete the whole blog piece edit and add other infoboxes in the same doc, a bit more sane.

I'm just trying to resolve the order of things. I think that my infobox 2. Content Mine software: getpapers + AMI, should really be called 'openNotebook' https://github.com/petermr/openNotebook | Am I getting this right: openNotebook is the wrapper, the vehicle, to put forward the toolset and method?

I'll get finished up in content here https://cryptpad.fr/code/#/2/code/edit/2nIMow-uTuQpNJv2RzZwdark/

mrchristian commented 5 years ago

The article is in reasonable shape now. It has an intro and three infoboxes for the end of the article. I need to give it another working over and gather together some images. I'll do this in an hour's time; first I have to have a meeting with a colleague.

https://cryptpad.fr/code/#/2/code/edit/2nIMow-uTuQpNJv2RzZwdark/

I will finish up the article today, recheck in the morning and post before 12 noon CEST Friday. Keep the momentum going :-)

mrchristian commented 5 years ago

OK, I have the article ready to publish. There is one term I'm not sure I'm getting right: 'species distribution and migration'. It's one of the subjects we said we'd cover in the openNotebook OA searches. I think I'm mixing up the term?

I will shortly move the doc to Wordpress, but not before I gather pics and I'll make a note here when it moves.

I can see that there is a need to explain a lot more about openNotebook, how it works, what you get out of it: stats, files, data? Good we're starting a new blog for it, there will be plenty to do :-)

petermr commented 5 years ago

That's fine

On Fri, Sep 13, 2019 at 10:53 AM Simon Worthington notifications@github.com wrote:

OK, I have the article ready to publish. There is one term I'm not sure I'm getting right: 'species distribution and migration'. It's one of the subjects we said we'd cover in the openNotebook OA searches. I think I'm mixing up the term?

I will shortly move the doc to Wordpress, but not before I gather pics and I'll make a note here when it moves.

I can see that there is a need to explain a lot more about openNotebook, how it works, what you get out of it: stats, files, data? Good we're starting a new blog for it, there will be plenty to do :-)


mrchristian commented 5 years ago

OK blogpost published, fiddle, fiddle, fiddle https://genr.eu/wp/open-climate-knowledge-100-oa-for-climate-change/ any mistakes please drop me a line - phew

petermr commented 5 years ago

Many thanks - just picked it up. Very good. You've managed to get a lot from the eLife pages.

More later.

On Fri, Sep 13, 2019 at 12:57 PM Simon Worthington notifications@github.com wrote:

OK blogpost published, fiddle, fiddle, fiddle https://genr.eu/wp/open-climate-knowledge-100-oa-for-climate-change/ any mistakes please drop me a line - phew


petermr / climate

write GenR blog post for Simon Worthington #14
