nasa-petal / PeTaL-labeller

The PeTaL labeler labels journal articles with biomimicry functions.
https://petal-labeller.readthedocs.io/en/latest/
The Unlicense

Use open-source Snorkel to create labelling functions to expand our training dataset. #65

Open bruffridge opened 3 years ago

bruffridge commented 3 years ago

https://www.snorkel.org/get-started/

https://github.com/snorkel-team/snorkel

Many of our labelling functions will use Microsoft Academic Graph (MAG) topics. For this we will use the 'evaluate' method of the free MAG API. I will provide an API key for this.

'Evaluate' method 'try it out' page: https://msr-apis.portal.azure-api.net/docs/services/academic-search-api/operations/565d753be597ed16ac3ffc03?
API limits: 10,000 transactions per month; 3 per second for interpret, 1 per second for evaluate, 6 per minute for calcHistogram.
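
To stay inside those limits, calls to 'evaluate' could go through a small throttle. The sketch below is only an illustration: the `Throttle` class and its parameter names are hypothetical, it counts a budget per run rather than per calendar month, and the clock and sleep functions are injectable so it can be tried without actually waiting.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart and cap the total count."""

    def __init__(self, min_interval, max_calls,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.max_calls = max_calls
        self.calls = 0
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until the next call is allowed, or raise if the budget is spent."""
        if self.calls >= self.max_calls:
            raise RuntimeError("transaction budget exhausted")
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        self.calls += 1


# Assumed settings for 'evaluate': 1 call per second, 10,000 calls in total.
evaluate_throttle = Throttle(min_interval=1.0, max_calls=10_000)
```

Calling `evaluate_throttle.wait()` right before each request would be enough; the monthly reset is left to whoever runs the job.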

API Documentation:
https://docs.microsoft.com/en-us/academic-services/project-academic-knowledge/reference-query-expression-syntax
https://docs.microsoft.com/en-us/academic-services/project-academic-knowledge/reference-evaluate-method

List of Microsoft Academic topics https://academic.microsoft.com/topics/100858432?fullPath=false

Example API request to get the ids, DOIs, titles, abstracts, topics, authors, venues, and references of papers that are (labelled 'biology' OR published in the Biomimetics journal) AND (labelled 'wind stress' OR 'wind engineering'). This would be used for a labelling function for 'protect from wind'.

https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?expr=And(Or(Composite(F.FN=='wind stress'),Composite(F.FN=='wind engineering')),Or(Composite(J.JN=='biomimetics'), Composite(F.FN=='biology')))&model=latest&count=10&offset=0&attributes=Id,DOI,Ti,VFN,F.FN,AA.AuId,AW,RId
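
A request like the one above could be assembled from a list of topic names rather than written by hand. This is a rough sketch, not tested against the live API: the helper names (`any_of`, `build_expr`, `build_url`) are made up for illustration, the attribute list is copied from the example request, and sending the request (including how the API key is passed) is deliberately left out.

```python
from urllib.parse import urlencode

EVALUATE_URL = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"

# Scope shared by the example request: Biomimetics journal OR 'biology' topic.
SCOPE = "Or(Composite(J.JN=='biomimetics'),Composite(F.FN=='biology'))"


def any_of(attribute, values):
    """Return an expression matching any of `values` for one composite attribute."""
    if not values:
        raise ValueError("at least one value is required")
    terms = [f"Composite({attribute}=='{value}')" for value in values]
    return terms[0] if len(terms) == 1 else "Or(" + ",".join(terms) + ")"


def build_expr(topics):
    """Combine the topic terms with the shared biology/biomimetics scope."""
    return f"And({any_of('F.FN', topics)},{SCOPE})"


def build_url(topics, count=10, offset=0):
    """Return the full 'evaluate' URL with a percent-encoded query string."""
    params = {
        "expr": build_expr(topics),
        "model": "latest",
        "count": count,
        "offset": offset,
        "attributes": "Id,DOI,Ti,VFN,F.FN,AA.AuId,AW,RId",
    }
    return EVALUATE_URL + "?" + urlencode(params)


# 'Protect from wind' query from the example above.
wind_url = build_url(["wind stress", "wind engineering"])
```

Paging through more results would just mean increasing `offset` by `count` on each call.
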
bruffridge commented 3 years ago

Labelling functions for labels with < 10 examples

Sense temperature cues: ('biology' OR journal:'biomimetics') AND ('temperature measurement' OR 'temperature monitoring' OR 'temperature sensing')
Send light signals in the non-visible spectrum: ('biology' OR journal:'biomimetics') AND 'ultraviolet'
Capture energy: ('biology' OR journal:'biomimetics') AND 'energy harvesting'
Protect from wind: ('biology' OR journal:'biomimetics') AND ('wind stress' OR 'wind engineering')
Detox/purify: ('biology' OR journal:'biomimetics') AND 'detoxification'
Break down structure: ('biology' OR journal:'biomimetics') AND ('breakup' OR 'breakage')
Modify/convert electrical energy: ('biology' OR journal:'biomimetics') AND ('electrochemical energy conversion' OR 'thermoelectric energy conversion')
Absorb and/or filter gases: ('biology' OR journal:'biomimetics') AND ('air filter' OR 'air filtration')
Camouflage/mimicry: ('biology' OR journal:'biomimetics') AND ('camouflage' OR 'mimicry')
Differentiate signal from noise: ('biology' OR journal:'biomimetics') AND ('signal differentiation' OR 'noise (signal processing)')
Prevent fatigue: ('biology' OR journal:'biomimetics') AND ('fatigue resistance' OR 'frictional resistance')
Protect from radiation: ('biology' OR journal:'biomimetics') AND ('radiation protection' OR 'radiation resistance' OR 'radiation tolerance' OR 'radiation resistant' OR 'radiation-protective agents' OR 'radiation inactivation')
Absorb and/or filter liquids: ('biology' OR journal:'biomimetics') AND ('absorbance' OR 'filtration') AND ('water content' OR 'water treatment' OR 'water quality' OR 'water pollution' OR 'water supply' OR 'seawater' OR 'wastewater' OR 'groundwater' OR 'tap water' OR 'raw water' OR 'soil water' OR 'river water' OR 'fresh water' OR 'surface water' OR 'brackish water' OR 'distilled water' OR 'portable water purification' OR 'coagulation (water treatment)' OR 'sedimentation (water treatment)' OR 'water column' OR 'liquid medium')
Protect from gases: ('biology' OR journal:'biomimetics') AND ()
Send electrical/magnetic signals: 'biology' AND ()
Sense motion: 'biology' AND ()
Chemically break down inorganic compounds: 'biology' AND ()
Expel gases: 'biology' AND ()
Send sound signals: 'biology' AND ()
Sense spatial awareness/balance/orientation: 'biology' AND ()
Manage drag/turbulence: 'biology' AND ()
Protect from fire: 'biology' AND ()
Store gases: 'biology' AND ()
Absorb and/or filter solids: 'biology' AND ()
Modify/convert magnetic energy: 'biology' AND ()
Send vibratory signals: 'biology' AND ()
Chemically assemble inorganic compounds: 'biology' AND ()
Compete within or between species: 'biology' AND ()
Cooperate within or between species: 'biology' AND ()
Manage environmental disturbance in a community: 'biology' AND ()
Self-replicate: 'biology' AND ()
Send chemical signals: 'biology' AND ()
Send tactile signals: 'biology' AND ()
Sense chemicals: 'biology' AND ()
Sense disease: 'biology' AND ()
Store solids: 'biology' AND ()
Not biomimicry: ?
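
One possible way to turn these rules into labelling functions, assuming each paper's MAG topics and journal name have already been fetched: every rule is a scope check plus one or more OR-groups that must all match. The names `make_rule`, `in_scope`, and the integer label ids are hypothetical. `ABSTAIN = -1` follows Snorkel's convention for "no vote", and each rule could presumably be wrapped with Snorkel's `@labeling_function()` decorator, which is not shown here so the sketch runs without Snorkel installed.

```python
ABSTAIN = -1  # Snorkel's conventional "no vote" value


def in_scope(topics, journal):
    """('biology' OR journal:'biomimetics') from the rules above."""
    return "biology" in topics or journal == "biomimetics"


def make_rule(label, *groups):
    """Build a rule that votes `label` when the paper is in scope and
    every group shares at least one topic with the paper."""
    if not groups or not all(groups):
        # Rules still written as "AND ()" have no topics yet, so refuse them
        # rather than let them fire on every biology paper.
        raise ValueError("each rule needs at least one non-empty topic group")
    groups = [{topic.lower() for topic in group} for group in groups]

    def rule(topics, journal=None):
        topics = {topic.lower() for topic in topics}
        journal = journal.lower() if journal else None
        if not in_scope(topics, journal):
            return ABSTAIN
        if all(topics & group for group in groups):
            return label
        return ABSTAIN

    return rule


# Hypothetical integer ids for two of the labels listed above.
PROTECT_FROM_WIND = 0
CAPTURE_ENERGY = 1

protect_from_wind = make_rule(PROTECT_FROM_WIND, {"wind stress", "wind engineering"})
capture_energy = make_rule(CAPTURE_ENERGY, {"energy harvesting"})
```

A rule with several AND-ed groups, such as 'Absorb and/or filter liquids', would pass one set per group, for example `make_rule(label, {"absorbance", "filtration"}, {"seawater", "wastewater"})` with the full topic lists filled in.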

bruffridge commented 3 years ago

Another idea for a labelling function is to take our ground truth labelled papers, and use the MAG API to find papers related to each one. Each related paper gets assigned the same label as the ground truth paper it's related to.

Here's how to get related papers from MAG (kudos to @dsmith111 for figuring this out!):

Pass the paper's Id into this link: https://academic.microsoft.com/api/entity/***?entityType=2 (where *** is the Id). The response lists all of the related paper Ids in a nice JSON format.

Example link: https://academic.microsoft.com/api/entity/3012421327?entityType=2
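
The propagation step could look roughly like this once the related Ids are in hand. The sketch makes a few assumptions: the related Ids have already been extracted from the JSON into a plain dict (the response layout is not documented here, so parsing is not shown), papers that already have ground-truth labels are left alone, and the function names are hypothetical.

```python
from collections import defaultdict

ENTITY_URL = "https://academic.microsoft.com/api/entity/{paper_id}?entityType=2"


def related_url(paper_id):
    """Return the link that lists papers related to `paper_id`."""
    return ENTITY_URL.format(paper_id=paper_id)


def propagate_labels(ground_truth, related):
    """Copy each ground-truth paper's labels onto its related papers.

    ground_truth: dict of paper Id -> set of labels (hand labelled)
    related:      dict of paper Id -> iterable of related paper Ids
    Returns a dict of related paper Id -> set of inherited (weak) labels.
    """
    weak = defaultdict(set)
    for paper_id, labels in ground_truth.items():
        for related_id in related.get(paper_id, ()):
            if related_id in ground_truth:
                continue  # assumption: never overwrite a hand-labelled paper
            weak[related_id] |= set(labels)
    return dict(weak)
```

A paper related to several ground-truth papers ends up with the union of their labels, which seems reasonable for a multi-label problem but is a design choice rather than something settled in this thread.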