petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

annotate articles with Hypothes.is #36

Open petermr opened 4 years ago

petermr commented 4 years ago

Hypothes.is allows both manual and machine annotation of articles. ContentMine used this 3 years ago with help from the H.is team.

Current tasks:

Identify terms in the oil186/ corpus corresponding to the following tags:

location

This is the source of the material. We have a dictionary of countries which can be used.

preparation

How the plant material was harvested and processed. Includes "dried", "macerated", etc.

extraction

method of extraction (@mannyrules is creating a small dictionary).

instrument

equipment (GC-MS, etc.)

activity

activity actually tested in article

organisms

target organisms (maybe dictionary)
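A minimal sketch of how dictionary terms might be matched against article text to produce the tags listed above. The mini-dictionaries, the `tag_terms` helper and the sample sentence are illustrative assumptions; they are not the CEVOpen dictionaries or the ContentMine software.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries keyed by tag (assumed for illustration only).
DICTIONARIES = {
    "location": ["Brazil", "India", "Morocco"],
    "preparation": ["dried", "macerated"],
    "extraction": ["hydrodistillation", "steam distillation"],
    "instrument": ["GC-MS", "GC MS"],
}


def tag_terms(text, dictionaries):
    """Return {tag: Counter(term -> number of mentions)} for the text."""
    found = {}
    for tag, terms in dictionaries.items():
        counts = Counter()
        for term in terms:
            # Whole-term, case-insensitive match; re.escape keeps "-" literal.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            hits = re.findall(pattern, text, flags=re.IGNORECASE)
            if hits:
                counts[term] = len(hits)
        if counts:
            found[tag] = counts
    return found


sample = ("Leaves collected in Morocco were dried, then the oil was "
          "obtained by hydrodistillation and analysed by GC-MS.")
for tag, counts in tag_terms(sample, DICTIONARIES).items():
    print(tag, dict(counts))
```

Counting mentions per term, rather than just recording presence, keeps the information needed to rank terms by how often an article uses them.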

EmanuelFaria commented 4 years ago

I’m tied up this weekend, but will begin on Monday October 14


EmanuelFaria commented 4 years ago

I just created a Hypothes.is group for CEVOpen called "Oil186 Annotations". Please join it here: https://hypothes.is/groups/vN4d6wq9/cevopen

I'll start annotating from the bottom of the list in the Oil186 directory (PMC6015887) and work my way up.

@petermr please let me know if and how you wish me to "mark" my progress in each directory, or whether I should just update by leaving messages in this Issue (#36).

EmanuelFaria commented 4 years ago

P.S. To view the HTML files in any of the directories within a browser, just prepend the following to the URL: https://htmlpreview.github.io/?

Otherwise, you can go to https://htmlpreview.github.io/ and just paste in your URL to accomplish the same thing.
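For anyone scripting this, the prefixing described above can be sketched in a few lines. The `preview_url` helper name is an assumption made for illustration; the prefix and the example file URL are the ones used in this thread.

```python
HTMLPREVIEW_PREFIX = "https://htmlpreview.github.io/?"


def preview_url(github_url):
    """Prepend the htmlpreview prefix to a GitHub file URL."""
    if github_url.startswith(HTMLPREVIEW_PREFIX):
        return github_url  # already a preview link, leave unchanged
    return HTMLPREVIEW_PREFIX + github_url


print(preview_url(
    "https://github.com/petermr/CEVOpen/blob/master/searches/oil186/"
    "PMC6015887/scholarly.html"
))
```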

EmanuelFaria commented 4 years ago

Update: So far, I have not been able to post to the group I created, even though it is listed in my Hypothes.is account. I'll keep annotating in Public for now until I can figure it out, and then move the annotations into the group if necessary.

petermr commented 4 years ago

On Mon, Oct 14, 2019 at 8:12 PM Emanuel Faria notifications@github.com wrote:

I just created a Hypothes.is group for CEVOpen called "Oil186 Annotations". Please join it here: https://hypothes.is/groups/vN4d6wq9/cevopen

Done.

I'll start annotating from the bottom of the list in the Oil186 directory (PMC6015887) and work my way up.

OK

@petermr https://github.com/petermr please let me know if and how you wish me to "mark" my progress in each directory, or just update by leaving messages in this Issue. (#36 https://github.com/petermr/CEVOpen/issues/36).

I think just update in the issue. There will be unexpected problems, and Issues is where to discuss them.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/petermr/CEVOpen/issues/36?email_source=notifications&email_token=AAFTCSZOWEVIQVQJL3SM7G3QOTACBA5CNFSM4I74EIIKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBGDNBI#issuecomment-541865605, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCS3XQL34HMESIWK5PLLQOTACBANCNFSM4I74EIIA .

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

EmanuelFaria commented 4 years ago

I figured out the Group issue. There's a pulldown menu at the top of the Hypothes.is menu bar at the right side of the page. See screenshot on how to select it: https://www.dropbox.com/s/3w9ba38ua7k2ntv/Screenshot%202019-10-14%2017.20.55.png?dl=0

EmanuelFaria commented 4 years ago

@petermr FYI: I'm annotating using the words you noted above (Location, Preparation, etc.) and also using the same words as tags, in case it helps you quickly show and select just the tags for creating the dictionary. If that doesn't help, please let me know.
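Tagged annotations can in principle be pulled back out by tag through the Hypothes.is search API. The sketch below only builds the request URL; the `search_url` helper is a hypothetical name, the group ID is taken from the group link earlier in this thread, and it is an assumption that the tag text matches exactly what was typed in the client. Reading a private group also needs an API token sent as an `Authorization: Bearer ...` header, which is not shown here.

```python
from urllib.parse import urlencode

# Public search endpoint of the Hypothes.is API.
API_SEARCH = "https://api.hypothes.is/api/search"


def search_url(group_id, tag, limit=50):
    """Build a search URL for annotations in one group carrying one tag."""
    params = {"group": group_id, "tag": tag, "limit": limit}
    return API_SEARCH + "?" + urlencode(params)


url = search_url("vN4d6wq9", "Location")
print(url)
```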

petermr commented 4 years ago

We now have 7 dictionaries:

activity https://github.com/petermr/CEVOpen/tree/master/dictionary/activity

compound https://github.com/petermr/CEVOpen/tree/master/dictionary/compound

instrument/raw https://github.com/petermr/CEVOpen/tree/master/dictionary/instrument/raw

plant https://github.com/petermr/CEVOpen/tree/master/dictionary/plant

plantparts/raw https://github.com/petermr/CEVOpen/tree/master/dictionary/plantparts/raw

process https://github.com/petermr/CEVOpen/tree/master/dictionary/process

targetOrganism https://github.com/petermr/CEVOpen/tree/master/dictionary/targetOrganism

You could use all these if they occur. Have a look at the contents of each to get a feel.


EmanuelFaria commented 4 years ago

Thanks @petermr

It took me longer than expected to go through https://htmlpreview.github.io/?https://github.com/petermr/CEVOpen/blob/master/searches/oil186/PMC6015887/scholarly.html because it was longer and much different than the ones you and I previewed together on screen share.

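The question I have for my next articles is:

  • Do you need me to annotate/tag things only once per article, once per section in the article... or just once in total ... meaning if I tagged the name of an organism in one article, do you want it tagged again in any other it appears in?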

NEW HEADINGS?

I found some more potentially useful headings. Please let me know if you want me to continue tagging these, and if so, please add them to your original list at the top of the page.

Organism-Part Parts of the organism mentioned in the article that indicate their presence and also the site of the activity: Plasma Membrane (PM), Cell Wall (CW), Cell Membrane (CM), Cell Cytoplasm, cytoplasmic organelles, Hyphae, plasmatic membrane

Organism-Compound present in Compounds present in the pathogenic organisms that indicate their presence

Activity-Use Specific disease mentioned in article

Activity-Site of Action eg. Fruit Human skin Particular Organ

Activity Mechanism eg. Lysis

Active Compounds Particular compounds mentioned in the article as being important to the activity

Equipment-Measurement Tool eg: transmission electron microscopy (TEM)

Measurement-Concentration Of EO (CEO)

MEC Minimum Effective Concentration

MFC Minimal Fungicidal Concentration (MFC)

MIC Minimal Inhibitory Concentration (MIC)

Results

Disease-Plant Disease that affects plant: Specific disease mentioned in article/experiment

Disease-Fruit Disease that affects fruit: Specific disease mentioned in article/experiment

Plant-EO Name of the Plant and/or Essential Oil mentioned

Plant Family Sometimes the plant family is mentioned specifically

Method of EO Contact eg. Direct contact or gaseous

In Vivo

In Vitro

XML error Tagged anomalies in the appearance of the text shown in html

petermr commented 4 years ago

On Mon, Oct 14, 2019 at 11:45 PM Emanuel Faria notifications@github.com wrote:

Thanks @petermr https://github.com/petermr

It took me longer than expected to go through https://htmlpreview.github.io/?https://github.com/petermr/CEVOpen/blob/master/searches/oil186/PMC6015887/scholarly.html because it was longer and much different than the ones you and I previewed together on screen share.

Yes - I found that with the 'eczema' articles as well. Things will be a bit better when I have the sectioning in place.

The question I have for my next articles is:

  • Do you need me to annotate/tag things only once per article, once per section in the article... or just once in total ... meaning if I tagged the name of an organism in one article, do you want it tagged again in any other it appears in?

The goal, of course, is to have the machine do this. The manual tagging is to create a "gold standard".

I think we want a SMALL number of tagged articles to act as a reference - to see if the software finds all the targets. There are often 20-50 mentions of (say) species in a paper - others only have one - and we use this number to emphasize the relative importance.
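One way such a reference set could be used is sketched below: compare the terms the software finds in an article with the manually tagged terms and report precision, recall, and the misses. The `score` helper and the example term lists are hypothetical and chosen only for illustration; they are not results from the oil186 corpus.

```python
def score(gold, found):
    """Compare machine-found terms with a manually tagged gold standard."""
    gold, found = set(gold), set(found)
    true_pos = gold & found
    # Guard against empty sets so the ratios are always defined.
    precision = len(true_pos) / len(found) if found else 0.0
    recall = len(true_pos) / len(gold) if gold else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "missed": sorted(gold - found),    # tagged by hand, not found by software
        "spurious": sorted(found - gold),  # found by software, not tagged by hand
    }


# Hypothetical example terms for one article.
gold_terms = ["Candida albicans", "Escherichia coli", "Staphylococcus aureus"]
machine_terms = ["Candida albicans", "Escherichia coli", "Bacillus subtilis"]
result = score(gold_terms, machine_terms)
print(result)
```

The `missed` list is the part that matters most for the stated goal, since it shows which targets the software failed to find.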

NEW HEADINGS?

I found some more potentially useful headings. Please let me know if you want me to continue tagging these, and if so, please add them to your original list at the top of the page.

Organism-Part Parts of the organism mentioned in the article that indicate their presence and also the site of the activity: Plasma Membrane (PM), Cell Wall (CW), Cell Membrane (CM), Cell Cytoplasm, cytoplasmic organelles, Hyphae, plasmatic membrane

I don't think these are useful for EOs at present. Our approach is a subset of info:

  • what plant?
  • where from?
  • what part?
  • how treated?
  • what compounds found?
  • what activity found?

That's a lot! More than anyone else. It's aimed at:

There is also some basic science:

That's the simple big picture.

All the following are "second-order", i.e. they might occur in only a few percent of articles and are not very standardised. This is the LONG TAIL. We should not chase the long tail.

Organism-Compound present in Compounds present in the pathogenic organisms that indicate their presence

Diagnostic? No

Activity-Use Specific disease mentioned in article

You mean not in our activity dictionary? Add it.

Activity-Site of Action eg. Fruit Human skin Particular Organ

No

Activity Mechanism eg. Lysis

Probably not

Active Compounds Particular compounds mentioned in the article as being important to the activity

Yes, but only if there is justification and it's easy to extract

Equipment-Measurement Tool eg: transmission electron microscopy (TEM)

in "instrument"

Measurement-Concentration Of EO (CEO)

MEC Minimum Effective Concentration

MFC Minimal Fungicidal Concentration (MFC)

MIC Minimal Inhibitory Concentration (MIC)

Yes. But we have to extract the data as well.

Results

Disease-Plant Disease that affects plant: Specific disease mentioned in article/experiment

lower priority

Disease-Fruit Disease that affects fruit: Specific disease mentioned in article/experiment

lower priority

Plant-EO Name of the Plant and/or Essential Oil mentioned

You mean the trade or generic name? "oil of cloves" - low priority

Plant Family Sometimes the plant family is mentioned specifically

algorithmic from Wikidata

Method of EO Contact eg. Direct contact or gaseous

no

In Vivo

low priority

In Vitro

low priority

XML error Tagged anomalies in the appearance of the text shown in html

???
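On the "algorithmic from Wikidata" reply for Plant Family: a rough sketch of what that lookup might look like is a SPARQL query that walks up the parent-taxon chain from a species to the taxon ranked as a family. The `family_query` helper is a hypothetical name, the species is only an example (clove, as in "oil of cloves"), and the property and item IDs in the comments are my reading of Wikidata's taxonomy model, so treat them as assumptions to check. The query text would be sent to the Wikidata Query Service; that step is not shown.

```python
def family_query(species_name):
    """Build a SPARQL query for the family of a species, by taxon name."""
    # Escape backslashes and quotes so the name is a valid SPARQL literal.
    safe = species_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?family ?familyName WHERE {{
  ?taxon wdt:P225 "{safe}" .      # P225 = taxon name
  ?taxon wdt:P171* ?family .      # P171 = parent taxon, followed transitively
  ?family wdt:P105 wd:Q35409 ;    # P105 = taxon rank, Q35409 = family
          wdt:P225 ?familyName .
}}
""".strip()


print(family_query("Syzygium aromaticum"))
```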
