Open EmanuelFaria opened 4 years ago
ami-search
tested on commandline and will be deployed here on oil1000
finalised chemistry
on E1.0 with @ambarishk. May need false positives removed.
[No action on oil1000]
Welcomed @egonw and @larsgw
Sir, volunteering at the event will be a happy moment for me. You may ask me to do anything at any time, as you find convenient.
I just got approval to license terminology data from the U.S. National Library of Medicine (NLM).
The UMLS integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services, including electronic health records.
The VSAC is a repository and authoring tool for public value sets created by external programs. Value sets are lists of codes and corresponding terms, from NLM-hosted standard clinical vocabularies (such as SNOMED CT, RxNorm, LOINC and others), that define clinical concepts to support effective and interoperable health information exchange.
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary.
U.S. Edition of SNOMED CT is one of a suite of designated standards for use in U.S. Federal Government systems for the electronic exchange of clinical health information and is also a required standard in interoperability specifications of the U.S. Healthcare Information Technology Standards Panel. The clinical terminology is owned and maintained by SNOMED International, a not-for-profit association.
The NIH Common Data Elements (CDE) Repository has been designed to provide access to structured human and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and for other purposes. Your UTS license allows you to:
ami-search
So in oldstyle (per-project) mode I think they work reasonably well, but the output is ugly. Needs a display (probably HTML).
Sent Peter some DRAFT text to review for VC's info for journal article, as well as possible press release.
Added some more possible schema and scraping tool resources to that issue.
Completed a draft of the Activity Table and sent it to Peter for preliminary discussion.
Created a tutorial for XML Summer School based on CEVOpen. Added it at https://github.com/petermr/CEVOpen/blob/master/docs/2019_raw_petermr.potx (needs downloading).
Gives an account of the technical steps in running download and search.
Sir,
I have test-run (ami3) ami-search over CProject - oil186.
The narrative of the XML Summer School slides is really very good and points out the need, importance and potential of TDM in the current research scenario.
How do I get word frequencies like those shown on slide number 10?
On Wed, Sep 11, 2019 at 8:23 AM Ambarish Kumar notifications@github.com wrote:

> I have test-run (ami3) ami-search over CProject - oil186.

Excellent. Can you also run `ami-section -p oil186 --sections ALL`? This will extract the main sections from the papers. Then we will need a search engine for sections; I will write it.

> The narrative of the XML Summer School slides is really very good and points out the need, importance and potential of TDM in the current research scenario.
> How do I get word frequencies like those shown on slide number 10?

I think it's slide 15 in my deck; you may have an early version. I think the word frequencies come out automatically for each CTree, in search/words, but not for the aggregated summaries underneath the CProject. This needs debugging and you will be able to help do that.

P.
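The missing aggregation step described above (per-CTree word counts exist, CProject-level totals do not) could be sketched roughly as follows. This is not how ami3 does it: the function names are hypothetical, the texts are invented, and only the two CTree identifiers are taken from this thread.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased alphabetic words in one CTree's text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def aggregate(per_tree):
    """Sum per-CTree counters into one CProject-level counter."""
    total = Counter()
    for counts in per_tree.values():
        total.update(counts)
    return total


if __name__ == "__main__":
    # Invented text; the keys are CTree names that appear in this thread.
    per_tree = {
        "PMC5080681": word_counts("Oil of clove oil"),
        "PMC5132230": word_counts("clove extract"),
    }
    for word, n in aggregate(per_tree).most_common(3):
        print(word, n)
```

Keeping the per-CTree counters separate and summing them afterwards means the per-paper view and the project-wide view come from the same numbers.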
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/petermr/CEVOpen/issues/6?email_source=notifications&email_token=AAFTCS3TLNDA3GGJFIUJJE3QJCMFTA5CNFSM4IRS2RB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6NRFZY#issuecomment-530256615, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFTCS4N27LHZZ5QOAVJG6LQJCMFTANCNFSM4IRS2RBQ .
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
- build ami3
$ mvn install -Dmaven.test.skip=true
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:10 min
[INFO] Finished at: 2019-09-12T11:34:29+05:30
[INFO] ------------------------------------------------------------------------
- Locate the `target` sub-directory within `ami3`.
[cbl@localhost ami3]$ ls
AMI-STEM.md CONTRIBUTING.md HELP.md LICENSE PROBLEMS.md src BUILDING.md EXAMPLES.md INSTALL.md pom.xml README.md target
- Set the PATH environment variable to access the ami tools.
[cbl@localhost bin]$ pwd
/home/cbl/CEVOpen/ami3/target/appassembler/bin
[cbl@localhost bin]$ export PATH=$PATH:/home/cbl/CEVOpen/ami3/target/appassembler/bin
- Running `ami-section` over `CProject - oil186`
[cbl@localhost CEVOpen]$ ami-section -p oil186 --sections ALL
-v to see generic values
oldstyle true
sectionList [ABBREVIATION, ABSTRACT, ACK_FUND, APPENDIX, ARTICLE_META, ARTICLE_TITLE, CONTRIB, AUTH_CONT, BACK, BODY, CASE, CONCL, COMP_INT, DISCUSS, FINANCIAL, FIG, FRONT, INTRO, JOURNAL_META, JOURNAL_TITLE, PUBLISHER_NAME, KEYWORD, METHODS, OTHER, PMCID, REF, RESULTS, SUPPL, TABLE, SUBTITLE, TITLE]
write true
AMISectionTool cTree: PMC5080681
AMISectionTool cTree: PMC5132230
- Running `ami-section` over `CProject - oil1000`.
[cbl@localhost CEVOpen]$ ami-section -p oil1000 --sections ALL
-v to see generic values
oldstyle true
sectionList [ABBREVIATION, ABSTRACT, ACK_FUND, APPENDIX, ARTICLE_META, ARTICLE_TITLE, CONTRIB, AUTH_CONT, BACK, BODY, CASE, CONCL, COMP_INT, DISCUSS, FINANCIAL, FIG, FRONT, INTRO, JOURNAL_META, JOURNAL_TITLE, PUBLISHER_NAME, KEYWORD, METHODS, OTHER, PMCID, REF, RESULTS, SUPPL, TABLE, SUBTITLE, TITLE]
write true
AMISectionTool cTree: PMC5080681
AMISectionTool cTree: PMC5132230
All runs were on a Linux (CentOS) platform.
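The tools are only found once `target/appassembler/bin` is on PATH, so a quick check before a workshop may save time. The sketch below is in Python rather than shell and uses only the standard-library `shutil.which`. `tool_location` is a hypothetical helper, not part of ami3, and the directory is simply the one from the transcript above.

```python
import os
import shutil


def tool_location(tool, extra_dir=None, path=None):
    """Return the full path of `tool`, or None if it cannot be found.

    `extra_dir` is searched in addition to `path` (default: current PATH).
    """
    search_path = path if path is not None else os.environ.get("PATH", "")
    if extra_dir:
        search_path = (
            os.pathsep.join([search_path, extra_dir]) if search_path else extra_dir
        )
    return shutil.which(tool, path=search_path)


if __name__ == "__main__":
    # Directory copied from the transcript; adjust to your own checkout.
    bin_dir = "/home/cbl/CEVOpen/ami3/target/appassembler/bin"
    for name in ("ami-search", "ami-section"):
        print(name, "->", tool_location(name, extra_dir=bin_dir) or "not found")
```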
Is ami3 running satisfactorily? If so, can you give instructions on how to download the jar and run it? We have a workshop next Wednesday and we want delegates to be able to run it. Thank you.
Yes sir. ami3 is running satisfactorily. Sure sir. That would be my pleasure.
I will copy you into a colleague who is dockerising it.
OK sir.
@petermr, while I have never been a huge fan of Maven, having Java software on Maven Central (or a repository like that) is very useful: it archives the software, ensures it compiles, and makes the dependencies clear. Is that something for AMI?
Yes, I need to set a version. Until about 2 years ago there were 8 different libraries in the stack. They were modular and separable, but versioning was a nightmare. Now that I have pulled them all together I think I should start versioning them in Maven Central. But as you know it takes time...
Putting on my project manager's hat ⛑, I just set up a Kanban-style project card structure here. Please bookmark it and set it as your main page.
Also, please use the template below for each new Issue you open. Aim to complete it such that any reader can be clear about the issue's purpose and importance, and perhaps find ways we can assist you in it. Thanks!
We are building AMI so that: Type of User: __ can: ____ without:
Goals: Describe the Challenge, the solution we will bring, and the Desired End State by which all will know we have achieved excellence.
Desired Results: A clear and concise description / outline of the final "state or vision" of the project — the evidence we will see when our goals are achieved.
Guiding principles: What principles will guide our decisions as we do our part to fulfill the mission?
Massive Action Steps: What massive actions will generate the Desired Results?
Responsibilities and Roles: Who will have what completed when?
Interim Deliverable #1:
Interim Deliverable #2
Tips, Tools, Shortcuts and Resources: Anything done or used to make the desired outcome more likely to occur.
Rules and Responsibilities for Achieving Excellence
Always:
Never:
No different from hangouts :-) A camera for me, a screenshare, a camera on the delegates. Maybe a chairperson.
On Fri, Sep 13, 2019 at 6:55 AM Ambarish Kumar notifications@github.com wrote:
Sir, how will you conduct the workshop with a broken leg?
Make a dictionary of the biological activities reported in EssoilDB1.0
run time log (truncated)
..!0 [main] DEBUG org.contentmine.ami.lookups.WikipediaLookup - URL java.io.IOException: Server returned HTTP response code: 400 for URL: https://www.wikidata.org/w/index.php?search=&search=ace-inhibitor acaricide aldose-reductase-inhibitor - antiacetylcholinesterase – antifeedant antioxidant insectifuge irritant perfumery pesticide&title=Special:Search&go=Go
!486 [main] DEBUG org.contentmine.ami.lookups.WikipediaLookup - URL java.io.IOException: Server returned HTTP response code: 400 for URL: https://www.wikidata.org/w/index.php?search=&search=ace-inhibitor acaricide aldose-reductase-inhibitor - antiacetylcholinesterase � antifeedant antioxidant insectifuge irritant perfumery pesticide&title=Special:Search&go=Go
!!...!14677
While running the dictionary-making script, many search terms generated `HTTP response code: 400` errors for the lookup URL.
We should only look up single-word terms. I will edit the dictionary and we'll rerun.
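The log shows a whole run of terms, spaces included, pasted unencoded into one search URL. A minimal sketch of the proposed fix follows, assuming the terms arrive as a plain list of strings. The helper names are hypothetical and this is not the `WikipediaLookup` code; the base URL and query parameters are copied from the log. It keeps single-word terms only and percent-encodes each query.

```python
from urllib.parse import urlencode

# Base URL as it appears in the run-time log above.
WIKIDATA_SEARCH = "https://www.wikidata.org/w/index.php"


def single_word_terms(terms):
    """Keep only non-empty terms that contain no internal whitespace."""
    cleaned = (t.strip() for t in terms)
    return [t for t in cleaned if t and len(t.split()) == 1]


def search_url(term):
    """Build a percent-encoded search URL for one term."""
    query = urlencode({"search": term, "title": "Special:Search", "go": "Go"})
    return f"{WIKIDATA_SEARCH}?{query}"


if __name__ == "__main__":
    raw = ["antioxidant", "ace inhibitor", " irritant ", ""]
    for term in single_word_terms(raw):
        print(search_url(term))
```

Encoding the query is worth doing even for single-word terms, since hyphens pass through unchanged while spaces and non-ASCII dashes do not.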
OK sir.
Sir, please check the new normalised activity table.
I normalised it after making one activity per row.
Total unique activities: 205.
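For readers unfamiliar with the "one activity per row" step, here is a rough sketch of the idea. The column name `activity` and the `;` separator are assumptions made for illustration; the real sheet layout is not shown in this thread.

```python
def one_activity_per_row(rows, column="activity", sep=";"):
    """Split multi-valued cells so that each output row holds one activity."""
    out = []
    for row in rows:
        for value in row[column].split(sep):
            value = value.strip().lower()
            if value:
                out.append({**row, column: value})
    return out


def unique_activities(rows, column="activity"):
    """Return the sorted set of distinct activity names."""
    return sorted({row[column] for row in rows})


if __name__ == "__main__":
    # Invented rows; "A1" and "A2" are placeholder article labels.
    rows = [
        {"article": "A1", "activity": "Antibacterial; antifungal"},
        {"article": "A2", "activity": "antifungal"},
    ]
    flat = one_activity_per_row(rows)
    print(len(flat), "rows,", len(unique_activities(flat)), "unique activities")
```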
Updated the sheet Activitytestforspecies.tsv for the first 50 articles of oil186.
Dictionary making - TargetOrganism.xml
When browsing the content of these files, I ran into this line, which seems to me to contain a typo: https://github.com/petermr/CEVOpen/blob/master/dictionary/TargetOrganism.xml#L22
Yes sir. It is a typo: the misspelled term is `eschrechia coli`.
Added "Manny's Activity Table RAW for Ambarish 2019-10-02.tsv" to CEVOpen/dictionary/activity/raw/. Ready for @petermr to review and deliberate before @ambarishK begins cross-referencing and normalization.
Created new Issue #42: 📚DICTIONARIES to consider creating/adding
Finished organizing images of all Activity tables found in Oil186 into sub-categories by table type, activity, and measurements found in the individual tables. See issue 45 for details.
Completed two tasks:
Used that list of article IDs to create a spreadsheet with the headings below and parsed the data so @petermr can see some of the "creative" ways the article authors displayed their data, and then find a way to normalize and extract that data:
Table type | Table Image | paragraphs just before the table (with title, if any) | Table_Caption | Keywords_Phrases | Table_Footnote_KEY_Abbreviations | Measurements | Measurement Unit | Method | Plant Material | Targets | Non-Plant Control Substances, Solvents, Media, Substrate | Notes | Table Type | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
E.g. the caption "Antibacterial activity of Achillea millefolium L. EO against bacterial pathogens." becomes "[ACTIVITY(S)] activity of [PLANT(S)] [EXTRACT(S)] against [TARGET(S)]."
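One way to pull those slots out of a caption is a regular expression with named groups. This is only a sketch of the idea for captions that follow exactly this wording. The pattern, the accepted extract words (`EO`, `EOs`, `essential oil(s)`) and the function name are assumptions, and real captions will need more variants.

```python
import re

# Slots mirror the template: [ACTIVITY] activity of [PLANT] [EXTRACT] against [TARGET].
CAPTION = re.compile(
    r"^(?P<activity>.+?) activity of (?P<plant>.+?) "
    r"(?P<extract>EOs?|essential oils?) against (?P<target>.+?)\.?$",
    re.IGNORECASE,
)


def parse_caption(caption):
    """Return the template slots as a dict, or None if the caption does not fit."""
    match = CAPTION.match(caption.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = (
        "Antibacterial activity of Achillea millefolium L. EO "
        "against bacterial pathogens."
    )
    print(parse_caption(example))
```

Captions that do not match return `None`, which makes it easy to collect the "creative" ones for manual review.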
I just committed the finished (I hope!) dictionary: PlantMaterialHistory.xml
I have just finished uploading the cleaned, disambiguated and Wikidata-attributed activities dictionary, and updated its description, as well as the master INDEX of descriptions.
Description: A dictionary of the names of 438 essential oil or constituent compound biochemical and/or biological activities, 340 of which were resolved to Wikidata IDs, and 336 with short descriptions.
Filename: activity.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/activity/activity.xml
Hallelujah.
Some updates:
Here's what this dictionary now contains:
A dictionary of 73 terms for Essential Oil extraction methods.
Filename: ExtractionMethod.xml
File Location:
https://github.com/petermr/CEVOpen/blob/master/dictionary/ExtractionMethod/ExtractionMethod.xml
DAVEid: DAVE.activity.n where n is a serialized number
term: The name is a human readable string describing the concept.
Acronym: The acronym for the term, if any.
Apparatus: Apparatus used to conduct the extraction method described by the term.
wikidataid: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.
description: short description of the activity sourced from wikidata and/or wikipedia
No. of Entries (The header is not counted): 73
No. of terms describing EO Extraction Methods resolved in Wikidata: 71
No. of unique Wikidata IDs (including synonyms): 37
No. of entries with no Wikidata IDs: 2
No. of source articles with no Analysis Type found: 64
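Counts like the ones above (entries, entries resolved in Wikidata, unique Wikidata IDs, entries with no ID) can be recomputed from the dictionary file rather than tallied by hand. The sketch below assumes `<entry>` elements carrying a `wikidataID` attribute. That layout, the attribute name and the sample IDs are placeholders for illustration and should be checked against the actual XML.

```python
import xml.etree.ElementTree as ET


def dictionary_stats(xml_text, id_attr="wikidataID"):
    """Count entries, resolved entries, unique IDs and unresolved entries."""
    root = ET.fromstring(xml_text)
    entries = root.findall(".//entry")
    ids = [(e.get(id_attr) or "").strip() for e in entries]
    resolved = [i for i in ids if i]
    return {
        "entries": len(entries),
        "resolved": len(resolved),
        "unique_ids": len(set(resolved)),
        "unresolved": len(entries) - len(resolved),
    }


if __name__ == "__main__":
    # Tiny invented sample; the IDs are placeholders, not real Wikidata items.
    sample = """
    <dictionary title="demo">
      <entry term="steam distillation" wikidataID="ID-A"/>
      <entry term="steam-distillation" wikidataID="ID-A"/>
      <entry term="cold pressing" wikidataID="ID-B"/>
      <entry term="unknown method"/>
    </dictionary>
    """
    print(dictionary_stats(sample))
```

Synonyms that share one ID explain why the number of unique IDs can be much lower than the number of resolved terms.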
Now that we have a stand-alone dictionary for EO Extraction Methods, I have deleted the ones that were in PlantMaterialHistory.xml, renumbered the DAVEids, and updated the PlantMaterialHistoryDictionaryDescription.md as well as the master Index
Plant Parts Dictionary is now complete and online
Next I'll update the results data for its dictionary description .md file.
Thanks
I have just completed (oh god, I hope!) all of the dictionaries!
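As of today, we have 11 finished dictionaries. They are:
- eoActivity
- eoAnalysisMethod
- eoCompound
- eoExtractionMethod
- eoPlant
- eoPlantMaterialHistory
- eoPlantPart
- eoTargetOrganism
- geoLocation
- humanDiseases
- pests
... as well as a master INDEX of their descriptions, pasted below:
Index Oil186 Dictionaries
This index (https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/DictionaryDescriptionsOIL186/INDEXofOIL186Dictionaries.md) contains information about the Manually Created Dictionaries for OIL186.
PLEASE NOTE: Rather than in alphabetical order, the dictionaries are listed here in logical progression.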
The purpose/function of Dictionaries:
Identify “things” as objects or concepts (e.g. “e.coli” is a concept).
Give each object clear lexical names by which it can be searched.
(When an object goes by more than one name, those names are synonyms.)
Give each object a link to wikidata (or other authorities) by which we can learn more about them.
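The three purposes above (one concept, several searchable names, one link to an authority) map naturally onto a small record type plus a name index. The sketch below is illustrative only: the class, its fields and the placeholder ID are assumptions and do not reflect the ami dictionary format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    term: str                                           # preferred name of the concept
    synonyms: List[str] = field(default_factory=list)   # other names for the same concept
    wikidata_id: Optional[str] = None                    # link to an authority, if resolved


def build_index(entries: List[Entry]) -> Dict[str, Entry]:
    """Map every name (term or synonym), case-folded, to its entry."""
    index: Dict[str, Entry] = {}
    for entry in entries:
        for name in [entry.term, *entry.synonyms]:
            index[name.casefold()] = entry
    return index


def lookup(index: Dict[str, Entry], name: str) -> Optional[Entry]:
    """Find the entry for a name, ignoring case and surrounding spaces."""
    return index.get(name.strip().casefold())


if __name__ == "__main__":
    # "ID-PLACEHOLDER" stands in for a real Wikidata ID.
    ecoli = Entry("Escherichia coli", ["E. coli", "e.coli"], "ID-PLACEHOLDER")
    index = build_index([ecoli])
    print(lookup(index, "E. COLI"))
```

Because every synonym points at the same entry, a search for any of the names lands on the same concept and the same authority link.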
Description: A dictionary of 1678 plant names mentioned in the 186 test articles downloaded from PubMed. Of the 1678 entries, 1567 had their names normalized and tagged with corresponding Wikidata IDs.
Filename: eoPlant.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlant/eoPlant.xml
EO Plant Part
Description: A dictionary of 285 plant part terms.
Filename: eoPlantPart.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlantPart/eoPlantPart.xml
Geo Location
Description: A dictionary of 9568 entries for geolocations including country, countryISOcode, city, latitude, longitude, postal code and time zone sourced from http://www.ip2location.com, along with data augmenting Indian States-Cities created and maintained over the years obtained at https://network.convergenceservices.in/forum/12-joomla-development/4305-mysql-tables-for-country-states-and-indian-states-cities.html.
License information: This site or product includes IP2Location LITE data available from http://www.ip2location.com
Filename: geoLocation.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/geoLocation/geoLocation.xml
EO Plant Material History
Description: A dictionary of 81 entries relating to the plant material history leading up to the extraction of Essential Oils mentioned in selected literature chosen from the 186 test articles downloaded from PubMed. The entries include key words and phrases describing: growth conditions, plant life stages, plant material selection, post-harvest treatment methods, and extracted plant material products. Of the 82 entries, 58 were resolved to WikidataIDs.
Filename: eoPlantMaterialHistory.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlantMaterialHistory/eoPlantMaterialHistory.xml
EO Extraction Method
Description: A dictionary of 87 terms for Essential Oil extraction methods and apparatus.
Filename: eoExtractionMethod.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoExtractionMethod/eoExtractionMethod.xml
EO Analysis Method
Analytical chemistry studies and uses instruments and methods used to separate, identify, and quantify matter. In practice, separation, identification or quantification may constitute the entire analysis or be combined with another method. Separation isolates analytes. Qualitative analysis identifies analytes, while quantitative analysis determines the numerical amount or concentration.
Analytical chemistry consists of classical, wet chemical methods and modern, instrumental methods. Classical qualitative methods use separations such as precipitation, extraction, and distillation. Identification may be based on differences in color, odor, melting point, boiling point, radioactivity or reactivity. Classical quantitative analysis uses mass or volume changes to quantify amount. Instrumental methods may be used to separate samples using chromatography, electrophoresis or field flow fractionation. Then qualitative and quantitative analysis can be performed, often with the same instrument and may use light interaction, heat interaction, electric fields or magnetic fields. Often the same instrument can separate, identify and quantify an analyte.
(Source: https://en.wikipedia.org/wiki/Analytical_chemistry)
Description: A dictionary of 117 entries describing instruments and methods used to separate, identify, and quantify matter — 105 being resolved to wikidata IDs, and 95 with short descriptions.
Filename: eoAnalysisMethod.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoAnalysisMethod/eoAnalysisMethod.xml
EO Compound
Essential Oils (EOs) are concentrated hydrophobic liquids containing volatile chemical compounds extracted from plants. Essential oils are also known as volatile oils, ethereal oils, aetherolea, or simply as the oil of the plant from which they were extracted, such as oil of clove.
Qualitative (constituent compounds) and quantitative (%) analysis of the chemical composition of the tested Essential Oils (Extracts?), with each known compound linked to its IUPAC International Chemical Identifier (InChI).
Description: A dictionary of 2114 constituent chemical compounds extracted from Essential Oils converted from EssoilDB1.0 data. Of the 2114 entries, 1010 had their names normalized and tagged with corresponding Wikidata IDs; the other 1104 remain to be resolved as no Wikidata IDs currently exist for them.
Filename: eoCompound.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoCompound/eoCompound.xml
EO Activity
Description: A dictionary of 438 essential oil or constituent compound biochemical and/or biological activities, 340 of which were resolved to wikidata IDs, and 336 with descriptions of 250 characters or less.
Filename: eoActivity.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoActivity/eoActivity.xml
EO Target Organism
The organisms used as targets of experiments conducted to determine what effect(s) (Activities) tested EOs may have on them. They may occur as A) single-cells or colonies, such as bacteria, fungi, yeasts and molds, protozoa, algae, or viruses; B) insects such as mosquitos, flies, etc.; or, C) they may be helminths, such as Nematodes (roundworms), Cestodes (tapeworms), and Trematodes (flukes).
Description: A dictionary of terms describing 307 target organisms resolved to wikidataIDs (including genus and species of bacteria, fungi, protists, protozoa, and other microorganisms), with 154 terms including names of related diseases.
Filename: eoTargetOrganism.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoTargetOrganism/eoTargetOrganism.xml
Human Diseases
Description: A dictionary of 3412 terms related to human diseases.
Filename: humanDiseases.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/humanDiseases/humanDiseases.xml
Pests
Description: A dictionary of 1032 terms for two categories of insects: A) Insect vectors of human pathogens sourced from https://en.wikipedia.org/wiki/Category:Insect_vectors_of_human_pathogens, and B) Winged insects sourced from https://www.insectidentification.org/winged-insect-key.asp
Filename: pests.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/pests/pests.xml
Great stuff! Now I have to figure out how to create extra tags for them!
-- Clyde
A daily record of activities by each contributor