EmanuelFaria opened this issue 4 years ago
I don’t know how best to name or describe this dictionary. It shares entries with other dictionaries.
Description: A dictionary of [XX] plant processes from which the Essential Oils mentioned in the 186 test articles downloaded from PubMed were harvested.
Filename: process20191014.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/process/process20191014.xml
title: the type of data to be normalized and tagged with a Wikidata ID. In this case, “plantParts”
description: Short description of the process being identified in that row
id: ???
name: a human readable string describing the concept.
term: the precise string used to identify the concept. (Name and Term are often the same.)
wikidata: Unique identifier for each normalized dictionary term, linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.
wikipedia: ???
query: ???
No. of source papers: ??
No. of Entries (Headers are not counted): 74
No. of unique entries (including alternate spellings or synonyms): 74
No. of processes resolved in Wikidata: 22
No. of processes NOT resolved in Wikidata: 52
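The column definitions and counts above could be checked mechanically rather than by hand. The sketch below assumes the dictionary is an XML file of `<entry>` elements carrying the columns as attributes; the element and attribute names and the sample Q-numbers are placeholders, not the actual contents of process20191014.xml. For the real file, resolved plus unresolved should equal the entry total (22 + 52 = 74).

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element/attribute names and Q-numbers are placeholders,
# not verified Wikidata IDs or the real contents of process20191014.xml.
SAMPLE_XML = """
<dictionary title="process">
  <entry id="p1" term="freeze-dried" name="freeze-dried" wikidataID="Q111"/>
  <entry id="p2" term="shade-dried" name="shade-dried"/>
  <entry id="p3" term="spring" name="spring" wikidataID="Q222"/>
</dictionary>
"""

FIELDS = ("id", "term", "name", "wikidataID", "description")


def load_entries(xml_text):
    """Return one dict per <entry>; attributes that are absent become None."""
    root = ET.fromstring(xml_text.strip())
    return [{f: e.get(f) for f in FIELDS} for e in root.iter("entry")]


def resolution_counts(entries):
    """Count entries with and without a non-empty wikidataID."""
    resolved = sum(1 for e in entries if e["wikidataID"])
    return {"entries": len(entries),
            "resolved": resolved,
            "unresolved": len(entries) - resolved}


if __name__ == "__main__":
    print(resolution_counts(load_entries(SAMPLE_XML)))
```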
I think we need to figure out what this dictionary is and rename it accordingly.
More work needs to be done on this dictionary.
This dictionary has terms related to other dictionaries such as:
Origin (e.g. altitude), timing of harvest (e.g. beginning of flowering, autumn), plant sex (e.g. female plants), post-harvest handling (e.g. freeze-dried), water source (e.g. spring), etc. Could it be that, after normalizing, some of these should be moved from here to other dictionaries?
I don’t know how to describe the following column headings: id, wikipedia, query
There are queries that need to be dealt with and deleted.
The title of this dictionary and its description document have been changed to Plant Material History.
To better identify and organize the data, I have created two new columns in the table (bulleted below) and categorized the terms accordingly (note that these could also be useful as dependent drop-down lists in a database):
- PlantHistoryCat1: the MAIN category of differentiation of the types of data related to the Plant Material History
- PlantHistoryCat2: the SUB category of differentiation of the types of data related to the Plant Material History
There is still some doubt and ambiguity to clear up about some of the WikidataID numbers. To be able to discuss this efficiently, I have created the following new columns:
- /link/@wikidata: hyperlink to a Wikidata page or search results for the item in question
- /wikiIDconfidence: my colour-coded “confidence rating” (Low = red, Medium = yellow, High = green) on how well the WikidataID matches the entry term
- /desc/@wikidata: in some places, I have pasted in the description supplied on the corresponding Wikidata page
- /desc/@wikipedia: in some cases, there was no matching term in Wikidata, but there was one in Wikipedia. I have copied some of the Wikipedia descriptions for the term here.
- Notes: where I have listed some questions to discuss with @petermr
PDF and xlsx documents are attached here for reference and discussion: PlantMaterialHistory20200202.pdf (https://github.com/petermr/CEVOpen/files/4148738/PlantMaterialHistory20200202.pdf), PlantMaterialHistory20200202.xlsx (https://github.com/petermr/CEVOpen/files/4148739/PlantMaterialHistory20200202.xlsx)
Next, I will start a new comment to itemize the Questions/Problems to be resolved.
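If PlantHistoryCat1 and PlantHistoryCat2 are used as dependent drop-down lists, the options for the second list can be derived from the table itself. A minimal sketch, assuming each row is a (Cat1, Cat2, term) tuple; the category names echo examples from this thread, but the pairings are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows: (PlantHistoryCat1, PlantHistoryCat2, term).
ROWS = [
    ("Growth Conditions", "Season", "winter"),
    ("Growth Conditions", "Season", "spring"),
    ("Growth Conditions", "Water Source", "irrigation with fresh water"),
    ("Post-harvest Treatment", "Drying Method", "shade-dried"),
    ("Post-harvest Treatment", "Drying Method", "freeze-dried"),
]


def dependent_options(rows):
    """Map each Cat1 value to the sorted Cat2 values seen with it.

    A form can show the Cat1 list first, then offer only options[cat1]
    in the Cat2 list.
    """
    options = defaultdict(set)
    for cat1, cat2, _term in rows:
        options[cat1].add(cat2)
    return {cat1: sorted(cat2s) for cat1, cat2s in sorted(options.items())}


if __name__ == "__main__":
    for cat1, cat2s in dependent_options(ROWS).items():
        print(cat1, "->", cat2s)
```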
On Mon, Feb 3, 2020 at 3:53 PM Emanuel Faria wrote:
Since my last post, the following has been accomplished:
1. The title of this dictionary and its description document have been changed to Plant Material History.
Good. Please remove any spaces from this. Best is camelCase: plantMaterialHistory.
On the new category columns and the Wikidata review columns: looks good. Will discuss when we talk.
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
In reply to “Please remove any spaces from this. Best is camelCase: plantMaterialHistory”: done, thanks.
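Turning a display title into the space-free camelCase form requested above is easy to automate. A small sketch; `to_camel_case` is a hypothetical helper, not part of any existing tool in the repository.

```python
import re


def to_camel_case(title):
    """Turn a human-readable title into a space-free lowerCamelCase name.

    Words are split on any run of non-alphanumeric characters. The first word
    is lowercased entirely (so 'EO' becomes 'eo'); each later word gets an
    uppercase first letter and is otherwise unchanged.
    """
    words = [w for w in re.split(r"[^0-9A-Za-z]+", title) if w]
    if not words:
        return ""
    rest = [w[0].upper() + w[1:] for w in words[1:]]
    return words[0].lower() + "".join(rest)


if __name__ == "__main__":
    print(to_camel_case("Plant Material History"))
```

Lowercasing the whole first word also turns "EO Plant Material History" into "eoPlantMaterialHistory", the style later used for eoPlantMaterialHistory.xml.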
Many wikidataIDs lead to a page with the exact term searched, but the page lacks any data whatsoever.
Many terms have no exactly corresponding Wikidata page. Do we create one, even as a placeholder as above (in item #1)?
How will we use these, exactly?
As I have added new items, will all of these need to be renumbered from top to bottom?
If so, by which column(s) shall I sort them?
Hand-serializing these IDs that contain words, dots, and numbers is time-consuming and can introduce errors. Would it not be better to just use serialized numbers? That way, a future database can begin from wherever we left off at any time.
What happens when we add new entries in the future — for example, for Synonyms or for those related by Cat1 and Cat2 grouping?
Should the new IDs follow the ones they’re most related to?
Do we set up the IDs to include a “.” between category groupings to indicate Cat1, Cat2, etc.? (Example below)
ID |  | PlantHistoryCat1 | PlantHistoryCat2 | term |
---|---|---|---|---|
DAVE.PlantMaterialHistory.GC.S.1 |  | Growth Conditions | Season | winter |
DAVE.PlantMaterialHistory.GC.S.2 |  | Growth Conditions |  | spring |
DAVE.PlantMaterialHistory.GC.S.3 |  |  |  | Summer |
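The two ID styles under discussion can be compared directly. The sketch below assumes categories are abbreviated by their initials (as in GC.S) and that the blank category cells in the table repeat the row above; it generates dotted hierarchical IDs and plain serial IDs for the same rows. The function names are hypothetical.

```python
from collections import Counter

PREFIX = "DAVE.PlantMaterialHistory"

# Rows as (PlantHistoryCat1, PlantHistoryCat2, term), mirroring the table,
# assuming the blank cells repeat the values above them.
ROWS = [
    ("Growth Conditions", "Season", "winter"),
    ("Growth Conditions", "Season", "spring"),
    ("Growth Conditions", "Season", "Summer"),
]


def abbreviate(label):
    """Initials of each word: 'Growth Conditions' -> 'GC', 'Season' -> 'S'."""
    return "".join(word[0].upper() for word in label.split())


def hierarchical_ids(rows):
    """IDs like DAVE.PlantMaterialHistory.GC.S.1, numbered within each
    (Cat1, Cat2) group in the order the rows appear."""
    counters = Counter()
    ids = []
    for cat1, cat2, _term in rows:
        group = f"{abbreviate(cat1)}.{abbreviate(cat2)}"
        counters[group] += 1
        ids.append(f"{PREFIX}.{group}.{counters[group]}")
    return ids


def serial_ids(rows, start=1):
    """IDs like DAVE.PlantMaterialHistory.1, .2, ... ignoring categories."""
    return [f"{PREFIX}.{n}" for n in range(start, start + len(rows))]


if __name__ == "__main__":
    for row, hid, sid in zip(ROWS, hierarchical_ids(ROWS), serial_ids(ROWS)):
        print(row[2], hid, sid)
```

Initials can collide (two different Cat2 values that both start with "S", for example), and adding a row to a group means knowing that group's current count. A serial number only needs the last number used, which is what lets a future database continue from wherever we left off.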
In the original table, there were some terms that seem unlikely ever to have Wikidata or Wikipedia entries (examples below). Should these be deleted, or “tagged” in some way as useful “semantic” phrases?
temperature variation
irrigation with fresh water
shade-dried
residual oil
conventionally distilled oil
volatile extract
wax extract
fruit set
One by one, I checked the WikiIDs in the original table. While many were obviously correct, some were less so.
I have corrected and added all that I confidently could, but there are still others that pose problems such as:
IDs that link to papers that mention the term, but are not specifically related to the term. Examples:
Water distillation https://www.wikidata.org/wiki/Q274959
Hydrodistillation https://www.wikidata.org/wiki/Q64097733
Steam Distillation https://www.wikidata.org/wiki/Q1164392
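One quick way to spot an ID that points at a paper rather than at the concept is to look at the item's "instance of" (P31) statements. A sketch using Wikidata's public Special:EntityData JSON endpoint; it assumes paper items are typed as scholarly article (Q13442814), which is common but not guaranteed, so a negative result does not prove the ID is the right concept. The User-Agent string is a placeholder.

```python
import json
import sys
import urllib.request

# On Wikidata, P31 is "instance of" and Q13442814 is "scholarly article".
INSTANCE_OF = "P31"
SCHOLARLY_ARTICLE = "Q13442814"


def instance_of_targets(entity_json, qid):
    """Return the set of item ids that `qid` is declared an instance of.

    `entity_json` is a parsed Special:EntityData document. If Wikidata has
    redirected `qid`, the document is keyed by the target id instead, so fall
    back to the single entity it contains.
    """
    entities = entity_json["entities"]
    entity = entities.get(qid) or next(iter(entities.values()))
    targets = set()
    for claim in entity.get("claims", {}).get(INSTANCE_OF, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            targets.add(value["id"])
    return targets


def looks_like_paper(entity_json, qid):
    """True if the item is typed as a scholarly article rather than a concept."""
    return SCHOLARLY_ARTICLE in instance_of_targets(entity_json, qid)


def fetch_entity(qid):
    """Download one item's JSON (needs network access)."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    request = urllib.request.Request(
        url, headers={"User-Agent": "dictionary-id-check/0.1 (placeholder)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    for qid in sys.argv[1:]:  # e.g. Q274959 Q64097733 Q1164392
        kind = ("scholarly article" if looks_like_paper(fetch_entity(qid), qid)
                else "not typed as a scholarly article")
        print(qid, kind)
```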
Some disambiguation found here: https://www.researchgate.net/post/What_is_different_between_water_steam_distillation_and_steam_distillation_system
In water distillation or hydro distillation, elevated pressures are used with plants whose essential oils are difficult to extract at higher temperatures.
In steam distillation, plant material is placed into a steam distillation chamber and steam is forced into the chamber with it. As the essential oil interacts with the steam, the steam flows into the chilled condensing chamber, turning back into a liquid and providing the essential oil.
Hydro distillation with a Clevenger trap is used for the extraction of volatile oil (essential oil), and steam distillation is used in industry for the isolation of volatile oil.
The advantage of steam distillation is that the plant material can be recovered after oil extraction and used for solvent extraction to isolate other non-volatile compounds, whereas in hydro distillation the plant material is continuously boiled and cannot be recovered. For large-scale distillation, handling of water is also not convenient.
Recovery of oil is higher in hydro distillation than in steam distillation.
Which of the terms below, if any, would be the right WikiID?
By what process can I decide with confidence — for this dictionary and all the others?
Where WikiIDs don’t exist, what is the minimum data required to make it worth creating new ones in Wikidata? If we do upload, when, where, and how will we get that new ID, and how do we feed it back into our dictionary?
Example:
headspace solid phase microextraction Q58832405
solid-phase microextraction Q903970
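Choosing between a specific item and a broader one is partly an editorial call, but the search step can be made repeatable. A hedged sketch: it builds a query URL for Wikidata's wbsearchentities API and ranks the returned hits so that an exact label match beats an alias match, which beats everything else. The response shape described in the docstring and the scoring rule are assumptions for illustration, not an established procedure.

```python
import re
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def search_url(term, limit=10):
    """Query URL for Wikidata's wbsearchentities action (label/alias search)."""
    params = {"action": "wbsearchentities", "search": term,
              "language": "en", "format": "json", "limit": limit}
    return f"{API_URL}?{urlencode(params)}"


def normalize(text):
    """Lowercase, treat hyphens as spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.replace("-", " ")).strip().lower()


def rank_candidates(term, hits):
    """Order search hits: exact label match first, then alias match, then rest.

    `hits` is assumed to be the "search" list of a wbsearchentities response:
    each hit has an "id", usually a "label", and a "match" dict whose "text"
    is the string that matched. Ties keep the API's original order.
    """
    target = normalize(term)

    def score(hit):
        if normalize(hit.get("label", "")) == target:
            return 2
        if normalize(hit.get("match", {}).get("text", "")) == target:
            return 1
        return 0

    return sorted(hits, key=score, reverse=True)


if __name__ == "__main__":
    print(search_url("headspace solid phase microextraction"))
```

Normalizing hyphens matters here because the two example terms differ only in "solid phase" versus "solid-phase"; a top-ranked hit is still only a suggestion to check by hand.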
While trying to disambiguate terms for “Drying Methods”, I found there were many other drying methods that were not in our dictionary. I was tempted to add them, in case that helps identify more instances in the literature, or if we are letting our coding build the entirety of the Dictionary. What shall we do?
Example 1: this page on Wikipedia:
https://en.wikipedia.org/wiki/Drying
In the most common case, a gas stream, e.g., air, applies the heat by convection and carries away the vapor as humidity (https://en.wikipedia.org/wiki/Humidity). Other possibilities are vacuum drying (https://en.wikipedia.org/wiki/Vacuum_drying), where heat is supplied by conduction (https://en.wikipedia.org/wiki/Heat_conduction) or radiation (https://en.wikipedia.org/wiki/Radiation) (or microwaves, https://en.wikipedia.org/wiki/Microwaves), while the vapor thus produced is removed by the vacuum (https://en.wikipedia.org/wiki/Vacuum) system. Another indirect technique is drum drying (https://en.wikipedia.org/wiki/Drum_drying) (used, for instance, for manufacturing potato flakes), where a heated surface is used to provide the energy, and aspirators draw the vapor outside the room. In contrast, the mechanical extraction of the solvent, e.g., water, by filtration (https://en.wikipedia.org/wiki/Filtration) or centrifugation (https://en.wikipedia.org/wiki/Centrifugation), is not considered "drying" but rather "draining".
Example 2: this one on researchgate.net: https://www.researchgate.net/post/Which_drying_methods_are_practiced_to_dry_plant_biomass_of_spices_agricultural_horticultural_medicinal_and_aromatic_plants
Traditionally agricultural/horticultural crops, spices, medicinal and aromatic plants and other plant products are dried in shade or Sun. Subsequently hot-air oven drying, solar drier drying, cross-flow drying, through-flow drying, vacuum shelf drying etc. techniques have been employed. Recently microwave drying, freeze drying, infrared or inert gas drying and combo drying techniques have also been used. What other methods are in practice and what are their advantages and disadvantages?
Example:
“Solvent extraction”: is it an extraction technique, or does the word “solvent” make it an extraction component?
Is this an EO extraction technique or a plant extract?
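Progress Update
I have just committed the finished (I hope!) dictionary: PlantMaterialHistory.xml (https://github.com/petermr/CEVOpen/blob/master/dictionary/plantmaterialhistory/plantmaterialhistory.xml)
The good news
- All of the issues in my previous comment are now resolved, and I have confidence in the wikidataIDs, where assigned.
- The number of entries increased from 82 to 96, owing to additional entries added for different drying and extraction methods that were absent in the earlier version.
- I've updated PlantMaterialHistoryDictionaryDescription.md (https://github.com/petermr/CEVOpen/blob/master/dictionary/plantmaterialhistory/PlantMaterialHistoryDictionaryDescription.md) and INDEXofOIL186Dictionaries.md (https://github.com/petermr/CEVOpen/blob/master/dictionary/INDEXofOIL186Dictionaries.md)
The slightly annoying news
@petermr, I can't get the dictionary to open in XML Notepad. According to an online syntax checker, a hidden character seems to be causing the problem (see screenshot: https://www.dropbox.com/s/rzs1zwyu6r15c3a/Screenshot%202020-02-04%2018.03.50.png?dl=0). I've spent more time fiddling with this than I did entering the data. Please take a look, and when you fix it, please let me know how you did it.
Thanks, @mannyrules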
The dictionary is well thought out. I have made some stylistic changes, e.g. leading characters should be lowercase and reserved words should not have spaces. Your toolchain has made a complete mess of the file. In future you shouldn't use any tool for editing dictionaries unless we have jointly agreed on it. I have edited out the null characters, spurious quotes, etc. The slicker a tool is, the more likely it is to introduce strange characters. My guess is that Excel was used at some stage. See if XML Notepad can read, edit, and save the current dictionary without corruption. It should be OK.
P.
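A check like the one below could locate such characters before a file is committed. This is a sketch, not the method used above: it reports every character that XML 1.0 forbids outright (null characters and most other control characters) with its line and column, and can strip them. It does not catch curly "smart" quotes used as attribute delimiters, which are legal characters but still break parsing. The script name in the usage comment is hypothetical.

```python
import re
import sys

# XML 1.0 allows only tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and
# U+10000-U+10FFFF; this pattern matches everything else.
ILLEGAL_XML_CHARS = re.compile(
    r"[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def find_illegal_chars(text):
    """Yield (line, column, codepoint) for every character XML 1.0 forbids.

    Splits on "\n" only: str.splitlines() would treat some forbidden
    characters (e.g. U+000B, U+000C) as line breaks and hide them.
    """
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in ILLEGAL_XML_CHARS.finditer(line):
            yield lineno, match.start() + 1, f"U+{ord(match.group()):04X}"


def strip_illegal_chars(text):
    """Return text with the forbidden characters removed."""
    return ILLEGAL_XML_CHARS.sub("", text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_xml_chars.py PlantMaterialHistory.xml
    with open(sys.argv[1], encoding="utf-8") as handle:
        for line, col, cp in find_illegal_chars(handle.read()):
            print(f"line {line}, column {col}: {cp}")
```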
Your comments on resolving to Wikidata and adding concepts are good. However, I think we should leave it as it is, UNLESS you are able to find an authority (e.g. USDA) that already has a glossary. Not high priority. These terms will not map prettily to Wikidata: some are fine, some are very broad.
Sounds good. Let's discuss on our next call.
am around before 1700 UTC
On Wed, Feb 5, 2020 at 7:20 PM Emanuel Faria wrote: “Sounds good. Let's discuss on our next call.”
You have nicely developed a simple faceted classification (https://en.wikipedia.org/wiki/Faceted_classification) for the dictionary. Categories may need relabelling.
On Tue, Feb 4, 2020 at 9:16 PM Emanuel Faria wrote: “Progress Update. I have just committed the finished (I hope!) dictionary: PlantMaterialHistory.xml”
Thanks, @petermr. I'm around today. Skype me when you get in.
maybe 30 min?
On Thu, Feb 6, 2020 at 1:03 PM Emanuel Faria wrote: “Thanks, @petermr. I'm around today. Skype me when you get in.”
PlantMaterialHistory.xml and PlantMaterialHistoryDictionaryDescription.md are now updated and working. I have also updated the master INDEXofOIL186Dictionaries.md.
I would like to go back, however, and move the items having to do with distillation methods to a new separate dictionary. I will comment here again when that is done.
@petermr Looking more closely at this dictionary, I think that besides separating out a new dictionary for distillation methods, we could also create a separate dictionary for plant growth stages. This may be overkill, but I just found an article entitled "Whole-Plant Growth Stage Ontology for Angiosperms and Its Application in Plant Biology" (http://www.plantphysiol.org/content/142/2/414) in which (if I've read this right) they have identified 112 "active terms".
@petermr Now that we have a stand-alone dictionary for EO Extraction Methods, I have deleted the ones that were in PlantMaterialHistory.xml, renumbered the DAVEids, and updated PlantMaterialHistoryDictionaryDescription.md as well as the master index.
Here's the updated Dictionary Entry:
A dictionary of 73 terms for Essential Oil extraction methods.
Filename: ExtractionMethod.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/ExtractionMethod/ExtractionMethod.xml
DAVEid: DAVE.activity.n where n is a serialized number
term: a human-readable string describing the concept.
Acronym: The acronym for the term, if any.
Apparatus: Apparatus used to conduct the extraction method described by the term.
wikidataid: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.
description: short description of the activity sourced from wikidata and/or wikipedia
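Since the DAVEids were renumbered after the extraction methods were moved out, here is a sketch of the two options: rewrite every id as DAVE.activity.1, .2, ... in the current order, or leave existing ids alone and hand out the next free number. The function names are hypothetical, and the sketch assumes each id ends in a plain integer.

```python
def renumber(entries, prefix="DAVE.activity"):
    """Return copies of entries with ids rewritten as prefix.1, prefix.2, ...

    Entries keep their current order; gaps left by deleted entries disappear.
    """
    return [{**entry, "id": f"{prefix}.{n}"}
            for n, entry in enumerate(entries, start=1)]


def next_id(entries, prefix="DAVE.activity"):
    """Next free serial id, for appending without renumbering anything.

    Assumes every matching id ends in an integer, e.g. DAVE.activity.12.
    """
    numbers = [int(e["id"].rsplit(".", 1)[1]) for e in entries
               if e["id"].startswith(prefix + ".")]
    return f"{prefix}.{max(numbers, default=0) + 1}"


if __name__ == "__main__":
    kept = [{"id": "DAVE.activity.1", "term": "hydrodistillation"},
            {"id": "DAVE.activity.4", "term": "steam distillation"}]
    print([e["id"] for e in renumber(kept)], next_id(kept))
```

Renumbering keeps the sequence tidy but changes ids that other files may already reference; appending with `next_id` avoids that at the cost of gaps.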
No. of Entries (The header is not counted): 73
No. of terms describing EO Extraction Methods resolved in Wikidata: 71
No. of unique Wikidata IDs (including synonyms): 37
No. of entries with no Wikidata IDs: 2
No. of source articles with no Analysis Type found: 64
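The counts above have to agree with each other: entries with and without an ID should sum to the total (71 + 2 = 73), and the number of unique IDs (37) can be smaller than the number resolved because synonyms and acronyms share an ID. A sketch that computes these from a list of entries, assuming each entry is a dict with an optional `wikidataid` key (the column name used above); the sample IDs are placeholders.

```python
from collections import Counter


def dictionary_stats(entries):
    """Summary counts for entries that may carry a 'wikidataid' value."""
    resolved = [e["wikidataid"] for e in entries if e.get("wikidataid")]
    return {
        "entries": len(entries),
        "resolved": len(resolved),
        "unresolved": len(entries) - len(resolved),
        "unique_ids": len(set(resolved)),
        # IDs used by more than one term, i.e. synonyms, acronyms, spellings.
        "shared_ids": sorted(i for i, n in Counter(resolved).items() if n > 1),
    }


if __name__ == "__main__":
    sample = [  # placeholder IDs, not real Wikidata items
        {"term": "steam distillation", "wikidataid": "Q1"},
        {"term": "SD", "wikidataid": "Q1"},
        {"term": "hydrodistillation", "wikidataid": "Q2"},
        {"term": "wax extract"},
    ]
    print(dictionary_stats(sample))
```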
Description: A dictionary of 81 entries relating to the plant material history leading up to the extraction of Essential Oils mentioned in selected literature chosen from the 186 test articles downloaded from PubMed. The entries include keywords and phrases describing growth conditions, plant life stages, plant material selection, post-harvest treatment methods, and extracted plant material products. Of the 81 entries, 58 were resolved to WikidataIDs.
Filename: eoPlantMaterialHistory.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlantMaterialHistory/eoPlantMaterialHistory.xml
id: serialized identifier
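PlantMatHistCat1: the main category of plant material history data (e.g. Growth Conditions)
PlantMatHistCat2: the subcategory within PlantMatHistCat1 (e.g. Season)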
Term: the precise string used to identify the concept to be normalized and tagged with Wikidata ID.
wikidataID: Unique identifier for each normalized dictionary term, linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines
No. of source papers: 186
No. of unique entries (including alternate spellings or synonyms): 81
No. of unique WikidataIDs resolved to SearchTerms: 58
Here we describe the process of creating a [DictionaryName]DictionaryDescription.md document, which describes the contents of the individual dictionary named in the title of this issue, created (or still being created) from data collected for Oil186.
I will begin this thread by pasting the contents of the INDEX description, followed by a first draft below for discussion and direction.