Open petermr opened 4 years ago
Sir, please go through the composition table - composition20191028.tsv
185 articles analysed.
Got it. I'll dig into the new tools from Jon to create some tags and test some annotations. Then I'll see if I can indeed export them and post them here (somewhere) for you to let me know if they are useable.
I will also see which tools (grep, easyfind, or even spotlight) could best be used to maximize accuracy and speed and show you what I come up with for your feedback on how best to proceed most efficiently.
(My time is limited today, but I intend to devote no less than 90 min to the above.)
Thanks for your guidance, Peter.
Sir, would you please brief me about annotations? What sections do you want to annotate? Let me get some idea.
Ambarish, please concentrate completely on your assignment on composition tables. We will not be using annotation for compounds at this stage. I will be making the decisions about sections and will post information here as it is required.
On Tue, Oct 29, 2019 at 3:46 PM Ambarish Kumar notifications@github.com wrote:
Sir, would you please brief about annotations. What sections you want to annotate?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/petermr/CEVOpen/issues/45?email_source=notifications&email_token=AAFTCS6GZGNFP7H3BFTCHHLQRBLE7A5CNFSM4JGAZTRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECRAJQQ#issuecomment-547488962, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCSZYO3XLMZTJ6P3JB23QRBLE7ANCNFSM4JGAZTRA .
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
I am using GREP to update the table/log of activities (shown in the first entry of this issue) in our test batch of research articles currently in the Oil186 repository so that...
Goals: The Challenge, the solution we will bring, and the Desired End State by which all will know we have achieved excellence.
Steps to achieve the Goal(s):
Screenshot of my search results:
Desired Results: A clear and concise description / outline of the final "state or vision" of the project — the evidence we will see when our goals are achieved.
Excluding any activities found in articles under the heading “References”, I will log the following in each section of each article in the OIL186 repository:
I will also add any new activities, if any, to our Activities Dictionary.
As described above, I have updated the table (now called OIL186-activity20191101.txt) to include the first occurrence only of any activities that were found in the activity dictionary — which I have updated (see ActivitiesNormalizedE1.020191101.txt) with newly found activities, as well as some notes to consider before we update the Activity Dictionary.
Both files are attached as tab-delimited txt files (I don't have the option to save as tsv)
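The logging step described above (first occurrence of each dictionary activity per section, skipping "References") could be sketched roughly as follows. This is only an illustration, not the GREP workflow actually used: the function name `first_occurrences`, the section headings, the sample sentences, and the three activity terms are all made up for the example.

```python
import re


def first_occurrences(sections, activities, skip=("references",)):
    """Return {section heading: {activity: offset of its first match}}.

    `sections` maps a heading to the section text (hypothetical layout).
    """
    found = {}
    for heading, text in sections.items():
        if heading.strip().lower() in skip:
            continue  # ignore anything under "References"
        hits = {}
        for activity in activities:
            # whole-word, case-insensitive search; re.search stops at the first match
            pattern = r"\b" + re.escape(activity) + r"\b"
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                hits[activity] = match.start()
        if hits:
            found[heading] = hits
    return found


# Made-up sample data, standing in for one article and a tiny dictionary.
sections = {
    "Introduction": "The oil showed antibacterial and antioxidant effects.",
    "Results": "Antifungal activity was weaker than antibacterial activity.",
    "References": "Smith J. Antioxidant properties of oils.",
}
activities = ["antibacterial", "antifungal", "antioxidant"]
result = first_occurrences(sections, activities)
```

Only the first match per activity and section is kept, which mirrors the "first occurrence only" rule in the comment above.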
A) Naming. There is no need to include OIL186 as it's already in the directory tree, so OIL186-activity20191101.txt => activity20191101.txt

B) This is a TSV file, so please rename to activity20191101.tsv (I have done so).

C) There are FAR too many rows and columns in this. I do not understand either. There should be about 300 rows (one for each REPORTED ACTIVITY MEASUREMENT TABLE). Please stick to the template I started.

D) Please keep rows in SORTED order (by PMCID).

The purpose of this is to be a gold standard for extracting tables of activity.
Suggest we talk.
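Point D (rows sorted by PMCID) can be checked or enforced with a few lines of Python. The sketch below assumes the key column is literally named `PMCID` and that IDs look like `PMC` followed by digits; the second column, `table`, is hypothetical and only there to make the example complete.

```python
import csv
import io


def sort_tsv_by_pmcid(tsv_text, key_column="PMCID"):
    """Return the TSV text with data rows sorted numerically by PMCID."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Sort on the numeric part so that PMC999 would come before PMC1000.
    rows = sorted(reader, key=lambda row: int(row[key_column].upper().replace("PMC", "")))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Article IDs taken from this thread; the "table" values are placeholders.
example = "PMCID\ttable\nPMC5423258\tT1\nPMC4391421\tT1\nPMC5203915\tT2\n"
sorted_text = sort_tsv_by_pmcid(example)
```

Sorting on the integer part rather than the raw string avoids surprises if IDs of different lengths ever appear in the same file.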
Ohhhh.... this explains so much! 🤦🏻♂️
Oops. My bad.
I thought the tables product we last talked about was an entirely new task for me to complete.
What I did here was look for the occurrence of every activity we had listed in our dictionary, and the article it's found in. In the next column, I THOUGHT you wanted me to then annotate each phrase that described the biological/chemical method by which that activity was enacted/completed (whatever).
Oh well. The good news is, at least the task I'm SUPPOSED to do is a much quicker task by comparison.
Let's get it sorted when we talk tomorrow, Nov 4, 2019.
Manny
@petermr I'm having trouble figuring out how to map the table column headings as shown in the scholarly.html files in OIL186 searches to a single row in the spreadsheet template we worked on together.
Some of the issues I'm finding include:
How do I handle things like the situation in the image below? We need to decide a rule to "mark down" some of these into something you can use.
For example, do I use:
Column1 = Microorganism (C. decurrens, C. sempervirens, T. articulata)

or

- Column1 = Microorganism
- Column2 = C. decurrens (MIC90, MBC)
- Column3 = C. sempervirens (MIC90, MBC)
- Column4 = T. articulata (MIC90, MBC)

or

- Column1 = Microorganism
- Column2 = MIC90 (C. decurrens)
- Column3 = MBC (C. decurrens)
- Column4 = MIC90 (C. sempervirens)
- Column5 = MBC (C. sempervirens)
- Column6 = MIC90 (T. articulata)
- Column7 = MBC (T. articulata)
- Column8 = Gentamycin Mean (µg/mL) ± Standard Deviation
- Column9 = Gentamycin Mean (µg/mL) ± Standard Deviation

_But then I still don't know how/where to describe Mean (µL/mL) ± Standard Deviation for you_
This example is for PMC5423258 Original article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5423258/
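As an illustration of the last option listed above (one flat column per measurement and oil), a two-row header with merged cells can be flattened mechanically. This is a sketch of one possible rule, not a decision made in this thread: the column spans are inferred from the column list above rather than read from the article, the Gentamycin columns are left out because their sub-headings are unclear, and `flatten_headers` is a hypothetical helper.

```python
def flatten_headers(top_row, sub_row):
    """Combine a merged top header row with the sub-header row beneath it.

    top_row: list of (label, colspan) pairs, left to right.
    sub_row: one label per column; '' where the column has no sub-header.
    """
    expanded = []
    for label, span in top_row:
        expanded.extend([label] * span)  # repeat a merged cell over its columns
    if len(expanded) != len(sub_row):
        raise ValueError("header rows cover different numbers of columns")
    # "MIC90 (C. decurrens)" style when a sub-header exists, else the top label
    return [f"{sub} ({top})" if sub else top for top, sub in zip(expanded, sub_row)]


top = [
    ("Microorganism", 1),
    ("C. decurrens", 2),
    ("C. sempervirens", 2),
    ("T. articulata", 2),
]
sub = ["", "MIC90", "MBC", "MIC90", "MBC", "MIC90", "MBC"]
columns = flatten_headers(top, sub)
```

Units and "Mean ± Standard Deviation" qualifiers are not handled here; where to record them is exactly the open question raised above.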
talk in 30 mins
On Tue, Nov 5, 2019 at 4:23 PM Emanuel Faria notifications@github.com wrote:
@petermr https://github.com/petermr I'm having trouble figuring out how to map the table column headings as show in the scholarly.html files in OIL186 searches to a single row in the spreadsheet template we worked on together.
Some of the issues I'm finding include:
- Multiple Header rows
- Some of the headings seem to have been merged fields covering two or more columns
- Sometimes there is repetition of the column headings to cover different substances being tested
[image: Screenshot 2019-11-05 12 41 27] https://user-images.githubusercontent.com/9612595/68223363-676ab080-ffcb-11e9-8c4f-6394d7d7e690.png
The tables with multiple headers are not always properly rendered. Am working on that.
I think the best thing is to collect tables into issues.
Skype?
Hi Peter,
As I continue interpreting the tables into our “table formula/equation”, will you please double-check my “facts” below so I can keep going with confidence?
I'm looking at table 2 for https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5203915/ (see screenshot attached below)
Assuming we are counting the fungal and bacterial species being tested together, my guess is APA(O2C3A2S6P3) …
Otherwise, separating out the bacterial and fungal activities…

- for bacteria, value for S would be 5 = APA(O2C3A2S5P3)
- for fungus, value for S would be 1 = APA(O2C3A2S1P3)

By the way, do you still want me to separate species types being tested in ONE table into SEPARATE tables?
All of this assumes I have the following correct, and haven’t left anything out:
- O = Essential Oil(s) tested
- C = Control(s) used (if any)
- A = Activity(ies) tested
- S = Species being tested
- P = Parameters = number of measurement types
… Any corrections for me?
Thanks!
Manny
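For reference, a formula such as APA(O2C3A2S6P3) can be split into its letter/count pairs with a short regular expression. The sketch below uses the letter meanings exactly as listed in the comment above; the function name `parse_formula` is hypothetical, and the prefix (here APA) is passed through without interpretation.

```python
import re

# Letter meanings as given in the comment above.
LEGEND = {
    "O": "essential oils tested",
    "C": "controls used",
    "A": "activities tested",
    "S": "species tested",
    "P": "parameters (measurement types)",
}


def parse_formula(formula):
    """Split e.g. 'APA(O2C3A2S6P3)' into ('APA', {'O': 2, 'C': 3, ...})."""
    match = re.fullmatch(r"([A-Z]+)\(((?:[A-Z]\d+)+)\)", formula.strip())
    if match is None:
        raise ValueError(f"not a table-description formula: {formula!r}")
    prefix, body = match.groups()
    counts = {letter: int(n) for letter, n in re.findall(r"([A-Z])(\d+)", body)}
    unknown = set(counts) - set(LEGEND)
    if unknown:
        raise ValueError(f"letters not in the legend: {sorted(unknown)}")
    return prefix, counts


combined = parse_formula("APA(O2C3A2S6P3)")
bacteria = parse_formula("APA(O2C3A2S5P3)")
fungus = parse_formula("APA(O2C3A2S1P3)")
```

With the three formulas from the comment, the species counts are consistent: the bacterial (5) and fungal (1) values add up to the combined value (6).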
If I committed correctly, I just put my updates in Activity_Tables_Breakdown_2019-11-07.tsv, in articleAnalysis/oil186/raw.
@petermr As I was generating "table-description formulas" (from which you will create regex/GREP search functions by which you will parse future tables into machine-readable data), I realized it may help to see images of similar tables side by side so that variations within them could more easily appear -- along with solutions to the regex challenges.
So here's what I've done:
@petermr if you pull this directory down to your Mac, sort the images into the appropriate folders, we may save some time extracting meanings and methods from them.
After sorting, you might also choose to delete all that are redundant, and I/we can then focus on generating "table-description formulas" for the remainder.
I'm sure you could think of other possibilities too.
Please let me know what you think... and if I should proceed adding more screenshots for the rest of the oil186 articles.
Thanks!
Manny
Thanks. Sounds useful. I have fixed the bug in displaying HTML tables and will commit them.
On Fri, 8 Nov 2019, 00:30 Emanuel Faria, notifications@github.com wrote:
@petermr https://github.com/petermr As I was generating "table-description formulas" (from which you will create regex/GREP search functions by which you will parse future tables into machine-readable data), I realized it may help to see images of similar tables side by side so that variations within them could more easily appear -- along with solutions to the regex challenges.
So here's what I've done:
- Took screenshots of all activity tables in articles PMC4391421 to PMC5622390 (more to come, if you find this useful).
- Named the table image files: "ArticleID_Tx". (T=Table, x= table number)
- Added a new directory here: https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw/Example_Table_Images/
Inside that directory, added the following sub-folders:
- __table_images_to_Sort
- APA
- GRID
- IRREGULAR
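The "ArticleID_Tx" naming convention above is easy to check by script before the images are sorted into folders. A minimal sketch, assuming article IDs have the form `PMC` plus digits and that the screenshots carry a file extension such as `.png` (the extension is not stated in this thread); `parse_image_name` is a hypothetical helper.

```python
import re
from pathlib import PurePath

# "ArticleID_Tx": T = Table, x = table number (convention described above).
NAME_PATTERN = re.compile(r"(PMC\d+)_T(\d+)")


def parse_image_name(filename):
    """Return (article_id, table_number), or None if the name does not fit."""
    match = NAME_PATTERN.fullmatch(PurePath(filename).stem)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Illustrative file names only.
good = parse_image_name("PMC5203915_T2.png")
bad = parse_image_name("Screenshot 2019-11-05.png")
```

Names that return `None` can be listed and renamed before anyone starts sorting.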
The activity references have been added manually into: https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/activity20191028.tsv

For any article there may be 0, 1, 2, 3... activities (not normally more). For each activity there should be:
The activity table should list all triples for each paper. If the mentions and the tables are inconsistent note what has been omitted or duplicated.
The first few rows are:
The title of the Table should match roughly with the measurement method and description of results.
This is messy because Tables may report more than one activity (as here).