petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

Manual analysis of activities reported in oil186 articles #19

Closed petermr closed 4 years ago

petermr commented 4 years ago

Manually read about 50 articles and record which activities are reported. The goal is to create a schema into which these can be extracted:

Current thoughts:

ambarishK commented 4 years ago

Yes sir.

petermr commented 4 years ago

Typical results

PMC4391421

scholarly.html

Introduction

The genus Thymus, member of the Lamiaceae family, contains about 400 species of perennial aromatic, evergreen or semi-evergreen herbaceous plants with many subspecies, varieties, subvarieties and forms [ 1 ]. In Romania, the Thymus genus contains one species cultivated as aromatic plant (Thymus vulgaris) and other 18 wild species [ 2 ]. T. vulgaris (thyme), locally known as "cimbru", is widely used in the Romanian folk medicine for its expectorant, antitussive, antibroncholitic, antispasmodic, anthelmintic, carminative and diuretic properties.

Extract:

Match these with Wikidata terms and add QNumber

Create entries in activity dictionary if not already present.
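
The extraction step above could be sketched roughly as follows. This is only an illustration, assuming the activities appear in a phrase of the form "for its A, B and C properties"; the function name `extract_activities` and the in-memory `activity_dictionary` are hypothetical, and no QNumbers are filled in because they still have to be looked up in Wikidata.

```python
import re

def extract_activities(sentence):
    """Return activity terms from a phrase like 'for its A, B and C properties'."""
    match = re.search(r"for its (.+?) properties", sentence)
    if not match:
        return []
    # Split on commas and on the final "and" that joins the last two terms.
    parts = re.split(r",\s*|\s+and\s+", match.group(1))
    return [part.strip() for part in parts if part.strip()]

sentence = (
    'T. vulgaris (thyme), locally known as "cimbru", is widely used in the '
    "Romanian folk medicine for its expectorant, antitussive, antibroncholitic, "
    "antispasmodic, anthelmintic, carminative and diuretic properties."
)

activities = extract_activities(sentence)

# Hypothetical activity dictionary: term -> Wikidata QNumber.
# None means "not yet matched against Wikidata".
activity_dictionary = {"expectorant": None}
for term in activities:
    # Create an entry only if the term is not already present.
    activity_dictionary.setdefault(term, None)
```

Terms are kept exactly as the article spells them (for example "antibroncholitic"), so spelling variants would still need manual review before matching.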

Materials and Methods

Find the subsection on testing

Determination of antimicrobial activity

Thyme EO was tested on 7 common food-related bacteria and fungus: Staphylococcus aureus (ATCC 25923), Pseudomonas aeruginosa (ATCC 27853), Salmonella typhimurium (ATCC 14028), Escherichia coli (ATCC 25922), Klebsiella pneumoniae (ATCC 13882), Enterococcus faecalis (ATCC 29212) and Candida albicans (ATCC 10231), using the disk diffusion method as previously described [ 10 ].

Extract organisms

(use full binomial, not "E. coli")

Create organism dictionary and match against wikidata
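
A minimal sketch of the organism step, assuming each organism is written as a binomial followed by an ATCC strain number, as in the excerpt above. The names `extract_organisms` and `expand_abbreviation` are hypothetical helpers, and clinical isolates without an ATCC number would need a different pattern.

```python
import re

# Shortened excerpt of the "Determination of antimicrobial activity" paragraph.
TEXT = (
    "Thyme EO was tested on 7 common food-related bacteria and fungus: "
    "Staphylococcus aureus (ATCC 25923), Pseudomonas aeruginosa (ATCC 27853), "
    "Salmonella typhimurium (ATCC 14028), Escherichia coli (ATCC 25922), "
    "Klebsiella pneumoniae (ATCC 13882), Enterococcus faecalis (ATCC 29212) "
    "and Candida albicans (ATCC 10231), using the disk diffusion method"
)

# Capitalised genus, lower-case species epithet, then the ATCC strain number.
BINOMIAL_WITH_STRAIN = re.compile(r"([A-Z][a-z]+ [a-z]+) \(ATCC (\d+)\)")

def extract_organisms(text):
    """Return (binomial, ATCC number) pairs found in the text."""
    return BINOMIAL_WITH_STRAIN.findall(text)

def expand_abbreviation(short_name, binomials):
    """Expand a form like 'E. coli' to a full binomial already seen, or None."""
    initial, species = short_name.replace(".", "").split()
    for name in binomials:
        genus, epithet = name.split()
        if genus.startswith(initial) and epithet == species:
            return name
    return None

organisms = extract_organisms(TEXT)
names = [name for name, _strain in organisms]
full_name = expand_abbreviation("E. coli", names)
```

Matching on both the genus initial and the species epithet avoids confusing "E. coli" with Enterococcus faecalis, which shares the same initial.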

Figure

Table

Table 2 Effects of thyme oil against bacteria expressed by the mean sizes of the inhibitory zones

Record this title (later we will extract data)

petermr commented 4 years ago

PMC5080681

Background

used in the Palestinian folk medicine from ancient times as

Materials and Methods

Antimicrobial tests

The essential oil of T. bovei ... ... Staphylococcus aureus (ATCC 25923) ...Escherichia coli (ATCC 25922) ... Pseudomonas aeruginosa (ATCC 27853)

... Methicillin Resistant Staphylococcus aureus (MRSA) clinical isolates.

The antifungal activity ... Candida albicans clinical isolate.

Anthelmintic activity

Due to its physiological and ... Pheretima posthuma (10 cm long)

petermr commented 4 years ago

data reported in papers

make TSV file with these columns:

PMCID

Main Country

main plant

background/introduction activities: single column with comma-separated list of activities with QNumbers

antimicrobial tests: comma-separated list of organisms (bacteria and fungi)

anthelmintic tests: list of target species

other tests: list of activities

table: is there a table of activity results?

figures: is there a figure of reported results?
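
One way such a file could be written is sketched below with Python's `csv` module. The column identifiers in `COLUMNS` are hypothetical spellings of the fields listed above, and the sample row is filled only from the PMC4391421 notes in this thread; QNumbers are omitted because they have not been matched yet, and the figure field is left empty because it was not recorded.

```python
import csv
import io

# Hypothetical column identifiers for the fields listed above.
COLUMNS = [
    "PMCID",
    "main_country",
    "main_plant",
    "background_activities",
    "antimicrobial_organisms",
    "anthelmintic_targets",
    "other_tests",
    "has_table",
    "has_figure",
]

def write_rows(rows, handle):
    """Write one tab-separated line per article, preceded by a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

row = {
    "PMCID": "PMC4391421",
    "main_country": "Romania",
    "main_plant": "Thymus vulgaris",
    # Multi-valued cells are comma-separated inside a single column.
    "background_activities": "expectorant,antitussive,anthelmintic",
    "antimicrobial_organisms": "Staphylococcus aureus,Escherichia coli,Candida albicans",
    "anthelmintic_targets": "",
    "other_tests": "",
    "has_table": "yes",
    "has_figure": "",  # not recorded in the notes above
}

buffer = io.StringIO()
write_rows([row], buffer)
tsv_text = buffer.getvalue()
```

Writing through `csv.DictWriter` rather than joining strings by hand keeps every row at the same number of columns, even when some cells are empty.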

ambarishK commented 4 years ago

Sir, please go through the activity test for species sheet.

Presently, it contains 12 records for parsed scientific articles.

I have added extra columns - EO composition table and Chemical structure of EO composition. (optional)

Please suggest any required changes.

Activitytestforspecies.tsv

petermr commented 4 years ago

Please give the filenames (not "the species sheet")

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

ambarishK commented 4 years ago

Sir, Activitytestforspecies.tsv is the added sheet.

petermr commented 4 years ago

Filename/URL on Github should be https://github.com/petermr/CEVOpen/blob/master/Activitytestforspecies.tsv or

ambarishK commented 4 years ago

Sir, please check the updated sheet for activity test for species.

https://github.com/petermr/CEVOpen/blob/master/project/articleAnalysis/raw/Activitytestforspecies20190930.tsv

Next is to add WIKIDATA identifier and make dictionaries.

ambarishK commented 4 years ago

Sir, dictionaries for TargetOrganisms are as follows.

TargetOrganism.xml

and scripts to generate dictionary files are as follows.

TargetOrganism50.sh

ambarishK commented 4 years ago
Making, disambiguating and cleaning dictionary for literature activities.

I have cleaned activities, prepared literature activity dictionary and disambiguated dictionary entry terms.

petermr commented 4 years ago

There is NO separate dictionary for literature activities. There is a single activity/ dictionary. The literature may suggest terms it should contain.

ambarishK commented 4 years ago

OK sir.

Please check for the literature activity and dictionary file.

literature activities

Dictionary file

script to prepare dictionary

ambarishK commented 4 years ago

Sir, check for the updated sheet containing manual analysis records. It has 88 added records and it is in progress now.

activitytestforspecies20191002.tsv

petermr commented 4 years ago

I assume these are NEW records? That gives us the whole of oil186. If so, well done, and combine the two tables. Call it manualAnalysis20191002.tsv

ambarishK commented 4 years ago

Sir, the new sheet contains all records (previous ones as well as updated ones). This is the consolidated sheet (activitytestforspecies20191030.tsv + activitytestforspecies20191002.tsv).

Yes sir. We can call the file https://github.com/petermr/CEVOpen/blob/master/project/articleAnalysis/raw/Activitytestforspecies20191002.tsv manualAnalysis20191002.tsv.

petermr commented 4 years ago

Please give filename not "new sheet" as I cannot find this.

On Wed, 2 Oct 2019, 11:07 Ambarish Kumar, notifications@github.com wrote:

Sir, new sheet contains all records ( previous one as well as updated one ). This is the consolidated sheet ( activitytestforspecies20191030.tsv https://github.com/petermr/CEVOpen/blob/master/project/articleAnalysis/raw/Activitytestforspecies20191030.tsv

ambarishK commented 4 years ago

Sir, please check the updated previous comment. I have added a link to the new sheet.

ambarishK commented 4 years ago

Sir, I have updated the activitytestforspecies20191002.tsv. Now it contains 100 records for analysed articles from oil186.

petermr commented 4 years ago

Good, keep processing the oil186 articles.

ambarishK commented 4 years ago

Sir, check for the processed articles from oil186.

activitytestforspecies20191003.tsv.

It contains 165 processed records. (previous one as well as records processed today)

The remaining 20 articles are being analysed now.

petermr commented 4 years ago

Thank you - I may not be able to look at it for 4 hours but definitely before Gita comes

On Thu, Oct 3, 2019 at 10:32 AM Ambarish Kumar notifications@github.com wrote:

Sir, check for the processed articles from oil186.

activitytestforspecies20191003.tsv https://github.com/petermr/CEVOpen/blob/master/project/articleAnalysis/raw/Activitytestforspecies20191003.tsv .

It contains 165 processed records.

ambarishK commented 4 years ago

OK sir. I am about to process all articles and will update the sheet within 30 mins.

ambarishK commented 4 years ago

Sir, please go through the activitytestforspecies20191003Total.tsv. It covers analysis of all articles of oil186.

We may remove all previous sheets and call activitytestforspecies20191003Total.tsv as manualArticleAnalysis.tsv.

petermr commented 4 years ago

Thank you. Ambarish, can you search Wikidata and see whether any of the articles have Wikidata IDs?

P.
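
A rough sketch of how that lookup might be started: build a SPARQL query that asks Wikidata for items carrying one of the PMCIDs from this thread. It assumes that Wikidata stores PMCIDs under property P932 without the "PMC" prefix, which should be verified before relying on it; `build_pmcid_query` is a hypothetical helper, and the code only builds the query text without sending it anywhere.

```python
def build_pmcid_query(pmcids):
    """Build a SPARQL query listing Wikidata items that match any given PMCID."""
    # Assumption: property P932 holds the PMCID as digits only (no "PMC" prefix).
    stripped = [
        pmcid[3:] if pmcid.upper().startswith("PMC") else pmcid
        for pmcid in pmcids
    ]
    values = " ".join('"{}"'.format(value) for value in stripped)
    return (
        "SELECT ?item ?pmcid WHERE {\n"
        "  VALUES ?pmcid { " + values + " }\n"
        "  ?item wdt:P932 ?pmcid .\n"
        "}"
    )

query = build_pmcid_query(["PMC4391421", "PMC5080681"])
```

The resulting text could be pasted into the Wikidata Query Service; any article that comes back has a Wikidata ID that can be recorded alongside its PMCID.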

petermr commented 4 years ago

Thank you, will do

On Thu, Oct 3, 2019 at 12:39 PM Ambarish Kumar notifications@github.com wrote:

Sir, please go through the activitytestforspecies20191003Total.tsv https://github.com/petermr/CEVOpen/blob/master/project/articleAnalysis/raw/Activitytestforspecies20191003Total.tsv. It covers analysis of all articles of oil186.

We may remove all previous sheets and call activitytestforspecies20191003Total.tsv https://github.com/petermr/CEVOpen/blob/master/project/articleAnalysis/raw/Activitytestforspecies20191003Total.tsv as manualArticleAnalysis.tsv.

petermr commented 4 years ago

@ambarishK If you see a Github comment above a table like:

We can make this file beautiful and searchable if this error is corrected: It looks like row 111 should actually have 9 columns, instead of 8. in line 110. 

it means the table is corrupt in some way. Please fix this.
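
A check like the following could locate such rows before the file is committed. It is a minimal sketch, assuming a tab-separated file whose first line is the header; `find_ragged_rows` is a hypothetical helper name.

```python
import csv
import io

def find_ragged_rows(handle, delimiter="\t"):
    """Return (line number, columns found, columns expected) for each bad row."""
    # QUOTE_NONE treats quote characters as ordinary text, so every physical
    # line is exactly one row and line numbers stay accurate.
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    problems = []
    for line_number, row in enumerate(reader, start=2):
        if len(row) != expected:
            problems.append((line_number, len(row), expected))
    return problems

# Small in-memory example: the third line is missing one column.
sample = io.StringIO("PMCID\tplant\tcountry\nA\tB\tC\nD\tE\n")
problems = find_ragged_rows(sample)
```

For a real file, the same function can be called on `open(path, newline="", encoding="utf-8")`; an empty result means every row has as many columns as the header.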