Closed petermr closed 4 years ago
Many of the measured activities are against micro-organisms (bacteria, viruses, fungi) and, by extension, arthropods (insects) and parasites (helminths/worms, etc.). This will be broad and might include herbicidal activities (but not laboratory animal strains).
These are likely to include the words "anti-X" where X is an organism (-bacterial, -fungal, etc.)
After (say) 30-50 articles, request review.
See compound table for design.
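As a rough illustration of the "anti-X" idea above, the sketch below pulls the X part out of terms such as "antibacterial" or "anti-fungal". It is only a first-pass filter under the assumption that the activity word directly follows "anti" (with or without a hyphen); it will also pick up non-organism terms such as "antioxidant" and unrelated words such as "anticipate", so the output still needs manual review. The function name and sample sentence are made up for this example.

```python
import re

# Matches "antibacterial" and "anti-bacterial"; the captured group is the X part.
ANTI_X = re.compile(r"\banti-?([a-z]+)\b", re.IGNORECASE)

def find_anti_terms(text):
    """Return the lower-cased X of every "anti-X" term found in text, in order."""
    return [m.group(1).lower() for m in ANTI_X.finditer(text)]

# Hypothetical sentence, not taken from any article in the corpus.
sample = "The oil showed antibacterial, anti-fungal and Anti-viral activity."
print(find_anti_terms(sample))
```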
Sir, please review the target organism extraction sheet - targetOrganismSpecies20191218.tsv
It contains the analysis of the first 20 articles of oil186.
Thank you. Please make sure there is a separate row for each table (as for compounds) and for sections.
On Tue, Dec 17, 2019 at 10:53 PM Ambarish Kumar notifications@github.com wrote:
Sir, please review the target organism extraction sheet - targetOrganismSpecies20191218.tsv https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/targetOrganismSpecies20191218.tsv
There is analysis of first 20 articles of oil186.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/petermr/CEVOpen/issues/70?email_source=notifications&email_token=AAFTCS57BP63MFZVXSS4IVLQZFKADA5CNFSM4J35JXJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHEHPHY#issuecomment-566785951, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCSZ6AI22GDWOZ557RHLQZFKADANCNFSM4J35JXJA .
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
OK sir.
I have analyzed the intermediate commit (with ca 100 rows) to extract the species. See https://github.com/petermr/CEVOpen/edit/master/articleAnalysis/oil186/raw/targetOrganismCount.csv @mannyrules for comment
@ambarishK please look up the species in Wikidata and add a column for the ID
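One possible way to semi-automate the Wikidata lookup is the `wbsearchentities` action of the Wikidata API. The sketch below only builds the request URL and parses a reply; it does not send the request, and the helper names are assumptions for this example. The mock reply uses a placeholder ID ("Q0"), not a real Wikidata ID, and any candidate returned by a real search would still need checking by hand.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(name, language="en", limit=5):
    """Build a wbsearchentities request URL for a species name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

def candidate_ids(response_text):
    """Extract (id, label, description) candidates from a wbsearchentities JSON reply."""
    data = json.loads(response_text)
    return [(hit.get("id"), hit.get("label"), hit.get("description"))
            for hit in data.get("search", [])]

# Mock reply for illustration only; "Q0" is a placeholder, not a real Wikidata ID.
mock_reply = ('{"search": [{"id": "Q0", "label": "Escherichia coli", '
              '"description": "species of bacterium"}]}')

print(build_search_url("Escherichia coli"))
print(candidate_ids(mock_reply))
```

The URL could be fetched with any HTTP client; the first candidate is not always the right taxon, so the description column is worth keeping for review.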
OK sir.
Sir, I have added a column for the Wikidata ID: targetOrganismCount.csv
Also, include remaining target organisms. Row number 101 and onwards (to 180). targetOrganismSpecies20191218.tsv
Next step would be dictionary making.
Thank you. You can remove the two entries with missing Wikidata IDs: "micro-organisms" and "Robrardoterolla".
P.
Sir, I have added the remaining records of extracted target organisms - 40 new records.
The updates are in a copy of the target organism extraction sheet:
targetOrganismCountCopy.csv
The target organism dictionary file is targetOrganism20191222.xml
Please review these files.
I have analyzed the intermediate commit (with ca 100 rows) to extract the species. See https://github.com/petermr/CEVOpen/edit/master/articleAnalysis/oil186/raw/targetOrganismCount.csv @mannyrules for comment
Gentlemen, it seems my notifications weren't working, so I missed this.
@ambarishK bring me up to date regarding the following, please:
Sorry for the mixup.
Manny
Hi Manny.
Target organism extraction from oil186 is complete.
Extraction is done manually.
All articles are covered. I will revisit the extraction sheet, as it contains 180 records while there are 186 articles; I have to verify whether any article was left out.
https://github.com/petermr/CEVOpen/edit/master/articleAnalysis/oil186/raw/targetOrganismCount.csv - this sheet is the extraction of target organisms from oil1000. It has an occurrence frequency, calculated by PMR, and I have added the WD ID for the target organisms.
I will drop you a message once I have verified that each article of oil186 is covered for target organism extraction.
I will be available after 4 PM IST.
Also, we have to get together on extracting other entities like techniques, activities etc.
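The coverage check mentioned above (180 records against 186 articles) amounts to a set difference between the article IDs in the corpus and the IDs present in the sheet. A minimal sketch, assuming both lists have already been read into Python; the IDs below are hypothetical and the function name is made up for this example.

```python
def missing_articles(all_article_ids, extracted_ids):
    """Return the article IDs that have no row in the extraction sheet, sorted."""
    return sorted(set(all_article_ids) - set(extracted_ids))

# Hypothetical IDs for illustration only.
corpus_ids = ["PMC0000001", "PMC0000002", "PMC0000003", "PMC0000004"]
sheet_ids = ["PMC0000001", "PMC0000003", "PMC0000003"]  # repeated rows are fine

print(missing_articles(corpus_ids, sheet_ids))
```

An article with no reported activity would still show up as missing here, so it may help to keep a row for it with an empty organism cell.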
On revising the target organism extraction sheet for coverage of all 186 articles, I found the following missing articles.
PMC5307902 - No activity is discussed in the article.
PMC5524814 - No activity is discussed in the article.
PMC5597067 - Activity is discussed and target organisms are extracted from the section.
PMC5602841 - No activity against microorganisms as such is discussed.
PMC5694875 - No activity is discussed.
PMC5789316 - Activity is discussed and target organisms are extracted from the section.
PMC5858457 - Activity is discussed and target organisms are extracted from the section.
I will add those records and update the target organism extraction sheet.
Confirmation from PMR is required for the update.
Sir, please review the target organism extraction sheet - oil1000TargetOrganismSpecies.tsv.
Please suggest a format or template for the WD ID column, as there are multiple entries in each cell of the micro-organism column.
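One option for the multi-entry cells, offered only as a suggestion, is a "long" layout with one row per (article, organism) pair, so that each row can carry a single WD ID. The sketch below assumes the cell entries are comma-separated; the column names are invented for this example, and entries joined with "and" or containing strain designations would need extra handling.

```python
import csv
import io

def explode_rows(rows, sep=","):
    """Turn (article_id, "org1, org2, ...") rows into one (article_id, organism) row each."""
    out = []
    for article_id, cell in rows:
        for name in cell.split(sep):
            name = name.strip()
            if name:
                out.append((article_id, name))
    return out

# Example rows in the same shape as the extraction sheet.
rows = [
    ("PMC5797122", "Aedes aegypti, Anopheles quadrimaculatus, Anopheles albimanus"),
    ("PMC5807769", "Candida albicans"),
]

# Write a TSV with an empty wikidata_id column, to be filled in after lookup.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["article", "organism", "wikidata_id"])
for article_id, name in explode_rows(rows):
    writer.writerow([article_id, name, ""])

print(buf.getvalue())
```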
On Sat, Jan 4, 2020 at 6:16 PM Ambarish Kumar notifications@github.com wrote:
Sir, please review the target organism extraction sheet - oil1000TargetOrganismSpecies.tsv https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/oil1000TargetOrganismSpecies.tsv .
Please continue this to oil1000
Also update the table https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/targetOrganismCount.csv and add any new species
OK sir.
Concentrate first on finding all organisms from oil1000 and add them to https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/targetOrganismCount.csv
OK sir.
Sir, I have extracted target organisms from 600 articles of oil1000. By the end of the day, I will complete extraction from 800 articles.
Compiling and processing the extraction sheet will require 3 passes: expanding abbreviations to full names, frequency count, and normalization.
I will upload the extraction sheet and add those records after completion.
Examples of extracted records are as follows.
PMC5750594 C. albicans, C. neoformans, A. niger. C. neoformans
PMC5750605
PMC5750654
PMC5751248
PMC5761127
PMC5772139
PMC5778200
PMC5778779
PMC5788217 Bacillus cereus, Listeria monocytogenes, Micrococcus flavus, Staphylococcus aureus, Dickeya solani, Escherichia coli, Pectobacterium atrosepticum, Pectobacterium carotovorum subsp. carotovorum, Pseudomonas aeruginosa, Aspergillus flavus, A. ochraceus, A. niger, Candida albicans, Penicillium funiculosum, P. ochrochloron
PMC5789270
PMC5789316
PMC5794096
PMC5795983 Candida krusei, Candida albicans, Candida guilliermondii, Candida parapsilosis, Candida orthopsilosis, Candida metapsilosis, Cryptococcus neoformans, Paracoccidioides brasiliensis, Trichophyton mentagrophytes, Staphylococcus aureus, Escherichia coli, Pseudomonas aeruginosa
PMC5797122 Aedes aegypti, Anopheles quadrimaculatus, Anopheles albimanus
PMC5806308 Staphylococcus aureus, Bacillus subtilis, Pseudomonas aeruginosa, Candida albicans
PMC5807769 Candida albicans
PMC5811758
PMC5813356
PMC5822514
PMC5830750 Enterococcus faecalis, Staphylococcus aureus, Staphylococcus epidermidis, Proteus mirabilis, Escherichia coli, Pseudomonas aeruginosa
PMC5838999
PMC5842484
PMC5846372 Staphylococcus aureus, Staphylococcus epidermidis, Streptococcus mutans, Streptococcus viridans, Escherichia coli, Enterobacter cloacae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Candida albicans, C. tropicalis, C. glabrata
PMC5848570 Staphylococcus aureus, Enterococcus feacalis, Klebsiella pneumoniae, Salmonella paratyphi
PMC5849894
PMC5849899 A. flavus
PMC5849928 Salmonella typhimurium, Staphylococcus aureus, Escherichia coli
PMC5852288
PMC5855832
PMC5858069 Salmonella typhimurium, B. subtilis, S. epidermidis, S. mutans, C. albicans, Actinobacillus actinomycetemcomitans, E. faecalis, Serratia marcescens, S. aureus, M. luteus
PMC5858457 An. stephensi, Ae. aegypti, Cx. quinquefasciatus
PMC5859817 Staphylococcus aureus ATCC 6538 and Pseudomonas aeruginosa
PMC5867545 S. aureus, P. mirabilis, Streptococci spp., P. aeruginosa, E. coli, Salmonella, Klebsiella spp.
PMC5867556
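The abbreviation pass for records like those above could be partly automated. The sketch below is a heuristic, not the method used for the sheet: within one record, an abbreviated name such as "A. niger" is expanded with the most recent full genus in that record starting with the same letter(s). It will go wrong where the genus was never written out in the record, or where several genera share an initial (for example "S. epidermidis" following "Salmonella typhimurium"), and it leaves forms like "Cx." untouched, so the result still needs manual checking.

```python
import re

# One or two letters followed by a dot, then the rest, e.g. "A. niger".
ABBREV = re.compile(r"^([A-Z][a-z]?)\.\s+(.+)$")

def expand_abbreviations(names):
    """Expand abbreviated genus names using full genera seen earlier in the same list."""
    seen_genera = []   # full genus names in order of appearance
    expanded = []
    for name in names:
        name = name.strip()
        match = ABBREV.match(name)
        if match:
            abbr, epithet = match.groups()
            # Most recent genus that starts with the abbreviation, if any.
            genus = next((g for g in reversed(seen_genera) if g.startswith(abbr)), None)
            expanded.append(f"{genus} {epithet}" if genus else name)
        else:
            expanded.append(name)
            parts = name.split()
            if parts and parts[0][0].isupper():
                seen_genera.append(parts[0])
    return expanded

# Part of the PMC5788217 record quoted above.
record = ("Aspergillus flavus, A. ochraceus, A. niger, Candida albicans, "
          "Penicillium funiculosum, P. ochrochloron").split(", ")
print(expand_abbreviations(record))
```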
Sir, please go through the extraction sheet for target organisms from oil1000 - oil1000TargetOrganismUnprocessed.csv
It is in an unprocessed state right now.
The required processing steps are as follows.
- Full name of abbreviation.
- Adding WDID.
- Frequency count
- Normalization.
I am working through all of the above steps.
Thank you
On Sat, 11 Jan 2020, 10:29 Ambarish Kumar, notifications@github.com wrote:
Sir, please go through the extraction sheet for target organisms from oil1000 - oil1000TargetOrganismUnprocessed.csv https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/oil1000TargetOrganismUnprocessed.csv
It is in unprocessed state right now.
Required processing steps are as follows.
- Full name of abbreviation.
- Adding WDID.
- Frequency count
- Normalization.
Sir, please go through the extraction sheet for target organisms from oil1000 - oil1000TargetOrganismsUnique.tsv.
There are 593 unique records.
Please add the frequency count of each target organism in oil1000.
There are some issues: many records mention only the genus name. For example -
Acetobacter
Achromobacter
Acidobacteria
Dictyoglomi
Firmicutes
lectularius
Should I remove the records that have only a genus name?
I am adding WIKIDATA ID for target organisms.
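For the frequency count, one simple reading is "in how many articles does each organism occur". The sketch below assumes the names have already been expanded to full form and that each article maps to a list of names; it also flags single-word names (genus or higher taxon only) for review instead of deleting them. The function names are made up for this example, and the three sample records are taken from the examples earlier in this thread.

```python
from collections import Counter

def organism_frequency(records):
    """Count the number of articles in which each organism name occurs.

    records maps an article ID to a list of organism names.
    """
    counts = Counter()
    for names in records.values():
        # Collapse repeated whitespace and count each name once per article.
        unique = {" ".join(n.split()) for n in names if n.strip()}
        counts.update(unique)
    return counts

def single_word_names(counts):
    """Return the names consisting of one word only, sorted, for manual review."""
    return sorted(name for name in counts if len(name.split()) == 1)

records = {
    "PMC5806308": ["Staphylococcus aureus", "Bacillus subtilis",
                   "Pseudomonas aeruginosa", "Candida albicans"],
    "PMC5807769": ["Candida albicans"],
    "PMC5849928": ["Salmonella typhimurium", "Staphylococcus aureus", "Escherichia coli"],
}

counts = organism_frequency(records)
for name in sorted(counts):
    print(name, counts[name])
```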
Thanks, Keep the genus name.
P.
On Mon, Jan 13, 2020 at 11:21 AM Ambarish Kumar notifications@github.com wrote:
Sir, please go through the extraction sheet for target organisms from oil1000 - oil1000TargetOrganismsUnique.tsv https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/oil1000TargetOrganismsUnique.tsv .
OK sir.
Which organisms are targets in the activity of EOs?