petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

Validate automatic extraction of compounds and percentages #57

Open petermr opened 4 years ago

petermr commented 4 years ago

AMI-TABLES extracts compounds and percentages

ami-table now uses a template file with regexes to extract compound and percentage columns from tables that contain both. The regex is fairly basic:

templates/phytomedchem.xml
<templateList>
    <template name="composition">
        <title find="
             [Cc]omposition
             [Oo]il
             EO
             [Pp]ercentage
             "
             exclude=""
             />
        <table regex=".*\/table_\d+\.xml"/>
        <column name="compound" find="
            [Cc]onstituent
            [Cc]ompound
            [Cc]omponent
            "/>
        <column name="percentage" find="
            [Pp]ercentage
            [Aa]rea
            %"
        />
    </template>
    <template name="activity">
        <title find="
        [Aa]ctivity
        target"
        />
        <table regex=".*/table_\d+\.xml"/>
        <column name="activity" find="activity"/>
        <column name="target" find="target"/>
    </template>
</templateList>

This currently has two templates (composition and a stub activity).

composition searches for

automatic extraction

ami-table used these regexes to extract composition tables. These are currently labelled in the form

/Users/pm286/projects/CEVOpen/searches/oil186/PMC4391421/sections/tables/subTable_2.html 

which should have been extracted from

/Users/pm286/projects/CEVOpen/searches/oil186/PMC4391421/sections/tables/table_2.xml 

Check that the tables correspond

report all comments in a "notes" column
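A minimal sketch for pairing each extracted file with the table it should have come from. It assumes, from the single example above, that `subTable_N.html` sits next to `table_N.xml` in the same `tables` directory. The function name `source_table_for` is hypothetical and not part of ami-table.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def source_table_for(subtable_path: str) -> Optional[str]:
    """Map .../subTable_N.html to the sibling .../table_N.xml.

    Assumption: the number in the subTable name is the number of the
    source table. Returns None for files that do not follow that pattern.
    """
    path = PurePosixPath(subtable_path)
    match = re.fullmatch(r"subTable_(\d+)\.html", path.name)
    if match is None:
        return None
    return str(path.with_name(f"table_{match.group(1)}.xml"))


if __name__ == "__main__":
    example = (
        "/Users/pm286/projects/CEVOpen/searches/oil186/"
        "PMC4391421/sections/tables/subTable_2.html"
    )
    print(source_table_for(example))
```

The returned path can then be opened alongside the subTable to check by eye that the two correspond.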

ambarishK commented 4 years ago

Sir, please check for the updated sheet for composition table - composition20191028.tsv.

There are added columns for match_no, mismatch_no and mis_match_title.

It contains 25 added records for the newly added columns - match_no, mismatch_no and mis_match_title.

ambarishK commented 4 years ago

Sir, please go through the updated sheet for composition table - composition20191028.tsv.

column description

petermr commented 4 years ago

Thank you.

On Mon, Nov 11, 2019 at 8:58 AM Ambarish Kumar notifications@github.com wrote:

Sir, please go through the updated sheet for composition table - composition20191028.tsv https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/composition20191028.tsv .

column description

  • PMCID - Pubmed central ID.
  • table_no - Table number for table containing EO profile.
  • table_title - Table title for the table containing EO profile.
  • matched_no - subTable for the column of table containing EO profile (Constituents and percentage composition).
  • mismatch_no - subTable for columns of table not containing EO profile (reported as error).
  • mis_match_titles - table header for the subTable not corresponding to EO profile.
  • notes_for_subTable_Extraction - Added notes while extracting subTable.
  • serial - column number containing serial number for EO constituents.
  • compound - Column number for column containing EO constituents.
  • compound_title - Title for the column containing EO constituents.
  • percent - Column number for the percentage composition of the EO.
  • percent_title - Title for the column containing percentage composition of the EO.
  • notes_for_table_Extraction - Added notes for the extraction of table columns.


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr commented 4 years ago

automatically extracted tables

I now have the first pass of a tool to extract compounds and percentages.

extractor templates

This uses a template (in this case in https://github.com/petermr/CEVOpen/blob/master/templates/phytomedchem.xml) which currently reads as:

<templateList>
    <template name="composition">
        <title find="
             [Cc]hemical composition _OR
             [Mm]ajor constituents _OR
             [Cc]ompound composition _OR
             [Pp]hytochemical constituents _OR
             [Cc]omposition of _OR
             ([Mm]ain|[Mm]ajor) (components|constituents|compounds)
             "
             exclude=""
             />
        <table regex=".*\/table_\d+\.xml"/>
        <column name="compound" find="
            [Cc]onstituent _OR
            [Cc]ompound _OR
            [Cc]omponent
            "/>
        <column name="percentage" find="
            [Pp]ercentage _OR
            [Pp]eak [Aa]rea _OR
            %"
        />
    </template>
    <template name="activity">
        <title find="
        [Aa]ctivity
        target"
        />
        <table regex=".*/table_\d+\.xml"/>
        <column name="activity" find="activity"/>
        <column name="target" find="target"/>
    </template>
</templateList>

This has two templates, and we are using just the first (composition). It instructs the extractor to

(This may change as we increase the power of the regexes).
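To make the column matching concrete, here is a rough sketch of how the `find` attributes could be applied to column headers. It is not the ami-table implementation. It assumes that `_OR` separates alternative regexes and that a header matches when any alternative is found anywhere in it. The function names and the cut-down template are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Cut-down copy of the "composition" template, columns only.
TEMPLATE_XML = r"""
<templateList>
    <template name="composition">
        <column name="compound" find="
            [Cc]onstituent _OR
            [Cc]ompound _OR
            [Cc]omponent
            "/>
        <column name="percentage" find="
            [Pp]ercentage _OR
            [Pp]eak [Aa]rea _OR
            %"
        />
    </template>
</templateList>
"""


def load_column_patterns(xml_text, template_name):
    """Return {column name: [compiled regex, ...]} for one template."""
    root = ET.fromstring(xml_text)
    patterns = {}
    for template in root.findall("template"):
        if template.get("name") != template_name:
            continue
        for column in template.findall("column"):
            # Assumption: "_OR" separates the alternatives.
            parts = [p.strip() for p in column.get("find", "").split("_OR")]
            patterns[column.get("name")] = [re.compile(p) for p in parts if p]
    return patterns


def classify_header(header, patterns):
    """List the template column names whose regexes occur in the header."""
    return [
        name
        for name, regexes in patterns.items()
        if any(regex.search(header) for regex in regexes)
    ]


if __name__ == "__main__":
    patterns = load_column_patterns(TEMPLATE_XML, "composition")
    for header in ["Constituents*", "Area % of total", "Compounds%"]:
        print(header, "->", classify_header(header, patterns))
```

Under these assumptions a header such as `Compounds%` is assigned to both columns, which is one way a false positive can arise.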

outputs

Every time one or more columns are matched a "subtable" is output with the columns found.

Typical outputs are in https://github.com/petermr/CEVOpen/blob/master/searches/oil186/PMC4391421/sections/tables/composition_extracted_1.html (The files https://github.com/petermr/CEVOpen/blob/master/searches/oil186/PMC4391421/sections/tables/composition1.html are probably deprecated and may disappear in future). If successful, this table should include two columns corresponding to the column names in the extraction template ("compound" and "percentage").

VALIDATION

@ambarishK PLEASE VALIDATE and annotate the composition_extracted_1 files:

petermr commented 4 years ago

chemical multiset

The chemical names have been extracted to an aggregate multiset (frequencies) in https://github.com/petermr/CEVOpen/blob/master/searches/oil186/__tables/compound_multiset.txt. This lists compounds in order of decreasing frequency.

problems

ambarishK commented 4 years ago

Sir, suggest for any changes. EO composition extraction and manual analysis - compositionTableExtraction.tsv.

Column description.

petermr commented 4 years ago

Thank you - go ahead with the rest.

On Tue, Nov 12, 2019 at 12:57 PM Ambarish Kumar notifications@github.com wrote:

Sir, suggest for any changes. EO composition extraction and manual analysis - compositionTableExtraction.tsv https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/compositionTableExtraction.tsv .

Column description.

  • PMCID - PubMed central id.
  • table_no - EO table number.
  • table_title - Title for EO table.
  • Extracted_columns - section containing EO profile columns.
  • Extracted_column_title_1_constituents - title for extracted constituents.
  • Extracted_column_title_2_composition - title for extracted percentage composition.
  • composition_extracted - Mapped EO profile column titles.
  • composition_extracted_compound - symbolic title for mapped EO constituents column.
  • composition_extracted_percentage - symbolic title for mapped EO percent composition.
  • notes - Added notes for extracted EO composition.



ambarishK commented 4 years ago

Sir, please go through the updated sheet for EO composition extraction sheet - compositionTableExtraction.tsv.

It contains added columns for not extracted constituents and percentage composition columns.

petermr commented 4 years ago

design for reporting Retrieval/Extraction in tables.

Gold standard

This is a human-created list of tables and columns which ideally should be retrieved precisely by the software

ami-table operations.

Note that several tables contain more than one percentage column and the identification should identify the exact columns.

Compound Thyme% Basil%
Thymol 90 10
Ocimene 15 70

is actually two tables:

Compound Thyme%
Thymol 90
Ocimene 15

and

Compound Basil%
Thymol 10
Ocimene 70

This should be reported as two 2-column tables
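The split can be sketched as below. This is only an illustration, not ami-table code. It assumes the compound column has already been identified and that every other column is a percentage column. `split_percentage_columns` is a hypothetical helper name.

```python
def split_percentage_columns(header, rows, compound_index=0):
    """Split one wide table into one 2-column table per percentage column.

    Assumption: every column other than `compound_index` holds percentages.
    Returns a list of (header, rows) pairs.
    """
    tables = []
    for col, name in enumerate(header):
        if col == compound_index:
            continue
        sub_header = [header[compound_index], name]
        sub_rows = [[row[compound_index], row[col]] for row in rows]
        tables.append((sub_header, sub_rows))
    return tables


if __name__ == "__main__":
    header = ["Compound", "Thyme%", "Basil%"]
    rows = [["Thymol", 90, 10], ["Ocimene", 15, 70]]
    for sub_header, sub_rows in split_percentage_columns(header, rows):
        print(sub_header, sub_rows)
```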

table retrieval

ami-table carries out the following operations which should be measured:

for each table record ONE of these codes. For FN or FP record the tables which cause the fail.
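Once one code per table has been recorded, the counts can be summarised as in this sketch. It assumes the codes are the usual TP, FP, FN and TN. The list in the usage example is made-up illustration data, not results from oil186.

```python
from collections import Counter


def retrieval_scores(codes):
    """Summarise a list of per-table codes ("TP", "FP", "FN", "TN").

    Precision = TP / (TP + FP); recall = TP / (TP + FN).
    A score is None when its denominator is zero.
    """
    counts = Counter(codes)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return {"counts": dict(counts), "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Made-up codes, for illustration only.
    print(retrieval_scores(["TP", "TP", "FP", "FN", "TN"]))
```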

column retrieval

ami-table carries out the following operations which should be measured:

column extraction

This will be done later.

petermr commented 4 years ago

analysis table

For analysing the retrieved/extracted tables we need the following column headings:

ambarishK commented 4 years ago

Sir, please go through the new composition analysis sheet - compositionAnalysis20191119.tsv.

Number of added records is 30.

ambarishK commented 4 years ago

Sir, please go through the updated sheet for composition extraction - compositionAnalysis20191119.tsv.

It contains 140 added records and additional columns for FPs and FNs. Later on we may drop FPs and FNs columns.

ambarishK commented 4 years ago

Sir, please go through the new composition extraction sheet - compositionAnalysis20191119.tsv

ambarishK commented 4 years ago

Sir, please go through the composition extraction sheet - CompositionAnalysis20191119.tsv.

This is the complete sheet for composition extraction for oil186.

TN = 72. FN = 02.

Graphics mode table count is - 11.

Count for FN compound_col_name is - 45.

Count for FN percent_col_name is - 54.

Count of articles containing two EO tables - 08.

Missing columns ( either compound column OR percent composition column ) are denoted as false negative ( FN ).


Column description is as follows.

ambarishK commented 4 years ago

Sir, for many articles, in spite of regular titles for the compound and percent composition columns, the EO composition is not extracted.

For example.


PMC5575638 | Table 3 | table_3.xml | Essential oil compositions of wild garlic ( Allium vineale) ... | FN |   | FN | FN |   | No mapping of columns are there. Two EO tables are there. | Compound ; Percent Composition

PMC5788217 | Table 1 | table_1.xml | Chemical composition of ZEO. | FN |   | FN | FN |   | EO composition is not extracted. | Compounds ; Area (%)

PMC5794096 | Table 1 | table_1.xml | Volatile compounds identified by gas chromatography followed ... | FN |   | FN | FN |   | No EO composition is extracted. | Compound ; Area (%)

PMC5849928 | Table 1 | table_1.xml | Chemical compounds of anise essential oil (AEO) | FN |   | FN | FN |   | No EO composition columns are extracted. | Compound name ; Percent

Updated sheet for composition analysis is - compositionAnalysis20191119.tsv

petermr commented 4 years ago

I am adjusting the parameters to extract tables and columns. Here is the current output, which you can use to select true and false positives. Please add a new column ("matches20191121" before graphic_table) to the latest compositionAnalysis, indicating which of the extracted composition matches are false positives and also indicating where there are false negatives (tables not extracted). Add only "FN" or "FP" so I can quickly find the errors @ambarishK

petermr commented 4 years ago

matches 20191121

AMITableTool cTree: PMC4391421
table:  Chemical composition of thyme EO 
column: compound => Constituents*; 64.7
column: percentage => Area % of total; 100.0
AMITableTool cTree: PMC5080681
table:  Chemical composition, concentrations (%) and calculated retention indices, of T. bovei essential oil as characterized by GC/MS analysis 
column: compound => Constituents; 97.1
column: percentage => %; 100.0
AMITableTool cTree: PMC5132230
table:  Chemical composition of the Aeollanthus suaveolens essential oil. 
column: compound => Compounds; 91.7
column: percentage => Relative Percentage (%); 100.0
AMITableTool cTree: PMC5203915
table:  Percentage of composition of essential oils from Rhaponticum carthamoides roots of soil-grown plants (SGR) and hairy roots (HR). 
column: compound => Constituent; 92.6
column: compound => Class of compound; 88.2
column: percentage => SGR [%]; 100.0
column: percentage => HR [%]; 92.6
AMITableTool cTree: PMC5237462
table:  Major constituents of the essential oils of M. piperita. 
column: percentage => Peak Area (%); 100.0
AMITableTool cTree: PMC5248495
table:  Chemical composition of essential oils of Ocimum basilicum var .purpureum, Ocium basilicum var . thyrsiflora, Ocimum citriodorum 
column: percentage => O. basilicumvar.purpureum,%b; 100.0
column: percentage => O. basilicumvar.thyrsiflora,%; 100.0
column: percentage => O. xcitriodorum,%; 100.0
AMITableTool cTree: PMC5282690
AMITableTool cTree: PMC5307246
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5282690/sections/tables
AMITableTool cTree: PMC5307902
AMITableTool cTree: PMC5324201
table:  Compound composition (% w/w) in the essential oil and water extract part of Anethum sowa L. root 
column: compound => Name of Compounds; 96.0
table:  Proximate composition of Anethum sowa L. Root 
column: percentage => Percent (%) Composition; 100.0
table:  Fatty acid composition of Anethum sowa L. root extract (cold and hot extracts) by GC 
column: compound => Fatty acid compounds; 73.3
AMITableTool cTree: PMC5330108
AMITableTool cTree: PMC5344628
table:  The composition of the selected commercial essential oils from the Fares Company, Romania, compared with AFNOR/ISO standards *. 
column: compound => Chemical Compound; 96.3
AMITableTool cTree: PMC5364420
table:  Chemical composition of essential oil from C. rotundus rhizomes. 
column: compound => Compounds; 98.3
column: percentage => Percentage (%); 100.0
AMITableTool cTree: PMC5393100
AMITableTool cTree: PMC5397855
table:  Chemical composition of the garlic essential oil. 
column: compound => Compound; 92.9
AMITableTool cTree: PMC5411863
table:  Percentage composition of the essential oil of the fruits of K.anatolica Hub.-Mor. 
table:  The main components of essential oil of K. anatolica Hub.-Mor. from four altitudes 
AMITableTool cTree: PMC5412227
table:  Comparison of the main components in D. kotschyi essential oils from other studies and this study 
column: compound => Major constituents; 50.0
table:  Chemical composition of the essential oils from the aerial parts of D. kotschyi. 
column: compound => Compounds; 97.9
column: percentage => Relative content (%); 100.0
AMITableTool cTree: PMC5423258
table:  Chemical composition of essential oils of C. decurrens, C. sempervirens and T. articulat a aerial parts 
column: compound => Compound; 81.3
column: percentage => Percentage (%); 100.0
AMITableTool cTree: PMC5426739
AMITableTool cTree: PMC5427463
AMITableTool cTree: PMC5448358
table:  Chemical composition of the essential oil of the aerial part and the roots of Elyonurus hensii (site Loufoulakari) 
column: percentage => Content (%); 54.8
AMITableTool cTree: PMC5454990
table:  Chemical composition of essential oils of Ocotea species. RI Calc, calculated retention index; RI Lit, literature retention index. 
column: compound => Constituents; 100.0
AMITableTool cTree: PMC5485486
table:  Main major compounds of essential oil of Aloysia citriodora analyzed by GC- MS. 
column: percentage => % AGa; 100.0
column: percentage => % BMb; 100.0
column: percentage => % BEc; 100.0
column: percentage => % DEd; 100.0
column: percentage => % MAe; 100.0
table:  Chemical composition after GC-MS analysis of the essential oils of A. citriodora. 
column: percentage => % AGc; 100.0
column: percentage => % BMd; 100.0
column: percentage => % BEe; 100.0
column: percentage => % DEf; 100.0
column: percentage => % MAg; 100.0
AMITableTool cTree: PMC5486035
table:  Chemical composition of Pistacia vera L. variety Bronte hull essential oil. 
column: compound => Compound; 91.3
column: percentage => Areab(%); 100.0
AMITableTool cTree: PMC5497343
table:  Chemical composition of the essential oils of T. minuta flower 
column: compound => Chemical constituents; 76.0
column: percentage => % Composition; 100.0
AMITableTool cTree: PMC5507808
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5507808/sections/tables
AMITableTool cTree: PMC5524814
table:  Composition of the essential oils of Salvia blancoana subsp. mariolensis, S. x hegelmaieri and S. officinalis subsp. lavandulifolia from the Valencia region (Spain) and close areas a . 
column: compound => Constituentsc; 65.2
column: percentage => Percentage in the essential oilsd; 100.0
AMITableTool cTree: PMC5527698
AMITableTool cTree: PMC5535876
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5535876/sections/tables
AMITableTool cTree: PMC5543433
table:  Chemical composition of essential oil isolated by hydrodistillation from flowers of ClEO 
column: percentage => d%; 100.0
AMITableTool cTree: PMC5546729
table:  The essential oil composition of Iranian and Indian Nigella sativa L. seed oils identified using GC-MS 
column: compound => Compound; 93.8
column: percentage => SFE, %; 56.3
column: percentage => Hex, %; 50.0
column: percentage => Met, %; 62.5
column: percentage => Hex/Met, %; 43.8
AMITableTool cTree: PMC5551175
table:  Chemical composition of the essential oils from Azadirachta indica, Aframomum melegueta, Aframomum daniellii, Clausena anisata, Dichrostachys cinerea, and Echinops giganteus. 
column: percentage => %d; 100.0
AMITableTool cTree: PMC5568258
AMITableTool cTree: PMC5569441
table:  Chemical composition of cinnamon in the essential oil of cinnamon bark 
column: compound => Compound; 100.0
column: percentage => Concentration (%); 100.0
AMITableTool cTree: PMC5575638
AMITableTool cTree: PMC5577677
AMITableTool cTree: PMC5585972
table:  Chemical composition of clove essential oil 
column: compound => Compound; 80.0
column: percentage => % composition; 100.0
table:  Chemical composition of the cinnamon essential oil 
column: compound => Compound; 100.0
column: percentage => % composition; 100.0
AMITableTool cTree: PMC5590060
table:  Composition of E. foetidum essential oils. 
AMITableTool cTree: PMC5590062
table:  Composition of the essential oils from the leaves of A. schaueriana collected at Estação Ecológica Jureia-Itatins ( 1) and Parque Estadual da Ilha do Cardoso ( 2). 
column: compound => Compounds; 75.6
column: percentage => %; 100.0
AMITableTool cTree: PMC5590063
AMITableTool cTree: PMC5590065
table:  Percentage composition of classes of compounds in A. danielli essential oils. 
column: compound => Compound Class; 100.0
column: percentage => Leaf (%); 100.0
column: percentage => Stem (%); 100.0
column: percentage => Seed (%); 100.0
column: percentage => Rhizome (%); 100.0
column: percentage => Pod (%); 100.0
table:  Composition of A. danielli essential oils. 
column: compound => Compound; 100.0
column: percentage => Leaf (%); 100.0
column: percentage => Stem (%); 100.0
column: percentage => Seed (%); 100.0
column: percentage => Rhizome (%); 100.0
column: percentage => Pod (%); 100.0
column: percentage => QI (%); 100.0
AMITableTool cTree: PMC5590066
table:  Chemical composition of EO of wild Achillea millefolium L. (aerial parts) from Toulouse region, France. 
column: compound => Compounds; 100.0
column: percentage => (%)c; 100.0
AMITableTool cTree: PMC5590067
table:  Chemical composition of the essential oil of Z. monogynum. 
column: compound => Compound; 100.0
column: percentage => %; 100.0
AMITableTool cTree: PMC5590070
table:  Anti-inflammation, antioxidant, antibacterial, and cytotoxic activities of T. vulgare essential oil and its main constituents 
column: compound => Compounds; 100.0
table:  Chemical composition of T. vulgare essential oil from northern Quebec, Canada. 
column: compound => Identified Compounds; 93.2
column: percentage => Relative CONCENTRATION (%); 94.9
AMITableTool cTree: PMC5592951
AMITableTool cTree: PMC5597067
AMITableTool cTree: PMC5602041
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5602041/sections/tables
AMITableTool cTree: PMC5602841
AMITableTool cTree: PMC5603114
table:  Chemical composition of resin essential oil of P. heptaphyllum, EOPh. 
column: compound => Constituents; 100.0
column: percentage => Area (%) EOPh  Com. resins; 87.0
column: percentage => Area (%) EOPh  Nat. resins; 65.2
AMITableTool cTree: PMC5613177
AMITableTool cTree: PMC5615139
table:  Chemical Composition of essential oil of fresh rhizome of Curcuma longa L. 
column: compound => Compounds; 100.0
column: percentage => % Area; 100.0
AMITableTool cTree: PMC5615285
table:  Chemical composition of the essential oil of Foeniculum vulgare according to a GLC-MS analysis. 
column: compound => Compounds; 83.8
column: percentage => % *; 100.0
AMITableTool cTree: PMC5620597
table:  Chemical composition of the essential oils from O. basilicum L. varieties, violetto, latifolia, minimum and lettuce, cultivated in the greenhouse conditions. 
column: compound => Compound; 100.0
table:  Chemical composition of the essential oils from O. basilicum L. varieties latifolia, minimum, lettuce and cinnamon, cultivated in the field conditions. 
column: compound => Compound; 99.0
AMITableTool cTree: PMC5622382
table:  Chemical compositions of leaf essential oil of Salvia officinalis from three different global locations. 
column: compound => Compound; 100.0
AMITableTool cTree: PMC5622390
table:  Chemical composition of the active antifungal fractions of H. brasiliense female flowers and leaves essential oils isolated by bioautography-guided TLC. 
column: compound => Compound; 96.0
column: percentage => FwF (%); 96.0
column: percentage => LeF (%); 96.0
table:  Chemical composition of Hedyosmum brasiliense essential oils. 
column: compound => Compound; 84.7
column: percentage => Relative Amount (%); 100.0
AMITableTool cTree: PMC5622397
AMITableTool cTree: PMC5622398
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5622398/sections/tables
AMITableTool cTree: PMC5622401
table:  Chemical compositions of the essential oils of wild and commercial Commiphora gileadensis. 
column: compound => Compounds; 99.2
AMITableTool cTree: PMC5622403
table:  A literature report from 2012–2017 on chemical composition of Artemisia oils from different geographical regions (Plant part: AP: aerial parts; F: flowers; FH: flower-heads; L: leaves; B: Buds) 
column: percentage => Major Components (%); 100.0
AMITableTool cTree: PMC5625792
table:  Chemical composition of ROEO 
column: compound => Compounda; 97.3
column: percentage => Area percentage (%)*; 100.0
AMITableTool cTree: PMC5641611
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5641611/sections/tables
AMITableTool cTree: PMC5651092
table:  Percentage composition of Citrus limonum and Piper nigrum oils. 
column: compound => Constituents; 9.1
column: percentage => Citrus limonum(%); 92.4
column: percentage => Piper nigrum(%); 92.4
AMITableTool cTree: PMC5653886
table:  Chemical compositions of D. moldavica determined by gas chromatography-mass spectrometry analysis 
column: percentage => (%); 100.0
table:  Chemical compositions of M. officinalis by gas chromatography-mass spectrometry analysis 
column: percentage => (%); 100.0
AMITableTool cTree: PMC5660901
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5660901/sections/tables
AMITableTool cTree: PMC5661929
table:  Chemical composition (GC–MS) of Pittosporum tobira seed obtained by HD and HS-SPME 
column: compound => Compounds*; 86.2
column: percentage => % RC; 69.0
table:  Proximate composition, mineral content and phytochemical composition of Pittosporum tobira seeds 
AMITableTool cTree: PMC5668225
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5668225/sections/tables
AMITableTool cTree: PMC5669080
AMITableTool cTree: PMC5669111
AMITableTool cTree: PMC5674267
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5674267/sections/tables
AMITableTool cTree: PMC5684592
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5684592/sections/tables
AMITableTool cTree: PMC5694497
table:  Major components (%) identified in the essential oil extracted from leaves of Vitex capitata, V. megapotamica, V. gardneriana and V. rufescens from northeastern Brazil a. 
column: compound => Compounds; 100.0
AMITableTool cTree: PMC5694587
table:  Chemical composition of some common Mediterranean plant essential oils. 
column: compound => Some of mainSingle constituents; 98.1
column: percentage => Percentagec ; 100.0
AMITableTool cTree: PMC5694611
table:  Chemical composition, concentrations (%), and calculated retention indices of R. chalepensis essential oils as characterized by GC-MS analysis. 
column: compound => Compound; 83.3
column: percentage => % of total essential oil from Hebron; 96.7
column: percentage => % of total essential oil from Jerusalem; 96.7
column: percentage => % of total essential oil from Jenin; 90.0
AMITableTool cTree: PMC5694875
table:  Effects of heat processing on the amino acid composition of conophor nut protein isolates (g/100 g) 
table:  Proximate composition of heat processed defatted conophor nut flours and protein isolates (g/100 g dry weight basis) 
AMITableTool cTree: PMC5694991
table:  Chemical composition of the essential oil of two studied Iranian superior cumin landraces. 
column: compound => Compounds%; 71.2
column: percentage => Compounds%; 45.8
AMITableTool cTree: PMC5699893
AMITableTool cTree: PMC5702407
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5699893/sections/tables
AMITableTool cTree: PMC5702920
table:  Relative (%) chemical composition of the collected air after 1 and 15 minutes of nebulization of each formulation. 
column: compound => Formulations according to major compounds; 100.0
table:  Sample codes, identified species for each collected sample [ 5], formulation's composition, and major compounds in the essential oils of each formulation. 
column: compound => Major compounds; 35.7
AMITableTool cTree: PMC5717781
AMITableTool cTree: PMC5723952
AMITableTool cTree: PMC5725564
AMITableTool cTree: PMC5735349
AMITableTool cTree: PMC5736702
AMITableTool cTree: PMC5742650
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5742650/sections/tables
AMITableTool cTree: PMC5745743
AMITableTool cTree: PMC5746745
table:  Main components of spike lavender leaf essential oil from lines HMGR5 and WT extracted with chloroform-d and determined by GC/MS. The percentage is referred to the area of the 15 main peaks of each sample. Rt: retention time; SD: standard deviation; h: hours. 
AMITableTool cTree: PMC5747963
AMITableTool cTree: PMC5748641
table:  Chemical composition of Niphogeton dissecta essential oil of province of Loja, Ecuador. 
column: percentage => %c; 77.0
AMITableTool cTree: PMC5750594
table:  Chemical composition of the oleoresin essential oil of Protium amazonicum from Ecuador. 
column: compound => Compound; 100.0
column: percentage => %; 100.0
AMITableTool cTree: PMC5750605
AMITableTool cTree: PMC5750654
AMITableTool cTree: PMC5751248
AMITableTool cTree: PMC5761127
table:  Chemical composition (%) of leaves essential oil from Tunisian M.piperita as identified by GC/MS analysis 
column: compound => Compounds; 93.3
column: percentage => Percentage (%); 100.0
AMITableTool cTree: PMC5772139
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5772139/sections/tables
AMITableTool cTree: PMC5778200
table:  The influence of different pretreatments on the chemical composition of black cumin seed oil 
column: compound => Compound; 50.0
AMITableTool cTree: PMC5778779
AMITableTool cTree: PMC5788217
table:  Chemical composition of ZEO. 
AMITableTool cTree: PMC5789270
AMITableTool cTree: PMC5789316
AMITableTool cTree: PMC5794096
AMITableTool cTree: PMC5795983
table:  Chemical composition of the essential oil of the leaves from Blepharocalyx salicifolius. 
column: compound => Compound; 97.6
column: percentage => %; 100.0
AMITableTool cTree: PMC5797122
table:  The main components of N. cataria species depends on agricultural practices, soil, age of the plant, collection period, drying, extraction methods, climate and geographic origin (NPL – nepetalactone). 
column: compound => Compound 1; 0.0
column: compound => Compound 2; 0.0
column: compound => Compound 3; 0.0
AMITableTool cTree: PMC5806308
AMITableTool cTree: PMC5807769
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5807769/sections/tables
AMITableTool cTree: PMC5811758
AMITableTool cTree: PMC5813356
table:  Chemical composition of the essential oil of Haplophyllum tuberculatum vegetal parts 
AMITableTool cTree: PMC5822514
AMITableTool cTree: PMC5830750
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5830750/sections/tables
AMITableTool cTree: PMC5838999
table:  Composition of the essential oil of R. acaule 
column: compound => Constituentsa; 100.0
column: percentage => (%)c; 100.0
AMITableTool cTree: PMC5842484
AMITableTool cTree: PMC5846372
AMITableTool cTree: PMC5848570
AMITableTool cTree: PMC5849894
table:  Changes in proximate composition of walnut seeds during processing 
column: percentage => Moisture%; 100.0
column: percentage => Ash%; 100.0
column: percentage => Lipid%; 100.0
column: percentage => Protein%; 100.0
column: percentage => Carbohydrate%; 100.0
table:  Changes in mineral composition of walnut during processing 
AMITableTool cTree: PMC5849899
AMITableTool cTree: PMC5849928
AMITableTool cTree: PMC5852288
AMITableTool cTree: PMC5855832
AMITableTool cTree: PMC5858069
AMITableTool cTree: PMC5858457
AMITableTool cTree: PMC5859817
AMITableTool cTree: PMC5867545
AMITableTool cTree: PMC5867556
table:  Molecular composition of the various propolis preparations determined by high-performance liquid chromatography-UV-electrospray ionization mass (HPLC-UV-ESI-MS). 
AMITableTool cTree: PMC5871051
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5871051/sections/tables
AMITableTool cTree: PMC5871294
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5871294/sections/tables
AMITableTool cTree: PMC5872285
table:  Fatty acid composition of L. migratoria. 
column: percentage => Mean ± SD (%); 100.0
table:  Proximate composition of L. migratoria (% based on dry weight). 
column: percentage => Mean (%) ± SD; 100.0
AMITableTool cTree: PMC5872290
table:  Oviposition deterrent activity of P. anisum, L. berlandieri and C.aurantifolia essential oils and their major components against gravid female Culex quinquefasciatus. 
column: percentage => Effective repellency (%); 100.0
table:  LC 50 and LC 90 (µg/mL) of the major constituents of the essential oils at III instar and pupal of Culex quinquefasciatus after 24 h of exposure. 
column: compound => Compounds; 100.0
AMITableTool cTree: PMC5874608
table:  Chemical composition of essential oils of Conradina species. 
column: percentage => Major Oil Components (%); 87.5
AMITableTool cTree: PMC5876267
table:  Chemical composition of the Origanum compactum essential oil tested against Anisakis simplex L3 larvae 
AMITableTool cTree: PMC5876298
AMITableTool cTree: PMC5877547
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5877547/sections/tables
AMITableTool cTree: PMC5878871
table:  Chemical composition of essential oil of LEO. 
column: compound => Compounds; 100.0
column: percentage => % RAb; 100.0
AMITableTool cTree: PMC5879832
AMITableTool cTree: PMC5884000
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5884000/sections/tables
AMITableTool cTree: PMC5884125
AMITableTool cTree: PMC5885327
table:  Chemical composition of P . roseum essential oil. 
column: compound => Compounds; 100.0
column: compound => % Constituents ofP. roseumEO; 100.0
column: percentage => % Constituents ofP. roseumEO; 100.0
AMITableTool cTree: PMC5886561
AMITableTool cTree: PMC5896384
AMITableTool cTree: PMC5896386
table:  Chemical composition of the volatile oil of T.minuta 
AMITableTool cTree: PMC5897738
table:  Chemical composition of RC essential oil. 
column: compound => Compoundsa; 93.1
column: percentage => % in samples; 82.8
AMITableTool cTree: PMC5901951
AMITableTool cTree: PMC5902937
table:  Policosanol composition of milk thistle oil extracted from mature seeds 
AMITableTool cTree: PMC5905184
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5905184/sections/tables
AMITableTool cTree: PMC5905380
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5905380/sections/tables
AMITableTool cTree: PMC5905578
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5905578/sections/tables
AMITableTool cTree: PMC5909600
table:  Chemical composition and fatty acid concentrations in breast meat in broiler chickens fed diets without fat supplementation or supplemented with rapeseed oil, lard, or palm oil. 
AMITableTool cTree: PMC5918559
AMITableTool cTree: PMC5919639
AMITableTool cTree: PMC5920421
AMITableTool cTree: PMC5920425
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5920421/sections/tables
table:  Major and trace elements composition of Terminalia ferdinandiana kernels (mg/100 g DW). 
table:  Proximate composition of Terminalia ferdinandiana kernels. 
AMITableTool cTree: PMC5921405
AMITableTool cTree: PMC5923693
AMITableTool cTree: PMC5925846
table:  Effects of capsicum oleoresin, carvacrol, cinnamaldehyde and their mixtures added to broilers’ mixed feed on the composition of fatty acids in the breast meat (x ± SEM) 
table:  Effects of capsicum oleoresin, carvacrol, cinnamaldehyde and their mixtures added to broilers’ mixed feed on the composition of fatty acids in leg meat (x ± SEM) 
table:  Ingredients and chemical composition of experimental diets (as-fed basis) 
column: percentage => Feed stuff, %; 16.7
AMITableTool cTree: PMC5933010
table:  Chemical composition of MPEO by GC/MS. 
column: compound => Compound; 100.0
column: percentage => Relative %; 100.0
AMITableTool cTree: PMC5933022
AMITableTool cTree: PMC5933509
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5933509/sections/tables
AMITableTool cTree: PMC5933692
table:  Essential oil composition of G. rosmarinifolia. Compounds belonging to the same chemical class are arranged according to Linear Retention Indices (LRI) of the HP-5MS column. 
column: compound => Compound; 98.0
column: percentage => Relative amount (%); 100.0
AMITableTool cTree: PMC5937097
table:  Chemical composition of the essential oil of Piper tuberculatum Jacq 
column: percentage => (%); 100.0
AMITableTool cTree: PMC5937106
table:  Comparative percentage composition of the stem, leaf and root oils of Teucrium polium 
column: compound => Compoundsa; 98.5
column: percentage => StemOil(%); 100.0
column: percentage => LeafOil (%); 100.0
column: percentage => FlowerOil(%); 100.0
table:  Percentage composition of the leaf oil of Ajuga chamaecistus 
column: compound => Compoundsa; 98.7
column: percentage => (%); 98.7
table:  Comparative percentage composition of the stem, leaf and flower oils of Phlomis aucheri 
column: compound => Compoundsa; 98.7
column: percentage => Stem Oil (%); 100.0
column: percentage => Leaf Oil (%); 100.0
column: percentage => Flower Oil (%); 100.0
AMITableTool cTree: PMC5938542
AMITableTool cTree: PMC5940754
table:  Chemical composition of S. guianensis essentials oil samples extracted from plants from different localities of the Gurupi (i.e., Gurupi 1 and Gurupi 2) and Formoso do Araguaia counties (Tocantins State, Central Brazil). 
AMITableTool cTree: PMC5945564
AMITableTool cTree: PMC5946457
AMITableTool cTree: PMC5947909
AMITableTool cTree: PMC5952513
AMITableTool cTree: PMC5954905
AMITableTool cTree: PMC5956054
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5945564/sections/tables
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5946457/sections/tables
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5947909/sections/tables
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5952513/sections/tables
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5954905/sections/tables
AMITableTool cTree: PMC5957362
AMITableTool cTree: PMC5958151
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5958151/sections/tables
AMITableTool cTree: PMC5958191
AMITableTool cTree: PMC5960541
AMITableTool cTree: PMC5960548
table:  Chemical composition of Melaleuca alternifolia essential oil. 
column: percentage => Composition%; 100.0
AMITableTool cTree: PMC5961776
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5961776/sections/tables
AMITableTool cTree: PMC5963643
table:  Chemical composition of the essential oil from O. majorana 
column: percentage => Composition (%); 100.0
AMITableTool cTree: PMC5964621
AMITableTool cTree: PMC5970210
table:  Acaricidal activities of major components of the essential oils of M. officinalis cultivated in France, Ireland, and Serbia against D. farinae and D. pteronyssinus using a contact + fumigant toxicity bioassay. 
column: compound => Compounds; 100.0
column: percentage => LD50(μg/cm2) (95% CL)a; 100.0
column: percentage => LD90(μg/cm2) (95% CL)a; 100.0
AMITableTool cTree: PMC5974043
table:  Percentage and proximate composition of the experimental diets containing supplement of different PSO rate. 
AMITableTool cTree: PMC5977410
table:  Chemical composition of Pistacia lentiscus essential oil. 
column: compound => Compounds; 94.7
column: percentage => Percentage (%); 100.0
AMITableTool cTree: PMC5978029
AMITableTool cTree: PMC5985564
table dir does not exist: /Users/pm286/projects/CEVOpen/searches/oil186/PMC5985564/sections/tables
AMITableTool cTree: PMC5993771
table:  Main compounds of the R. officinalis essential oil identified by GC/MS. 
column: compound => Compound; 96.0
column: percentage => %; 100.0
AMITableTool cTree: PMC5997812
AMITableTool cTree: PMC6006875
AMITableTool cTree: PMC6011056
table:  Chemical composition, retention index experimental (RIExp), retention index of the literature (RILit), and percentage of the identified components (%) from the essential oils of C. zeylanicum (EOCz) and C. cassia (EOCc) stems. 
column: compound => Compounds; 97.7
AMITableTool cTree: PMC6011059
AMITableTool cTree: PMC6011244
table:  Composition of the experimental diets (%) 
AMITableTool cTree: PMC6015887
table:  Chemical compositions of CEO. 
column: percentage => PA (%); 100.0
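
For checking the log above against the spreadsheet, the four line types it contains (`AMITableTool cTree:`, `table dir does not exist:`, `table:` and `column:`) can be collected into one record per cTree. The following is a minimal sketch, not part of ami-table; the function name `parse_log` and the record layout are hypothetical. It assumes that `table:` and `column:` lines belong to the most recent `cTree` line, which may not always hold, because some `table dir does not exist` lines in the log appear after a run of later cTree lines. For that reason the PMC id of a missing directory is taken from the path itself. The meaning of the number after the semicolon is not stated in the log, so it is stored simply as `score`.

```python
import re
from collections import defaultdict

# Patterns for the four line types seen in the log (trailing spaces tolerated).
CTREE = re.compile(r"^AMITableTool cTree: (\S+)\s*$")
MISSING = re.compile(r"^table dir does not exist: .*/(PMC\d+)/sections/tables\s*$")
TABLE = re.compile(r"^table:\s+(.*?)\s*$")
COLUMN = re.compile(r"^column: (\w+) => (.*); ([\d.]+)\s*$")


def parse_log(lines):
    """Group log lines into {ctree_id: {"missing_dir": bool, "tables": [...]}}."""
    trees = defaultdict(lambda: {"missing_dir": False, "tables": []})
    current = None  # stays None for lines seen before any cTree line
    for raw in lines:
        line = raw.rstrip("\n")
        if m := CTREE.match(line):
            current = m.group(1)
            _ = trees[current]  # register the cTree even if it has no tables
        elif m := MISSING.match(line):
            # Use the id in the path, since these lines can arrive out of order.
            trees[m.group(1)]["missing_dir"] = True
        elif m := TABLE.match(line):
            trees[current]["tables"].append({"title": m.group(1), "columns": []})
        elif m := COLUMN.match(line):
            tables = trees[current]["tables"]
            if tables:  # ignore column lines with no preceding table line
                role, header, score = m.group(1), m.group(2), float(m.group(3))
                tables[-1]["columns"].append((role, header, score))
    return dict(trees)
```

A cTree with an existing table directory but an empty `tables` list is then a candidate for manual checking.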
ambarishK commented 4 years ago

Sir, please go through the updated composition analysis sheet - compositionAnalysis20191119.tsv

Added columns -

Also, please suggest any changes to it.

petermr commented 4 years ago

NO. There should be a single column, matches20191121, which records whether a table has been extracted (NOT what the columns are). You should be able to add it after extracted_subtable_name.
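
A minimal sketch of adding such a column to the TSV, placed directly after `extracted_subtable_name`. The function name `add_match_column` is hypothetical, and the rule used to fill the column (a table counts as extracted when `extracted_subtable_name` is non-empty, written as `yes`/`no`) is only an assumption for illustration; the real value should come from checking the extracted subtable against the source table.

```python
import csv
import io


def add_match_column(tsv_text, new_col="matches20191121",
                     after="extracted_subtable_name"):
    """Return the TSV text with one new column inserted after `after`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fields = list(reader.fieldnames)
    fields.insert(fields.index(after) + 1, new_col)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumption: a non-empty subtable name means the table was extracted.
        extracted = bool((row.get(after) or "").strip())
        row[new_col] = "yes" if extracted else "no"
        writer.writerow(row)
    return out.getvalue()
```

To apply it to a file, read the text of compositionAnalysis20191119.tsv, pass it to the function, and write the result to a new file so the original sheet is preserved.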

ambarishK commented 4 years ago

Sir, please check the updates in compositionAnalysis20191119.tsv.

Added column for match20191121.

petermr commented 4 years ago

check extracted compounds against current compounds dictionary

I have extracted compound names from "most" of the relevant tables in oil186. This gives a multiset of names/frequencies in https://github.com/petermr/CEVOpen/blob/master/searches/oil186/__tables/compound_multiset.txt This has ca. 1730 lines, but some are garbled, e.g.

1,8-cineole (20.1), α-thujone (25.1), β-thujone (22.9), camphor (10.5)

where percentages are conflated with the name. There are also misprints, extraneous spaces, etc.
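
Entries of this conflated form could be split mechanically before hand-editing. This is a minimal sketch, assuming each item is a name followed by a number in parentheses and items are separated by commas; the name `split_conflated` is hypothetical. Splitting on commas alone would break names such as 1,8-cineole, so the pattern anchors on the parenthesised number instead.

```python
import re

# A name (lazy), then "(number)", then an optional separating comma.
ENTRY = re.compile(r"\s*(.+?)\s*\((\d+(?:\.\d+)?)\)\s*,?")


def split_conflated(line):
    """Split 'name (pct), name (pct), ...' into [(name, pct), ...]."""
    return [(m.group(1), float(m.group(2))) for m in ENTRY.finditer(line)]
```

Lines that contain no parenthesised number give an empty list, so clean names pass through untouched if the caller falls back to the original line in that case.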

I have edited this into a file containing (fairly) clean names: https://github.com/petermr/CEVOpen/blob/master/searches/oil186/__tables/compound_set.txt which has ca. 1297 names.

These were then used to search the current "compound.xml" dictionary: https://github.com/petermr/CEVOpen/blob/master/dictionary/compound/compound.xml

The search results are in https://github.com/petermr/CEVOpen/blob/master/searches/oil186/__tables/foundNotFound.txt with about 1000 terms not found and about 300 found (in compound.xml).
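
The lookup itself can be sketched as a case-insensitive set membership test. This is not the code that produced foundNotFound.txt. It assumes the dictionary stores each name in a `term` attribute on `entry` elements, which should be checked against the real compound.xml; `SAMPLE_DICTIONARY` is an invented stand-in, and `load_terms` and `classify` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for compound.xml; the entry/@term layout is an assumption.
SAMPLE_DICTIONARY = """<dictionary title="compound">
  <entry term="(-)-limonene"/>
  <entry term="(E)-cinnamaldehyde"/>
</dictionary>"""


def load_terms(xml_text):
    """Collect the lower-cased terms from a dictionary XML string."""
    root = ET.fromstring(xml_text)
    return {e.get("term").strip().lower()
            for e in root.iter("entry") if e.get("term")}


def classify(names, terms):
    """Split names into (found, not_found), comparing case-insensitively."""
    found, not_found = [], []
    for name in names:
        target = found if name.strip().lower() in terms else not_found
        target.append(name)
    return found, not_found
```

Exact matching of this kind is strict: any difference in spelling, spacing or punctuation puts a name in the not-found list.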

The failures are probably due to:

examples of failures

Cannot find term in dictionary (+)-cedrol
Cannot find term in dictionary (+)-curcuphenol
Cannot find term in dictionary (+)-fenchol
Cannot find term in dictionary (+)-α-terpineol
Cannot find term in dictionary (+/-)-norephedrine
Cannot find term in dictionary (-)-a-santalal
Cannot find term in dictionary (-)-borneol
Cannot find term in dictionary (-)-camphene
Cannot find term in dictionary (-)-camphor
Cannot find term in dictionary δ‐cadinene
Cannot find term in dictionary λ-gurjunene
Cannot find term in dictionary ρ-cymene
Cannot find term in dictionary ρ-cymenea,b
Cannot find term in dictionary τ-cadinol
Cannot find term in dictionary τ-muurolol
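
At least one of the names above (δ‐cadinene) appears to use a non-ASCII hyphen rather than the ASCII "-", which would defeat an exact string match. A minimal sketch of normalizing names before lookup follows, assuming the dictionary terms use ASCII hyphens and get the same normalization; `normalize_name` is a hypothetical helper. It does not address misprints such as ρ-cymene or trailing footnote letters such as ρ-cymenea,b.

```python
import re
import unicodedata

# Map common Unicode hyphen/dash/minus characters to the ASCII hyphen.
HYPHENS = dict.fromkeys(map(ord, "\u2010\u2011\u2012\u2013\u2212"), "-")


def normalize_name(name):
    """Normalize Unicode form, hyphens, whitespace and case of a compound name."""
    name = unicodedata.normalize("NFC", name)
    name = name.translate(HYPHENS)
    name = re.sub(r"\s+", " ", name).strip()
    return name.lower()
```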

successes

found: (-)-caryophyllene oxide
found: (-)-limonene
found: (e)-2-hexenal
found: (e)-2-nonenal
found: (e)-2-octenal
found: (e)-cinnamaldehyde