petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

Composition table list #44

Open petermr opened 5 years ago

petermr commented 5 years ago

Manually create a list of all "chemical composition" tables of essential oils. There should normally be one per paper (often, but not always, Table 1). The title will normally be sufficient to decide whether it is the correct table, although in a few cases there may be more than one candidate table. The archetype (template) is given in https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/composition20191028.tsv

PMCID   table_no    table_title     notes
PMC4391421  Table 1     Chemical composition of thyme EO    [notes]
PMC5080681  Table 1     Chemical composition, concentrations (%) and calculated rete ...    [notes]

This table can be expanded to the full 186 entries.

If there is NO table, enter NONE in cols 2 and 3. If there are multiple tables, or it is unclear which is correct, enter MULTIPLE in cols 2 and 3 and list the possible tables and their titles in col 4.
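
The NONE / MULTIPLE convention above can be checked mechanically once the list is filled in. The following is only a minimal sketch, assuming the file is tab-separated with the four headers shown above; the function names `check_row` and `check_tsv` are hypothetical and not part of the CEVOpen repository.

```python
import csv
import io


def check_row(row):
    """Return a list of problems found in one row of the composition list."""
    def cell(key):
        # Missing cells come back as None from DictReader; treat them as empty.
        return (row.get(key) or "").strip()

    problems = []
    if not cell("PMCID").startswith("PMC"):
        problems.append("PMCID should start with 'PMC'")
    for marker in ("NONE", "MULTIPLE"):
        # A marker must appear in both columns 2 and 3, or in neither.
        if (cell("table_no") == marker) != (cell("table_title") == marker):
            problems.append(f"{marker} must be entered in both cols 2 and 3")
    if cell("table_no") == "MULTIPLE" and not cell("notes"):
        problems.append("MULTIPLE needs the candidate tables listed in col 4")
    return problems


def check_tsv(text):
    """Map each PMCID in the TSV text to its list of problems (empty if fine)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["PMCID"]: check_row(row) for row in reader}
```

Rows with an empty problem list follow the convention; anything else would need a manual look.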

ambarishK commented 5 years ago

Sir, please go through the composition table: composition20191028.tsv

185 articles analysed.

ambarishK commented 5 years ago

Sir, please go through the updated and revised table for EO composition.

composition20191028.tsv

Please also check the notes column, where I have added notes about extraction.

The sheet contains extraction details for 185 articles.

petermr commented 5 years ago

IDENTIFY columns for component and its percentage occurrence

Each row in a composition table should contain (a) the name of the compound, (b) its percentage occurrence and (c) a serial number (normally the row number). Other columns, e.g. retention indices and retention time, can be ignored.

Task: identify the name and percentage columns by column number.

Taking https://github.com/petermr/CEVOpen/blob/master/searches/oil186/__tables/summary.html we get:

<tr>
  <td>PMC4391421</td>      
  <td>Table 1</td>
  <td>Chemical composition of thyme EO</td>
  <td>No.</td>                    // 1st column
  <td>RT (min)</td>           // 2nd column
  <td>Area % of total</td> // 3rd column
  <td>Constituents*</td>  // 4th column
 </tr>

gives

ADD three new columns

serial    compound   percent
1            4                 3

Note that each table has columns in different order, e.g.

 <tr>
  <td>PMC5080681</td>
  <td>Table 1</td>
  <td>Chemical composition, concentrations (%) and calculated rete ... </td>
  <td>Constituents</td>
  <td>%</td>
  <td>RI C </td>
  <td>RI L </td>
 </tr>

gives

serial    compound   percent
.            1                 2
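
The two examples above suggest that the column numbers could be proposed automatically from the header cells and then corrected by hand. This is a rough sketch only: the keyword patterns and the function `locate_columns` are assumptions made for illustration, and real tables will need more variants than these.

```python
import re

# Hypothetical keyword patterns, guessed from the two headers shown above.
SERIAL_PAT = re.compile(r"^(no\.?|s\.?\s*no\.?|#)$", re.IGNORECASE)
COMPOUND_PAT = re.compile(r"constituent|compound|component", re.IGNORECASE)
PERCENT_PAT = re.compile(r"%|percent|area", re.IGNORECASE)


def locate_columns(headers):
    """Map header cells to 1-based column numbers; '.' marks a missing column."""
    found = {"serial": ".", "compound": ".", "percent": "."}
    for number, header in enumerate(headers, start=1):
        # Drop surrounding whitespace and footnote asterisks ("Constituents*").
        text = header.strip().rstrip("*").strip()
        if found["serial"] == "." and SERIAL_PAT.match(text):
            found["serial"] = number
        elif found["compound"] == "." and COMPOUND_PAT.search(text):
            found["compound"] = number
        elif found["percent"] == "." and PERCENT_PAT.search(text):
            found["percent"] = number
    return found


thyme = locate_columns(["No.", "RT (min)", "Area % of total", "Constituents*"])
other = locate_columns(["Constituents", "%", "RI C ", "RI L "])
```

Only the first match for each role is kept, so a table with several percentage columns would still need a manual note.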

ambarishK commented 5 years ago

Sir, please go through the table entry.

<tr>
  <td>PMC5620597</td>
  <td>Table 2</td>
  <td>Chemical composition of the essential oils from O. basilicum ... </td>
  <td> </td>
  <td>Compound</td>
  <td>RI a </td>
  <td>OCK1</td>
  <td>OCK2</td>
  <td>OCK3</td>
  <td>OCP4</td>
  <td>OCP5</td>
  <td>OCS6</td>
  <td>OCS7</td>
</tr>

According to this entry, the serial, compound and percent values should be:

serial    compound   percent
.            1                 3,4,5,6,7,8,9,10,11

But when the article's full text is viewed in a web browser, the table appears as:

     Compound   RI a  OCK1  OCK2  OCK3  OCP4  OCP5  OCS6  OCS7  OCM8  OCM9  OCM10
1    α-thujene  923   -     -     tr    -     -     tr    tr    tr    tr    tr
2    α-pinene   932   0.9   0.1   0.4   0.8   0.2   0.8   0.3   0.5   0.3   0.4

The full-text table has a serial number column.

Should I follow the summary entry above or the full-text table below it?
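
For a multi-sample table like this one, each row carries one percentage per sample column. Below is a minimal sketch of reading such a row. It assumes that `tr` denotes a trace amount and `-` means not detected (common conventions in essential-oil tables, but not stated in this thread), and that column numbers count from the Compound column with the serial number already stripped. The helper names `parse_cell` and `parse_row` are hypothetical.

```python
def parse_cell(cell):
    """Convert one table cell to a float, the string 'trace', or None."""
    text = cell.strip()
    if text in {"-", "–", ""}:
        return None  # assumed to mean "not detected"
    if text.lower() == "tr":
        return "trace"  # assumed to mean a trace amount
    return float(text)


def parse_row(cells, compound_col, percent_cols):
    """Pick the compound name and per-sample values by 1-based column number."""
    name = cells[compound_col - 1]
    values = [parse_cell(cells[col - 1]) for col in percent_cols]
    return name, values


# The two rows quoted above, without the serial number.
thujene = ["α-thujene", "923", "-", "-", "tr", "-", "-", "tr", "tr", "tr", "tr", "tr"]
pinene = ["α-pinene", "932", "0.9", "0.1", "0.4", "0.8", "0.2", "0.8", "0.3", "0.5", "0.3", "0.4"]

thujene_name, thujene_values = parse_row(thujene, 1, range(3, 13))
pinene_name, pinene_values = parse_row(pinene, 1, range(3, 13))
```

Keeping `trace` and `None` distinct from numeric values avoids silently treating either as 0.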

petermr commented 5 years ago

If a particular table is non-standard, add a note to explain and I will work it out. But please push the composition table now so I can inspect it.

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

ambarishK commented 5 years ago

Sir, please go through the columns added to composition20191028.tsv.

Notes have been added in the notes column.

ambarishK commented 5 years ago

Sir, I have revised the content of composition20191028.tsv. Please suggest any changes or corrections.

petermr commented 5 years ago

Thank you. I will add a "notes" column and raise specific questions or issues in that. I have tried to call you on Hangout.

ambarishK commented 5 years ago

Sir, the updated sheet for the composition table is composition20191028.tsv.

I have made all the changes and checked the sheet for possible errors.

There is minimal chance that any errors remain.

ambarishK commented 5 years ago

Sir, please go through the composition sheet, which now has added columns for compound_title and percent_title. It contains updated records for the first 15 articles.

composition20191028.tsv

Column description -

ambarishK commented 5 years ago

Sir, please go through the completed composition table: composition20191028.tsv.

The compound_title and percent_title columns have been added to it.

petermr commented 5 years ago

Thank you. The next task will be to look up the compounds in the compound dictionary. Please take the composition tables and add manually annotated (a) E2.0 compound identifiers and (b) Wikidata identifiers

petermr commented 5 years ago

Start with the first entries, e.g. thyme, and go through each table in order. I will be back in 1 hour to see how you are doing.

ambarishK commented 5 years ago

Sir, please go through the thyme EO table: thyme.tsv

Column description -

ambarishK commented 5 years ago

Sir, I am renaming thyme.tsv to EOconstituents.tsv.

I am now updating the records for the remaining articles.