petermr / tigr2ess

Materials for TIGR2ESS workshop in Delhi Feb 2019 - joint UK(Cambridge) - India project on Food Security.

New version of AMISearch (ami-search-new) #48

Closed petermr closed 5 years ago

petermr commented 5 years ago

There is a new version of ami-search (ami20190218). Please pull it and use from now on.

Typical command:

ami-search-new -p ocimumten_xml/ --dictionary country plantparts

Note that the project must be given with -p, and the dictionaries are passed as arguments of the --dictionary option.
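The calling convention above can be sketched as a small Python helper that assembles the argument list. This is only an illustration: the function name `build_ami_search_args` is hypothetical, and the rule that at least one dictionary must be given is an assumption. Only `ami-search-new`, `-p` and `--dictionary` come from this thread.

```python
from typing import List, Sequence


def build_ami_search_args(project: str, dictionaries: Sequence[str]) -> List[str]:
    """Assemble an ami-search-new argument list (hypothetical helper)."""
    if not project:
        raise ValueError("a project directory is required (-p)")
    if not dictionaries:
        # Assumption: a search without any dictionary is not intended.
        raise ValueError("at least one dictionary name is expected")
    # The dictionary names follow --dictionary as separate arguments.
    return ["ami-search-new", "-p", project, "--dictionary", *dictionaries]


if __name__ == "__main__":
    args = build_ami_search_args("ocimumten_xml/", ["country", "plantparts"])
    print(" ".join(args))
```

The resulting list could be handed to `subprocess.run(args, check=True)` on a machine where ami-search-new is installed.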

Generic values (AMISearchTool)
================================
basename            null
cproject            /Users/pm286/workspace/tigr2ess/problems/ocimumten_xml
ctree               
cTreeList           [ocimumten_xml/PMC3037352, ocimumten_xml/PMC3134781, ocimumten_xml/PMC3137644, ocimumten_xml/PMC3185238, ocimumten_xml/PMC3218416, 
[...]
ocimumten_xml/PMC6321292, ocimumten_xml/PMC6335655]
dryrun              false
excludeBase         null
excludeTrees        null
file types          []
forceMake           false
includeBase         null
includeTrees        null
log4j               
logfile             null
verbose             0

Specific values (AMISearchTool)
================================
dictionaryList       [country, plantparts]
dictionaryTop        null
dictionarySuffix     [xml]
ignorePlugins        []

cProject: ocimumten_xml
0    [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - cannot find builtin dictionary: country
cannot find dictionary: country
1    [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - cannot find builtin dictionary: plantparts
cannot find dictionary: plantparts

I think these messages are false - I will fix them.
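When reporting a test run, one way to check whether these messages look spurious is to compare the dictionaries reported as missing with the searches that were actually run. The sketch below works only on the console text as it appears in this thread. The function names are hypothetical, and treating "a search was run" as evidence that the dictionary was found is an assumption.

```python
import re
from typing import List

# Plain (non-log4j) lines such as "cannot find dictionary: country".
MISSING = re.compile(r"^cannot find dictionary: (\S+)$")
# Progress lines such as "..........running: search; search([country])[]".
RAN = re.compile(r"running: search; search\(\[([^\]]+)\]\)")


def reported_missing(console_output: str) -> List[str]:
    """Dictionary names that the console output reports as not found."""
    names: List[str] = []
    for line in console_output.splitlines():
        match = MISSING.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def searched(console_output: str) -> List[str]:
    """Dictionary names for which a search step was started."""
    return RAN.findall(console_output)


def probably_false_alarms(console_output: str) -> List[str]:
    """Names reported missing although a search with them was run."""
    ran = set(searched(console_output))
    return [name for name in reported_missing(console_output) if name in ran]
```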

1    [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - SEARCH running legacy processors
SEARCH running JSON bibliography

This will add the bibliography in column 2. It has a bug which I will fix.

running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
..........filter: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
..........summary: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
..........running: search; search([country])[]
..........filter: search([country])[]
..........summary: search([country])[]
..........running: search; search([plantparts])[]
..........filter: search([plantparts])[]
..........summary: search([plantparts])[]
..........create data tables
petermr commented 5 years ago

Have committed ami20190218a. Please test today (Monday) if possible. Thanks.

petermr commented 5 years ago

Please verify that you can run the new version and that you get a table that includes bibliographic data (title, authors, abstract). Use mouseover to see the full content.

ambarishK commented 5 years ago

Yes sir. It is running and writing bibliographic information into the 2nd column of the entries.dataTables.html file.

Run time log

ambarish123@ubuntu:~$ ami-search-new -p RiceOROryza/ --dictionary country gene plantparts drugs

Generic values (AMISearchTool)
================================
basename            null
cproject            /home/ambarish123/RiceOROryza
ctree               
cTreeList           [RiceOROryza/PMC5991738, RiceOROryza/PMC6086036, RiceOROryza/PMC6173730, RiceOROryza/PMC6203701, RiceOROryza/PMC6205584, RiceOROryza/PMC6208651, RiceOROryza/PMC6213974, RiceOROryza/PMC6222575, RiceOROryza/PMC6249995, RiceOROryza/PMC6265930, RiceOROryza/PMC6267922, RiceOROryza/PMC6278296, RiceOROryza/PMC6283022, RiceOROryza/PMC6311050, RiceOROryza/PMC6321642, RiceOROryza/PMC6337394, RiceOROryza/PMC6339128, RiceOROryza/PMC6339371, RiceOROryza/PMC6345233, RiceOROryza/PMC6357162]
dryrun              false
excludeBase         null
excludeTrees        null
file types          []
forceMake           false
includeBase         null
includeTrees        null
log4j               
logfile             null
verbose             0

Specific values (AMISearchTool)
================================
dictionaryList       [country, gene, plantparts, drugs]
dictionaryTop        null
dictionarySuffix     [xml]
ignorePlugins        []

cProject: RiceOROryza
0    [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - cannot find builtin dictionary: country
cannot find dictionary: country
11   [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - cannot find builtin dictionary: plantparts
cannot find dictionary: plantparts
13   [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - cannot find builtin dictionary: drugs
cannot find dictionary: drugs
running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
..filter: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
..summary: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
..running: search; search([country])[]
..filter: search([country])[]
..summary: search([country])[]
..running: gene; gene([human])[]
71218 [main] DEBUG org.contentmine.ami.dictionary.gene.HGNCDictionary  - is /org/contentmine/ami/plugins/gene/hgnc/hgnc.xml
..filter: gene([human])[]
..summary: gene([human])[]
..running: search; search([plantparts])[]
..filter: search([plantparts])[]
..summary: search([plantparts])[]
..running: search; search([drugs])[]
..filter: search([drugs])[]
..summary: search([drugs])[]
..create data tables
111798 [main] WARN  org.contentmine.ami.plugins.ResultsAnalysisImpl  - Null pluginOption

Screenshot of entries.dataTables.html:

[image: bibliography]

ay-amityadav commented 5 years ago

I ran the following two commands:

getpapers -q 'Ocimum sanctum' -x -k 10 --outdir ./ami20190218a/ocimumten_xml

ami-search-new -p ./ami20190218a/ocimumten_xml/ --dictionary country plantparts

I mainly found four files with a bibliography column: commonest.dataTables.html, count.dataTables.html, entries.dataTables.html and full.dataTables.html. None of them exhibits the desired behaviour. I am attaching the four files. Once again, GitHub is not allowing me to upload files, this time with the .html extension, so please change the extension of the files below to .html. We probably need a better way to share files for issues.

commonest.dataTables.txt count.dataTables.txt entries.dataTables.txt full.dataTables.txt
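Restoring the extension of the attached files can be done by hand. As a convenience, here is a minimal Python sketch that renames every `*.dataTables.txt` file in a folder back to `.html`. The function name `restore_html_extension` is hypothetical, and the sketch assumes the attachments were saved under their uploaded names.

```python
from pathlib import Path
from typing import List


def restore_html_extension(folder: str) -> List[Path]:
    """Rename each *.dataTables.txt file in folder to *.dataTables.html."""
    renamed: List[Path] = []
    for path in sorted(Path(folder).glob("*.dataTables.txt")):
        target = path.with_suffix(".html")
        if target.exists():
            # Leave existing .html files alone rather than overwrite them.
            continue
        path.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    for new_path in restore_html_extension("."):
        print("renamed to", new_path.name)
```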

I also checked the eupmc_results.json, and it does contain bibliographic information.
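A check like this can be repeated with a short script rather than by eye. The sketch below makes several assumptions that are not confirmed in this thread: that eupmc_results.json holds a list of records, that the fields are named `title`, `authorString` and `abstractText` as in Europe PMC metadata, and that each value is either a string or a list of strings. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumed field names, based on Europe PMC metadata naming.
FIELDS = ("title", "authorString", "abstractText")


def _text(value: Any) -> str:
    """Flatten a value that may be None, a string, or a list of strings."""
    if isinstance(value, list):
        return " ".join(str(v) for v in value)
    return "" if value is None else str(value)


def missing_bibliography(records: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map record index to the bibliographic fields that are empty or absent."""
    report: Dict[int, List[str]] = {}
    for index, record in enumerate(records):
        empty = [f for f in FIELDS if not _text(record.get(f)).strip()]
        if empty:
            report[index] = empty
    return report


def check_file(path: str) -> Dict[int, List[str]]:
    """Load a results file (assumed to be a JSON list) and report gaps."""
    with open(path, encoding="utf-8") as handle:
        return missing_bibliography(json.load(handle))
```

An empty result from `check_file` would suggest that the metadata is present and that any empty column is a display problem, not a download problem.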

petermr commented 5 years ago

Please add details:

But **none of these exhibit the desired behaviour**

is not informative. How were they unsatisfactory?

Add the complete console output if necessary.

ay-amityadav commented 5 years ago

Here are the screenshots for the four files:

commonest.dataTables.html [image: screenshot from 2019-02-18 21-44-40]

count.dataTables.html [image: screenshot from 2019-02-18 21-47-16]

entries.dataTables.html [image: screenshot from 2019-02-18 21-47-38]

full.dataTables.html [image: screenshot from 2019-02-18 21-47-58]

In all the files above, the bibliography column doesn't contain any information.
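An empty column can also be confirmed without screenshots by counting blank cells. The following is a sketch using only the Python standard library. It assumes that the first table row holds the column headers, that one of them is named "bibliography", and that the content is either cell text or a `title` attribute (the mouseover text). None of these details of the generated HTML are confirmed in this thread, and the names `TableRows` and `empty_bibliography_rows` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TableRows(HTMLParser):
    """Collect the text of each th/td cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True
        if self._in_cell:
            # Mouseover text is assumed to live in a title attribute.
            title = dict(attrs).get("title")
            if title:
                self.rows[-1][-1] += title

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def empty_bibliography_rows(html: str, header: str = "bibliography") -> int:
    """Count data rows whose cell under the given header is blank."""
    parser = TableRows()
    parser.feed(html)
    rows = [row for row in parser.rows if row]
    if not rows:
        raise ValueError("no table rows found")
    names = [cell.strip().lower() for cell in rows[0]]
    if header not in names:
        raise ValueError("no column named " + header)
    col = names.index(header)
    return sum(1 for row in rows[1:] if col >= len(row) or not row[col].strip())
```

Reading a file such as entries.dataTables.html and passing its text to `empty_bibliography_rows` gives a number that can be quoted in a report alongside the console output.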

Output for the command: getpapers -q 'Ocimum sanctum' -x -k 10 --outdir ./ami20190218a/ocimumten_xml

info: Searching using eupmc API
    info: Found 512 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.0.3 reported by api
info: Limiting to 10 hits
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: limiting hits
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
info: Got XML URLs for 10 out of 10 results
info: Downloading fulltext XML files
Downloading files [==============================] 100% (10/10) [0.0s elapsed, eta 0.0]
info: All downloads succeeded!

Output for the command: ami-search-new -p ./ami20190218a/ocimumten_xml/ --dictionary country plantparts

Generic values (AMISearchTool)
================================
basename            null
cproject            /home/amit/Desktop/Workshop/Github/testing/millets/ami_analysis/./ami20190218a/ocimumten_xml
ctree               
cTreeList           [./ami20190218a/ocimumten_xml/PMC4631451, ./ami20190218a/ocimumten_xml/PMC4945999, ./ami20190218a/ocimumten_xml/PMC4971952, ./ami20190218a/ocimumten_xml/PMC5234046, ./ami20190218a/ocimumten_xml/PMC5301171, ./ami20190218a/ocimumten_xml/PMC5891864, ./ami20190218a/ocimumten_xml/PMC5987647, ./ami20190218a/ocimumten_xml/PMC6023537, ./ami20190218a/ocimumten_xml/PMC6200556, ./ami20190218a/ocimumten_xml/PMC6239295]
dryrun              false
excludeBase         null
excludeTrees        null
file types          []
forceMake           false
includeBase         null
includeTrees        null
log4j               
logfile             null
verbose             0

Specific values (AMISearchTool)
================================
dictionaryList       [country, plantparts]
dictionaryTop        null
dictionarySuffix     [xml]
ignorePlugins        []

cProject: ocimumten_xml
1    [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - cannot find builtin dictionary: country
cannot find dictionary: country
1    [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - cannot find builtin dictionary: plantparts
cannot find dictionary: plantparts
2    [main] DEBUG org.contentmine.ami.tools.AMISearchTool  - SEARCH running legacy processors
98   [main] DEBUG org.contentmine.ami.plugins.CommandProcessor  - running NORMA -i fulltext.xml -o scholarly.html --transform nlm2html --project ./ami20190218a/ocimumten_xml
PMC4631451 .PMC4945999 UNKNOWN nlm tag: issn-l
PMC4971952 PMC5234046 PMC5301171 PMC5891864 PMC5987647 PMC6023537 PMC6200556 PMC6239295 SEARCH running JSON bibliography
running: word; word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
.filter: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
.summary: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
.running: search; search([country])[]
.filter: search([country])[]
.summary: search([country])[]
.running: search; search([plantparts])[]
.filter: search([plantparts])[]
.summary: search([plantparts])[]
.create data tables
petermr commented 5 years ago

Thanks. Please run version ami20190218b and report the version number.

Also please don't run 500 files when we are testing. Use the osanctum200/ that I have already committed.

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

ay-amityadav commented 5 years ago

ami version: ami20190218c

The commonest.dataTables.html looks like the following, and the three other files contain the same bibliography column.

[image: commonest datatables]

The bibliography column does contain title, authors, and some text from the papers. I think it needs a bit of tidying, perhaps a small header in front of every line, such as Title: ....... Authors: ...... Text:.......
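The labelling suggested above could look like the following sketch. It is not the AMI implementation, which is written in Java. The function name `format_bibliography` is hypothetical, and the labels are taken from the suggestion itself.

```python
from typing import Optional


def format_bibliography(
    title: Optional[str], authors: Optional[str], text: Optional[str]
) -> str:
    """Render one bibliography cell with a label in front of each part."""
    parts = (("Title", title), ("Authors", authors), ("Text", text))
    lines = []
    for label, value in parts:
        value = (value or "").strip()
        if value:
            # Skip parts that are missing so no empty labels are shown.
            lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

For display inside an HTML table cell, the lines would probably need to be joined with a line-break element instead of a newline; that detail is left open here.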

vinitamehlawat commented 5 years ago

Ran ami's new version (ami20190218c).

I also agree with Amit's suggestion. The Bibliography column currently reads like a single paragraph; if we can separate Title, Author and Abstract, it would be easier for delegates to understand how AMI works.

petermr commented 5 years ago

Thank you all for suggestions about the bibliography. Will fix it in next version.
