petermr / ami3

Integration of cephis and normami code into a single base. Tests will be slimmed down
Apache License 2.0

File Path Errors running AMI3 #60

Open EmanuelFaria opened 4 years ago

EmanuelFaria commented 4 years ago

Having updated Ami, today I ran it for the first time in a long time, and I'm getting errors (terminal log attached). I'm running on a Mac, by the way, and just made sure I have the latest version of Java running.

Thanks in advance for your input and help!

Manny

Here's the command I used:

ami -v -p /Users/emanuelfaria/DAVEapps/getpapers/oil1000 search --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoActivity.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoAnalysisMethod.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoCompound.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoExtractionMethod.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlant.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlantExtractionProduct.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlantMaterialHistory.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlantPart.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoTargetOrganism.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/geoLocation.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/humanDiseases.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/pests.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/VChumanSkinDiseases.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/PetersPhytochemicals.xml

I'm getting the following errors:

cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoActivity.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoAnalysisMethod.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoCompound.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoExtractionMethod.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoPlant.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoPlantExtractionProduct.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoPlantMaterialHistory.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoPlantPart.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoTargetOrganism.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/geoLocation.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/humanDiseases.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/pests.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/VChumanSkinDiseases.xml
cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/PetersPhytochemicals.xml
Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54/10/pmcstop.txt
Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54/10/stopwords.txt

I also have a lot of warnings (?) that look like this: large document (531) for PMC6479398 truncated to 500 sections

The weirdest error, however, was that ami was looking for "plugins" in a directory I used in the previous version (see path below) — but Ami's new bin folder doesn't even contain a "plugins" folder. Should that be the case?

java.lang.RuntimeException: cannot process argument: --sr.search (RuntimeException: cannot read inputStream for dictionary: /org/contentmine/ami/plugins/dictionary/Users/emanuelfaria/DAVEapps/dictionary/eoAnalysisMethod.xml)

I couldn't remember how to set the default path (I didn't see it in the documentation, nor in the text displayed when typing ami --help), so I recreated the missing folder and contents in the old location and re-ran the command. That eliminated those types of errors, but I'll still need to know how to set ami to look in the new (correct) location by default for the seemingly missing plugin folder in Ami3's new bin.

In terms of output, I seem to be missing a lot:

  1. No folders for sections, tables, etc.
  2. search..count.xml and search..documents.xml files have no entries.
  3. But many (though not all) of the search..snippets.xml files DO have entries in them.

Hopefully these anomalies will resolve once I've got the directories worked out.

Terminal Saved Output.txt

petermr commented 4 years ago

Commenting one by one...

On Mon, Aug 17, 2020 at 9:44 PM Emanuel Faria notifications@github.com wrote:

Having updated Ami, today I ran it for the first time in a long time, and I'm getting errors (terminal log attached). I'm running on a Mac, by the way, and just made sure I have the latest version of Java running.

Thanks, useful info

Thanks in advance for your input and help!

Manny

Here's the command I used:

ami -v -p /Users/emanuelfaria/DAVEapps/getpapers/oil1000 search --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoActivity.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoAnalysisMethod.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoCompound.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoExtractionMethod.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlant.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlantExtractionProduct.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlantMaterialHistory.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoPlantPart.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/eoTargetOrganism.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/geoLocation.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/humanDiseases.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/pests.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/VChumanSkinDiseases.xml --dictionary /Users/emanuelfaria/DAVEapps/dictionary/PetersPhytochemicals.xml

I'm getting the following errors:

cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/eoActivity.xml

... snipped

cannot find dictionary: /Users/emanuelfaria/DAVEapps/dictionary/PetersPhytochemicals.xml

Try using just ONE --dictionary, followed by the list of files.

If it persists make sure the dictionaries actually exist.

ls /Users/emanuelfaria/DAVEapps/dictionary/PetersPhytochemicals.xml
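The existence check above can be extended to every dictionary at once. The following is a minimal sketch, assuming a POSIX-style shell such as the bash or zsh shipped with macOS; `check_dictionaries` is a hypothetical helper name and is not part of ami.

```shell
# Report which dictionary files are missing or unreadable.
# Prints one "missing: <path>" line per problem file and returns
# non-zero if any file could not be read.
# (check_dictionaries is a hypothetical helper, not an ami command.)
check_dictionaries() {
  rc=0
  for dict in "$@"; do
    if [ ! -r "$dict" ]; then
      echo "missing: $dict"
      rc=1
    fi
  done
  return "$rc"
}

# Example, using the directory from the command quoted above:
# check_dictionaries /Users/emanuelfaria/DAVEapps/dictionary/*.xml
```

If this prints nothing, the shell can read every file, and any remaining "cannot find dictionary" message would have to come from how ami interprets the paths rather than from the files themselves.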

Also, when debugging, I suggest you start off small. Try just one dictionary. If it fails on that ...

Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54/10/pmcstop.txt

Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54/10/stopwords.txt

I hope this can be ignored....

I also have a lot of warnings (?) that look like this: large document (531) for PMC6479398 truncated to 500 sections

That's because they are huge documents and take a long time. See if they are really useful. I hope it will be possible to set this limit if you want.

The weirdest error, however, was that ami was looking for "plugins" in a directory I used in the previous version (see path below) — but Ami's new bin folder doesn't even contain a "plugins" folder. Should that be the case?

java.lang.RuntimeException: cannot process argument: --sr.search (RuntimeException: cannot read inputStream for dictionary: /org/contentmine/ami/plugins/dictionary/Users/emanuelfaria/DAVEapps/dictionary/eoAnalysisMethod.xml)

Try again with just one --dictionary
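Passing `--dictionary` once, followed by the list of files, as suggested earlier in this reply, might look like the sketch below. It only prints the command so it can be inspected before running; `build_search_command` is a hypothetical helper, and whether ami accepts several values after a single `--dictionary` is taken from the advice in this thread, not verified here.

```shell
# Print an `ami search` command line that passes --dictionary once,
# followed by every dictionary path given as an argument.
# Nothing is executed; copy the printed line into the terminal to run it.
# Note: paths containing spaces would need quoting by hand.
# (build_search_command is a hypothetical helper, not part of ami.)
build_search_command() {
  project=$1
  shift
  printf 'ami -v -p %s search --dictionary' "$project"
  for dict in "$@"; do
    printf ' %s' "$dict"
  done
  printf '\n'
}

# Example with two of the dictionaries from the original command:
# build_search_command /Users/emanuelfaria/DAVEapps/getpapers/oil1000 \
#   /Users/emanuelfaria/DAVEapps/dictionary/eoActivity.xml \
#   /Users/emanuelfaria/DAVEapps/dictionary/eoPlant.xml
```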

I couldn't remember how to set the default path (I didn't see it in the documentation, nor in the text displayed when typing ami --help), so I recreated the missing folder and contents in the old location and re-ran the command. That eliminated those types of errors, but I'll still need to know how to set ami to look in the new (correct) location by default for the seemingly missing plugin folder in Ami3's new bin.

plugin is the builtin dictionary. It's only needed when you don't have explicit filenames.

In terms of output, I seem to be missing a lot:

  1. No folders for sections, tables, etc.

Those are created by ami section

  2. search..count.xml and search..documents.xml files have no entries.

Probably a bug.

  3. But many (though not all) of the search..snippets.xml files DO have entries in them.

Hopefully these anomalies will resolve once I've got the directories worked out.

Terminal Saved Output.txt https://github.com/petermr/ami3/files/5086698/Terminal.Saved.Output.txt


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

EmanuelFaria commented 4 years ago

thanks for the input, Peter.

I ran it again with just one dictionary, as you suggested. It's still not working.

It's finding my article folders and outputting some xml files (some empty, some with data), but not doing anything related to the dictionaries. No tables, or parts

It seems that it's still looking for an old path.

Besides still saying "cannot find dictionary", this snippet (below) says it's looking for the stopwords.txt file in this directory, which seems to be pointing to the old location I used in the previous version of ami:

Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54/10/stopwords.txt

I deep-searched my mac (even for "invisible" files) and no "stopwords.txt" file exists whatsoever.

I'm attaching a screenshot of the same results I got months ago for the same set of files, that worked with the previous version of ami (with all the necessary directories in their previous location), and next to it the results I'm getting now. You'll see my current file path, as well as the output folders that are missing.

Hope this helps.

Manny

P.S. Am I missing a command to tell ami to ignore the old path and set it to a new one?

Screen Shot 2020-08-18 at 4 13 16 PM

EmanuelFaria commented 4 years ago

@petermr is there a different way to begin a filepath on mac? something that starts with a tilde? ~/

Nevermind. I looked it up and tried various ways. Nothing worked.

I also just tried copying my dictionaries directory into the same folder where the PMC articles are and modified the query — still using one dictionary — and it still doesn't work. Same errors every time.

I tried set $DICTIONARY /Users/emanuelfarruda/DAVEapps/dictionary ... and it didn't do anything either.

I tried with a slash at the end, set $DICTIONARY /Users/emanuelfarruda/DAVEapps/dictionary/ ... and I just got a message that said ... "is a directory"
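For reference, in bash and zsh `set $DICTIONARY /some/path` does not define a variable; it replaces the shell's positional parameters. A minimal sketch of a plain variable assignment follows. The name `DICT_DIR` and its value are illustrative assumptions, and nothing in this thread indicates that ami itself reads such a variable; it only shortens what has to be typed.

```shell
# Plain assignment (no spaces around `=`) defines a shell variable.
# DICT_DIR is a hypothetical name; ami is not known to read it.
DICT_DIR="$HOME/DAVEapps/dictionary"

# The shell expands the variable before ami ever sees the argument,
# so ami still receives an ordinary absolute path.
# The command is echoed here rather than run:
echo "ami -v -p $HOME/DAVEapps/getpapers/oil1000 search --dictionary $DICT_DIR/eoActivity.xml"
```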

Query:

ami -v -p /Users/emanuelfaria/DAVEapps/getpapers/oil1000 search --dictionary /Users/emanuelfaria/DAVEapps/getpapers/oil1000/dictionary/eoActivity.xml

Log message:

Generic values (AMISearchTool)
================================
input basename      null
input basename list null
cproject            /Users/emanuelfaria/DAVEapps/getpapers/oil1000
ctree               
cTreeList           1002 trees [/Users/emanuelfaria/DAVEapps/getpapers/oil1000/
excludeBase         {}
excludeTrees        {}
forceMake           false
includeBase         {}
includeTrees        null
log4j               {}
verbose             1

Specific values (AMISearchTool)
================================
Command line options for 'ami search':
--stripNumbers      : d     false

--wordCount         : d (20,1000000)

--wordLength        : d    (1,20)

--wikidataBiblio    : d     false

--no-oldstyle       : d      true

--dictionary        : m [/Users/emanuelfaria/DAVEapps/getpapers/oil1000/dictionary/eoActivity.xml]

--dictionarySuffix  : d     [xml]

--dictionaryTop     : d      null

--ignorePlugins     : d        []

--help              : d     false

--version           : d     false

cannot find dictionary: /Users/emanuelfaria/DAVEapps/getpapers/oil1000/dictionary/eoActivity.xml

EmanuelFaria commented 4 years ago

is there something I could try with sudo nano /etc/paths ?

petermr commented 4 years ago

Please detail the commands you ran. Don't use phrases like "I ran it again" (I don't know what "it" is), or "It's still not working". That's meaningless.

On Tue, Aug 18, 2020 at 8:19 PM Emanuel Faria notifications@github.com wrote:

It seems that it's still looking for an old path.

Besides still saying "cannot find dictionary", this snippet (below) says it's looking for the stopwords.txt file in this directory, which seems to be pointing to the old location I used in the previous version of ami:

Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54/10/stopwords.txt

What command did you run? What files were used? What was in them? Cut out words.

I deep-searched my mac (even for "invisible" files) and no "stopwords.txt" file exists whatsoever.

If this is a problem it may be due to a different version of the jar file, but I need details.

I'm attaching a screenshot of the same results I got months ago for the same set of files, that worked with the previous version of ami (with all the necessary directories in their previous location), and next to it the results I'm getting now. You'll see my current file path, as well as the output folders that are missing.

Give the commands and I'll see what can be done. If this is all ami search then the interns are running it OK.


EmanuelFaria commented 4 years ago

Starting over.

I ran the following command: ami -v -p /Users/emanuelfaria/DAVEapps/getpapers/oil1000 search --dictionary /Users/emanuelfaria/DAVEapps/getpapers/oil1000/dictionary/eoActivity.xml

Attached please find:

The terminal Saved Output ... along with The Terminal Saved output for what I got when I launched each of the executable files in the bin folder. I don't know what any of it means, but a couple of things stood out for me.

For example, line 3 of the output file for the ami-all executable says: contentMine exist??? Is it looking for contentMine code that isn't there? ami-gene, ami-identifier, ami-regex, ami-sequence and ami-species all have references to "contentMine" in them too.

ami-gene seems to have some java issues, but I'm not sure...

no --output given
Exception in thread "main" java.lang.NullPointerException
    at org.contentmine.ami.plugins.AMIArgProcessor.truncateLargeLists(AMIArgProcessor.java:278)
    at org.contentmine.ami.plugins.AMIArgProcessor.ensureSectionElements(AMIArgProcessor.java:265)
    at org.contentmine.ami.plugins.AMIArgProcessor.runRunMethodsOnChosenArgOptions(AMIArgProcessor.java:234)
    at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1251)
    at org.contentmine.ami.plugins.gene.GenePlugin.main(GenePlugin.java:32)

Hope this helps. Let me know how to produce any other data you need please.

Thanks

manny

Terminal Saved Output.txt

ami Terminal Saved Output.txt ami-all Terminal Saved Output.txt ami-frequencies Terminal Saved Output.txt ami-gene Terminal Saved Output.txt ami-identifier Terminal Saved Output.txt ami-regex Terminal Saved Output.txt ami-sequence Terminal Saved Output.txt ami-species Terminal Saved Output.txt amidict Terminal Saved Output.txt pman Terminal Saved Output.txt

petermr commented 4 years ago

On Wed, Aug 19, 2020 at 12:16 AM Emanuel Faria notifications@github.com wrote:

Starting over.

I ran the following command: ami -v -p /Users/emanuelfaria/DAVEapps/getpapers/oil1000 search --dictionary /Users/emanuelfaria/DAVEapps/getpapers/oil1000/dictionary/eoActivity.xml

Attached please find:

The terminal Saved Output ...

I don't see any problem with your output

along with The Terminal Saved output for what I got when I launched each of the executable files in the bin folder.

You only need ami

I don't know what any of it means, but a couple of things stood out for me.

For example, line 3 of the output file for the ami-all executable says: contentMine exist??? Is it looking for contentMine code that isn't there? ami-gene, ami-identifier, ami-regex, ami-sequence and ami-species all have references to "contentMine" in them too.

ami-gene seems to have some java issues, but I'm not sure...

You don't need that. Do not run it.

Did ami search create full.dataTables.html? If so, that's the output.


EmanuelFaria commented 4 years ago

Did ami search create full.dataTables.html? If so, that's the output.

Yes. full.dataTables.html is there. (attached) full.dataTables.html.zip

But where are the summary folders (for all of the tables, figures, __supplementary files, etc) for the corpus? And are the "cannot find dictionary" messages themselves an error?

Is there any path missing in my etc/paths setup? I'm attaching a screenshot of what's in there now, just in case.

Screen Shot 2020-08-18 at 8 42 11 PM

petermr commented 4 years ago

The full.dataTables.html is your result.

On Wed, Aug 19, 2020 at 1:14 AM Emanuel Faria notifications@github.com wrote:

Did ami search create full.dataTables.html? If so, that's the output.

Yes. full.dataTables.html is there. (attached) full.dataTables.html.zip https://github.com/petermr/ami3/files/5093396/full.dataTables.html.zip

I don't want it. Read it. It's what you are running ami search for. And please think about what's in it before asking for more.

But where are the summary folders (for all of the tables, figures, __supplementary files, etc) for the corpus?

You have to create them with ami summary. But try to understand ami search first.

And are the "cannot find dictionary" messages themselves an error?

No idea.

Is there any path missing in my etc/paths setup?

No. If you can run ami then your path is OK.

In general it's a good idea with new software to start with small examples, make sure they run, and make sure you understand what they are doing. Then you can increase the number of articles, and number of dictionaries. Running 13 dictionaries on 30,000 articles gives you 10 million results. Are you going to read them all?
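One way to start small is to copy a handful of article directories into a separate trial project and point `ami -p` at that copy. The sketch below assumes each article lives in its own subdirectory of the project folder, as the "1002 trees" log line earlier in the thread suggests; `make_sample_project` is a hypothetical helper, not an ami command.

```shell
# Copy the first N article directories of a project into a new,
# smaller project directory for trial runs.
# Usage: make_sample_project SOURCE_DIR DEST_DIR N
# (make_sample_project is a hypothetical helper, not part of ami.)
make_sample_project() {
  src=$1
  dest=$2
  count=$3
  mkdir -p "$dest"
  copied=0
  for tree in "$src"/*; do
    [ -d "$tree" ] || continue            # only copy directories
    [ "$copied" -lt "$count" ] || break   # stop after N directories
    cp -R "$tree" "$dest/"                # no trailing slash on the source,
                                          # so the directory itself is copied
    copied=$((copied + 1))
  done
  echo "copied $copied directories to $dest"
}

# Example: a 10-article trial copy of the project from this thread.
# make_sample_project /Users/emanuelfaria/DAVEapps/getpapers/oil1000 \
#   /Users/emanuelfaria/DAVEapps/getpapers/oil10 10
```

The destination should sit outside the source project so the copy is not picked up as another article directory.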

Time to think about how you will use the results.
