petermr / openVirus

aggregation of scholarly publications and extracted knowledge on viruses and epidemics.
The Unlicense

ami search: 'OutOfMemoryError' #73

Closed vaishaliarora277 closed 4 years ago

vaishaliarora277 commented 4 years ago

Running ami search with a dictionary on a corpus of 950 articles produced an OutOfMemoryError. The following error was shown: ....

Caused by: java.lang.OutOfMemoryError: Java heap space
544001 [main] ERROR org.contentmine.cproject.args.DefaultArgProcessor  - ERR! java.lang.RuntimeException: cannot run [runTransform] in --transform (OutOfMemoryError: Java heap space)
PMC7259790 java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at org.contentmine.cproject.args.DefaultArgProcessor.instantiateAndRunMethod(DefaultArgProcessor.java:1191)
        at org.contentmine.cproject.args.DefaultArgProcessor.runMethodsOfType(DefaultArgProcessor.java:1085)
        at org.contentmine.cproject.args.DefaultArgProcessor.runRunMethodsOnChosenArgOptions(DefaultArgProcessor.java:1064)
        at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1263)
        at org.contentmine.norma.Norma.run(Norma.java:38)
        at org.contentmine.ami.plugins.CommandProcessor.runNormaIfNecessary(CommandProcessor.java:182)
        at org.contentmine.ami.tools.AbstractAMISearchTool.runLegacyCommandProcessor(AbstractAMISearchTool.java:229)
        at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:249)
        at org.contentmine.ami.tools.AMISearchTool.runProjectSearch(AMISearchTool.java:244)
        at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:230)
        at org.contentmine.ami.tools.AbstractAMISearchTool.runSpecifics(AbstractAMISearchTool.java:182)
        at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:212)
        at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:192)
        at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:39)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1853)
        at picocli.CommandLine.access$1100(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2255)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2213)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2080)
        at picocli.CommandLine.execute(CommandLine.java:1978)
        at org.contentmine.ami.tools.AMI.main(AMI.java:113)
Caused by: java.lang.OutOfMemoryError: Java heap space
552824 [main] DEBUG org.contentmine.cproject.args.DefaultArgProcessor  -  exception in option:  or --transform; (1,2147483647); parseTransform; STRING: null / []; nlm2html; [nlm2html]
java.lang.RuntimeException: cannot run [runTransform] in --transform (OutOfMemoryError: Java heap space)
        at org.contentmine.cproject.args.DefaultArgProcessor.runMethodsOfType(DefaultArgProcessor.java:1093)
        at org.contentmine.cproject.args.DefaultArgProcessor.runRunMethodsOnChosenArgOptions(DefaultArgProcessor.java:1064)
        at org.contentmine.cproject.args.DefaultArgProcessor.runAndOutput(DefaultArgProcessor.java:1263)
        at org.contentmine.norma.Norma.run(Norma.java:38)
        at org.contentmine.ami.plugins.CommandProcessor.runNormaIfNecessary(CommandProcessor.java:182)
        at org.contentmine.ami.tools.AbstractAMISearchTool.runLegacyCommandProcessor(AbstractAMISearchTool.java:229)
        at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:249)
        at org.contentmine.ami.tools.AMISearchTool.runProjectSearch(AMISearchTool.java:244)
        at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:230)
        at org.contentmine.ami.tools.AbstractAMISearchTool.runSpecifics(AbstractAMISearchTool.java:182)
        at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:212)
        at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:192)
        at org.contentmine.ami.tools.AbstractAMITool.call(AbstractAMITool.java:39)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1853)
        at picocli.CommandLine.access$1100(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2255)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2249)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2080)
        at picocli.CommandLine.execute(CommandLine.java:1978)
        at org.contentmine.ami.tools.AMI.main(AMI.java:113)
Caused by: java.lang.RuntimeException: invoke runTransform fails
        at org.contentmine.cproject.args.DefaultArgProcessor.instantiateAndRunMethod(DefaultArgProcessor.java:1196)
        at org.contentmine.cproject.args.DefaultArgProcessor.runMethodsOfType(DefaultArgProcessor.java:1085)
        ... 20 more
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at org.contentmine.cproject.args.DefaultArgProcessor.instantiateAndRunMethod(DefaultArgProcessor.java:1191)
        ... 21 more
Caused by: java.lang.OutOfMemoryError: Java heap space
552871 [main] ERROR org.contentmine.cproject.args.DefaultArgProcessor  - ERR! java.lang.RuntimeException: cannot run [runTransform] in --transform (OutOfMemoryError: Java heap space)
PMC7261124

....
Priya-Jk-15 commented 4 years ago

Using the dictionary disease for ami search, I got the same error mentioned by @vaishaliarora277 on my corpus of 950 articles.

petermr commented 4 years ago

I am assuming that you are running something like: ami -p myproject950 search disease .. Do you get a list of successful documents (PMCddddddd)?

Try this from the project directory (i.e. where the 950 files are):

ls -l . PMC*/scholarly.html | wc

This should tell you how many transformations worked.
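For readers without a Unix shell, the same count can be sketched in Python. This is a minimal sketch, not part of ami. It assumes the project directory holds one PMC* subdirectory per article and that a successful transformation leaves a scholarly.html inside it, as the ls command implies. The function name count_transformed is hypothetical.

```python
from pathlib import Path


def count_transformed(project_dir):
    """Count PMC*/scholarly.html files under project_dir.

    Assumption: one PMC* subdirectory per article, with scholarly.html
    present only when the transformation succeeded.
    """
    return sum(1 for _ in Path(project_dir).glob("PMC*/scholarly.html"))


if __name__ == "__main__":
    # "." is the current directory, so run this from inside the project.
    print(count_transformed("."))
```

Comparing the printed number with the number of PMC* directories gives a rough idea of how many documents failed to transform.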


petermr commented 4 years ago

The new logger from Remko may help with this. Do you have a file logs/ami.log in your filestore (probably in your project directory)?

petermr commented 4 years ago

also see https://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError

vaishaliarora277 commented 4 years ago

@petermr, I was running ami -p miniprojectfunders search --dictionary funders. I don't have a logs/ami.log file in my CProject directory, which is named miniprojectfunders. Yes, I did get a list of 952 items in my directory. OS: Windows 10

C:\Users\me>miniprojectfunders 1s -1 . PMC*/scholarly.html | wc
'miniprojectfunders' is not recognized as an internal or external command,
operable program or batch file.
petermr commented 4 years ago

Thanks!

You have an unwanted word (miniprojectfunders) in the command, and two of your "els" are "ones":

1s -1 . PMC*/scholarly.html | wc

We all make this mistake!! It's very difficult to distinguish "el" from "one" in some fonts.

Should be:

ls -l . PMC*/scholarly.html | wc

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

Priya-Jk-15 commented 4 years ago

In command prompt, I gave the command cd mpc (mpc is the directory where my 950 files are) and then I tried the command ls -l . PMC*/scholarly.html | wc The output was 'ls' is not recognized as an internal or external command, operable program or batch file.

So I tried the same commands in Git Bash and got some numbers as output: [image: amisearch]

@petermr Is the above output correct?

I also tried setting the environment variable MAVEN_OPTS = -Xmx512m -XX:MaxPermSize=128m, as described at https://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError, and then gave the command ami -p mpc search --dictionary disease. The output was

[...]
Caused by: java.lang.OutOfMemoryError: Java heap space
569564 [main] DEBUG org.contentmine.cproject.args.DefaultArgProcessor - exception in option: or --transform; (1,2147483647); parseTransform; STRING: null / []; nlm2html; [nlm2html]
569564 [main] DEBUG org.contentmine.cproject.args.DefaultArgProcessor - exception in option: or --transform; (1,2147483647); parseTransform; STRING: null / []; nlm2html; [nlm2html]
[...]

@petermr Is there any change I should make to the environment variable?

vaishaliarora277 commented 4 years ago

Thanks @petermr. I first entered this in the Command Prompt:

set MAVEN_OPTS=-Xmx512m -XX:MaxPermSize=128m

Then I ran this again:

C:\Users\me>ami -p miniprojectfunders search --dictionary funders

and got:

+++++++++++++++++++running: search; search([funders])[]
279807 [main] DEBUG org.contentmine.ami.plugins.CommandProcessor  -
+++++++++++++++++++running: search; search([funders])[]
..............................................
large document (1507) for PMC6824115 truncated to 500 sections
.......................................................................................................

I got no search tables for the dictionary funders, so next I deleted this large document, PMC6824115, from the directory and ran the same command again:

C:\Users\me>ami -p miniprojectfunders search --dictionary funders

This time I got full data tables in my directory with complete search for dictionary funders.

https://photos.google.com/search/_tra_/photo/AF1QipPM1Mytn-__zViXjfugVKIslmzYWMYp9RPEHv-2
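The "large document ... truncated" message is what pointed to the problem document here. If the console output is saved to a text file, such lines can be pulled out with a short Python sketch. This is a minimal sketch under stated assumptions: the regular expression is modelled only on the single line quoted above, other ami versions may word the message differently, and the name find_truncated is hypothetical. The meaning of the number in parentheses is not documented in this thread, so it is simply reported as-is.

```python
import re

# Pattern modelled on the one console line quoted in this thread
# (assumption: other ami versions may format the message differently).
TRUNCATED = re.compile(
    r"large document \((\d+)\) for (PMC\d+) truncated to (\d+) sections"
)


def find_truncated(console_text):
    """Return (pmc_id, reported_size, sections_kept) for each matching line."""
    return [
        (m.group(2), int(m.group(1)), int(m.group(3)))
        for m in TRUNCATED.finditer(console_text)
    ]


sample = "....\nlarge document (1507) for PMC6824115 truncated to 500 sections\n...."
print(find_truncated(sample))  # [('PMC6824115', 1507, 500)]
```

Documents found this way could be moved out of the project directory instead of deleted, so they can be examined later.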

petermr commented 4 years ago

Thanks - this is very clear.

We'll take it in bits:

In command prompt, I gave the command cd mpc (mpc is the directory where my 950 files are) and then I tried the command ls -l . PMC*/scholarly.html | wc The output was 'ls' is not recognized as an internal or external command, operable program or batch file.

PMR> maybe something is wrong with your PATH

Try ls

or

which ls

("which" tells you where the ls program is).

If you get "ls" working you probably want: either ls .

(list all files in current directory) OR

ls PMC*/scholarly.html

(list the scholarly.html children of the PMC* directories).

Let's try to solve that and then move to the OOM error.


petermr commented 4 years ago

Well done.

I'll create an FAQ and you can answer it!


vaishaliarora277 commented 4 years ago

Thanks @petermr Sure, I'll do that.

Priya-Jk-15 commented 4 years ago

@petermr I gave the command which ls in command prompt and the output was: 'which' is not recognized as an internal or external command, operable program or batch file. So I tried it in Git Bash and successfully got the path of ls.

To do the same in command prompt, I found that DIR is the equivalent of ls, from https://skimfeed.com/blog/windows-command-prompt-ls-equivalent-dir/. I used DIR in command prompt and the following output was obtained for a small directory: [image: lsdir] https://user-images.githubusercontent.com/65600695/86516249-a7c14d80-be3c-11ea-9f31-423b5edb0968.PNG

For my corpus containing 950 articles, I gave the command DIR mpc in command prompt and the output was

[...]
01/07/2020   03:44 PM     <DIR>          PMC7310742
01/07/2020   03:44 PM     <DIR>          PMC7312578
01/07/2020   03:44 PM     <DIR>          PMC7314749
01/07/2020   03:44 PM     <DIR>          PMC7316228
[...]

Next, when I gave the command DIR PMC*/scholarly.html, the output was: Parameter format not correct - "scholarly.html".

Is there any change I could make?

petermr commented 4 years ago

Well done.

On Sat, Jul 4, 2020 at 5:18 PM Lakshmi Devi Priya notifications@github.com wrote:

@petermr https://github.com/petermr I gave the command which ls in command prompt and the output was : 'which' is not recognized as an internal or external command, operable program or batch file. So I tried in git and successfully got the path of ls.

I forgot you were on Windows! which does not exist there. (We are going to remind each other which Operating system we are on).

But for using in command prompt, I found an equivalent command DIR for ls from https://skimfeed.com/blog/windows-command-prompt-ls-equivalent-dir/

Well done.

I used DIR in command prompt and the following output was obtained for a small directory [image: lsdir] https://user-images.githubusercontent.com/65600695/86516249-a7c14d80-be3c-11ea-9f31-423b5edb0968.PNG

For my corpus containing 950 articles, I gave the command DIR mpc in command prompt and the output was

[...]
01/07/2020   03:44 PM     <DIR>          PMC7310742
01/07/2020   03:44 PM     <DIR>          PMC7312578
01/07/2020   03:44 PM     <DIR>          PMC7314749
01/07/2020   03:44 PM     <DIR>          PMC7316228
[...]

Good

Next, when I gave the command DIR PMC*/scholarly.html , the output was Parameter format not correct - "scholarly.html".

Is there any change, I could do?

This is another difference between Windows and Unix: Windows uses a backslash as the path separator.

try

DIR PMC*\scholarly.html


Priya-Jk-15 commented 4 years ago

For viewing the html files, first I gave the command cd mpc (mpc is the 950 articles directory) in command prompt and then I gave the command DIR PMC*\scholarly.html and the output was

The filename, directory name, or volume label syntax is incorrect,

@petermr I checked the filename and directory name. Is there anything I should do about the volume label?

petermr commented 4 years ago

I don't run Windows, so I forget that it doesn't expand the *.

Maybe you can run PowerShell. Clyde Davies knows how.
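Another option that does not depend on the shell expanding the *: Python's pathlib does the wildcard matching itself, so the same script should behave the same way in Command Prompt, PowerShell and Git Bash. The sketch below lists the PMC* directories that have no scholarly.html, i.e. the likely failed transformations. It assumes the same project layout as above (one PMC* directory per article), and the name missing_scholarly_html is hypothetical.

```python
from pathlib import Path


def missing_scholarly_html(project_dir):
    """Return sorted names of PMC* directories that lack scholarly.html."""
    project = Path(project_dir)
    return sorted(
        d.name
        for d in project.glob("PMC*")
        if d.is_dir() and not (d / "scholarly.html").is_file()
    )


if __name__ == "__main__":
    # Run from inside the project directory (e.g. after cd mpc).
    for name in missing_scholarly_html("."):
        print(name)
```

An empty result would suggest every document was transformed. Any names printed are candidates to inspect first.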


Priya-Jk-15 commented 4 years ago

@petermr I tried the command DIR PMC*\scholarly.html, in Windows PowerShell. It worked and showed the following output :

           Directory: C:\Users\Admin\Desktop\mpc\PMC5764404

Mode                         LastWriteTime                 Length Name
----                        --------------                 ------ -----
-a----                 04/07/2020 11:11 AM                 124852 scholarly.html
[...]
petermr commented 4 years ago

Well done.


petermr commented 4 years ago

Closed as part of the learning process