musically-ut / matlab-stanford-postagger

Example of how to use Stanford PoS Tagger from Matlab

Can't make it with matlab-stanford-postagger #1

Open johnnykast opened 7 years ago

johnnykast commented 7 years ago

OK, here's my problem. I want to POS-tag a text file of product reviews so that I can run sentiment analysis on those opinions. I downloaded the Stanford parser (31/10/2016) and copied the file named english-left3words-distsim.tagger into Matlab's current working path. I have Java installed on my computer. I also added stanford-postagger.jar to the classpath using Matlab's command: javaaddpath('./stanford-postagger-2016-10-31/stanford-postagger.jar') Last, I call the PosTaggerM(str) function to POS-tag the string contained in the str variable. An error about MaxentTagger() occurs on line 40 of the PosTaggerM() function, which is: tagger = MaxentTagger('./english-left3words-distsim.tagger'); Keeping in mind that I don't know Java, and so cannot debug the MaxentTagger method, can I be advised what to do? PS: I saw that this POS tagging is compatible with Matlab 2014b and I'm working with Matlab 2010b, but I understand that the problem has to do with the Java part, not the Matlab part of the program. Any help or modifications would be very much appreciated.

musically-ut commented 7 years ago

What is the exact error message (along with the stack trace) that you get?

johnnykast commented 7 years ago

Thanks for the reply. OK, I redid the whole thing from the beginning. Without changing anything in the code, the error message is as follows:

Loading default properties from trained tagger ./english-left3words-distsim.tagger Reading POS tagger model from ./english-left3words-distsim.tagger ... ??? Java exception occurred: java.io.InvalidClassException: edu.stanford.nlp.tagger.maxent.ExtractorDistsim; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 1

at java.io.ObjectStreamClass.initNonProxy(Unknown Source)

at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)

at java.io.ObjectInputStream.readClassDesc(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readArray(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.defaultReadFields(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readObject(Unknown Source)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readExtractors(MaxentTagger.java:522)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:710)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:673)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:280)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:240)

Error in ==> PosTaggerM at 39 tagger = MaxentTagger('./english-left3words-distsim.tagger');

Error in ==> testing at 20 reply = PosTaggerM('This is a very small sample sentence for test purpose');

musically-ut commented 7 years ago

The problem looks like a mismatch between the java version used in the .jar file and the one used to create the .tagger file. The version difference could also lie in the JDK version; I remember 1.7 files were incompatible with 1.6 and the website had two different versions for them respectively.

This is suggested by the line: stream classdesc serialVersionUID = 2, local class serialVersionUID = 1.

Do you think you can try matching the versions together better?
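For readers unfamiliar with Java serialization, the two numbers in that message have a precise meaning: "stream classdesc serialVersionUID" is the value recorded in the .tagger file when the model was written, and "local class serialVersionUID" is the value declared by the class loaded from the .jar on the classpath. When they differ, the model file and the jar come from different releases. The sketch below shows how the local value can be read; `SerialVersionCheck` and its nested `Extractor` class are hypothetical stand-ins, not part of the Stanford tagger.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionCheck {

    // Hypothetical stand-in for a serialized model class such as
    // edu.stanford.nlp.tagger.maxent.ExtractorDistsim; not the real class.
    static class Extractor implements Serializable {
        private static final long serialVersionUID = 2L;
    }

    // Returns the serialVersionUID the running JVM associates with cls,
    // i.e. the "local class" value quoted in an InvalidClassException.
    static long localSerialVersionUID(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            // lookup returns null for classes that are not Serializable
            throw new IllegalArgumentException(cls.getName() + " is not Serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("local serialVersionUID = "
                + localSerialVersionUID(Extractor.class));
    }
}
```

Pointing the same helper at the real class (with the tagger jar on the classpath) would presumably show which value the jar carries, so it can be compared against the one reported for the stream.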

johnnykast commented 7 years ago

First of all, thank you for taking the trouble to help me. It's really important to me. I have now downloaded version 3.4.1 of the Stanford POS tagger (27-08-2014) and installed JRE 7 (version 1.7) on my computer, as I previously had JRE 8. When I hit run this time, I get the following error message:

Reading POS tagger model from ./english-left3words-distsim.tagger ... ??? Java exception occurred: java.lang.OutOfMemoryError: Java heap space

at java.lang.StringBuilder.toString(Unknown Source)

at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(Unknown Source)

at java.io.ObjectInputStream$BlockDataInputStream.readUTF(Unknown Source)

at java.io.ObjectInputStream.readString(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readObject(Unknown Source)

at java.util.HashMap.readObject(Unknown Source)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

at java.lang.reflect.Method.invoke(Unknown Source)

at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.defaultReadFields(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.defaultReadFields(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readArray(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.defaultReadFields(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readObject(Unknown Source)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readExtractors(MaxentTagger.java:590)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:816)

Error in ==> PosTaggerM at 39 tagger = MaxentTagger('./english-left3words-distsim.tagger');

Error in ==> testing at 20 reply = PosTaggerM('This is a very small sample sentence for test purpose'); Exception in thread "Timer-1" java.lang.OutOfMemoryError: Java heap space at java.lang.String.toLowerCase(Unknown Source) at java.io.Win32FileSystem.hashCode(Unknown Source) at java.io.File.hashCode(Unknown Source) at java.util.HashMap.getEntry(Unknown Source) at java.util.HashMap.containsKey(Unknown Source) at com.mathworks.mlwidgets.explorer.control.DirectoryListing$5.receive(DirectoryListing.java:273) at com.mathworks.mlwidgets.explorer.control.DirectoryListing$5.receive(DirectoryListing.java:267) at com.mathworks.util.NativeJava.listFiles(Native Method) at com.mathworks.mlwidgets.explorer.control.DirectoryListing.loadAndSendDirectory(DirectoryListing.java:265) at com.mathworks.mlwidgets.explorer.control.DirectoryListing.list(DirectoryListing.java:198) at com.mathworks.mlwidgets.explorer.control.DirectoryListing.getChildren(DirectoryListing.java:142) at com.mathworks.mlwidgets.explorer.control.DirectoryListing.getChildren(DirectoryListing.java:126) at com.mathworks.mlwidgets.explorer.control.DirectoryDocumentListing.refresh(DirectoryDocumentListing.java:244) at com.mathworks.mlwidgets.explorer.control.RefreshDaemon$3.run(RefreshDaemon.java:198) at java.util.TimerThread.mainLoop(Unknown Source) at java.util.TimerThread.run(Unknown Source) Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space at org.netbeans.editor.DrawEngine.draw(DrawEngine.java:1043) at org.netbeans.editor.DrawEngineLineView.paint(DrawEngineLineView.java:233) at org.netbeans.lib.editor.view.GapBoxViewChildren.paintChildren(GapBoxViewChildren.java:783) at org.netbeans.lib.editor.view.GapBoxView.paint(GapBoxView.java:1463) at org.netbeans.lib.editor.view.GapDocumentView.paint(GapDocumentView.java:231) at org.netbeans.editor.DrawEngineDocView.paint(DrawEngineDocView.java:314) at org.netbeans.editor.view.spi.LockView.paint(LockView.java:363) at 
javax.swing.plaf.basic.BasicTextUI$RootView.paint(Unknown Source) at javax.swing.plaf.basic.BasicTextUI.paintSafely(Unknown Source) at javax.swing.plaf.basic.BasicTextUI.paint(Unknown Source) at javax.swing.plaf.basic.BasicTextUI.update(Unknown Source) at javax.swing.JComponent.paintComponent(Unknown Source) at javax.swing.JComponent.paint(Unknown Source) at javax.swing.JComponent.paintToOffscreen(Unknown Source) at javax.swing.RepaintManager$PaintManager.paintDoubleBuffered(Unknown Source) at javax.swing.RepaintManager$PaintManager.paint(Unknown Source) at javax.swing.BufferStrategyPaintManager.paint(Unknown Source) at javax.swing.RepaintManager.paint(Unknown Source) at javax.swing.JComponent._paintImmediately(Unknown Source) at javax.swing.JComponent.paintImmediately(Unknown Source) at javax.swing.RepaintManager.paintDirtyRegions(Unknown Source) at javax.swing.RepaintManager.paintDirtyRegions(Unknown Source) at javax.swing.RepaintManager.seqPaintDirtyRegions(Unknown Source) at javax.swing.SystemEventQueueUtilities$ComponentWorkRequest.run(Unknown Source) at java.awt.event.InvocationEvent.dispatch(Unknown Source) at java.awt.EventQueue.dispatchEvent(Unknown Source) at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.run(Unknown Source)

I'm not sure what else to change in order to match the versions. Sorry for the inconvenience, but I am not familiar with Java. Would it help if I showed you the part of the code where the function gets called from?

musically-ut commented 7 years ago

Curiouser and curiouser.

The 2008 version does seem to be too old for Java to recognize the file. What does java -version say on your machine?

If it is not 1.8+, then you may have to step backwards one by one through the various downloads provided.

johnnykast commented 7 years ago

OK, I tried a few combinations of different Java versions installed on my PC and POS tagger releases, ending up with Java 7 (build 1.7.0-b147) on my machine and stanford-postagger-2014-01-04 (I also tried 28-9-2008, 20-04-2011, 27-08-2014 and 31-10-2016), but still nothing, nothing, nothing... :( With the current combination, when I press run in Matlab, I get the following message: Reading POS tagger model from ./english-left3words-distsim.tagger ... ??? and then, after about half a minute, all these: **Java exception occurred: java.lang.OutOfMemoryError: Java heap space

at java.lang.StringBuilder.toString(Unknown Source)

at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(Unknown Source)

at java.io.ObjectInputStream$BlockDataInputStream.readUTF(Unknown Source)

at java.io.ObjectInputStream.readString(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readObject(Unknown Source)

at java.util.HashMap.readObject(Unknown Source)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

at java.lang.reflect.Method.invoke(Unknown Source)

at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.defaultReadFields(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readArray(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.defaultReadFields(Unknown Source)

at java.io.ObjectInputStream.readSerialData(Unknown Source)

at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

at java.io.ObjectInputStream.readObject0(Unknown Source)

at java.io.ObjectInputStream.readObject(Unknown Source)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readExtractors(MaxentTagger.java:582)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:808)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:755)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:289)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:253)

Error in ==> PosTaggerM at 39 tagger = MaxentTagger('./english-left3words-distsim.tagger');

Error in ==> testing at 20 reply = PosTaggerM('This is a very small sample sentence for test purpose'); Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space at java.awt.Rectangle.getBounds(Unknown Source) at sun.java2d.SunGraphics2D.transformShape(Unknown Source) at sun.java2d.SunGraphics2D.transformShape(Unknown Source) at sun.java2d.SunGraphics2D.setClip(Unknown Source) at sun.java2d.SunGraphics2D.setClip(Unknown Source) at org.netbeans.editor.DrawGraphics$GraphicsDG.eol(DrawGraphics.java:778) at org.netbeans.editor.DrawEngine.handleEOL(DrawEngine.java:344) at org.netbeans.editor.DrawEngine.drawCurrentTokenFragment(DrawEngine.java:834) at org.netbeans.editor.DrawEngine.drawCurrentToken(DrawEngine.java:904) at org.netbeans.editor.DrawEngine.draw(DrawEngine.java:1088) at org.netbeans.editor.DrawEngineLineView.paint(DrawEngineLineView.java:233) at org.netbeans.lib.editor.view.GapBoxViewChildren.paintChildren(GapBoxViewChildren.java:783) at org.netbeans.lib.editor.view.GapBoxView.paint(GapBoxView.java:1463) at org.netbeans.lib.editor.view.GapDocumentView.paint(GapDocumentView.java:231) at org.netbeans.editor.DrawEngineDocView.paint(DrawEngineDocView.java:314) at org.netbeans.editor.view.spi.LockView.paint(LockView.java:363) at javax.swing.plaf.basic.BasicTextUI$RootView.paint(Unknown Source) at javax.swing.plaf.basic.BasicTextUI.paintSafely(Unknown Source) at javax.swing.plaf.basic.BasicTextUI.paint(Unknown Source) at javax.swing.plaf.basic.BasicTextUI.update(Unknown Source) at javax.swing.JComponent.paintComponent(Unknown Source) at javax.swing.JComponent.paint(Unknown Source) at javax.swing.JComponent.paintToOffscreen(Unknown Source) at javax.swing.RepaintManager$PaintManager.paintDoubleBuffered(Unknown Source) at javax.swing.RepaintManager$PaintManager.paint(Unknown Source) at javax.swing.BufferStrategyPaintManager.paint(Unknown Source) at javax.swing.RepaintManager.paint(Unknown Source) at 
javax.swing.JComponent._paintImmediately(Unknown Source) at javax.swing.JComponent.paintImmediately(Unknown Source) at javax.swing.RepaintManager.paintDirtyRegions(Unknown Source) at javax.swing.RepaintManager.paintDirtyRegions(Unknown Source) at javax.swing.RepaintManager.seqPaintDirtyRegions(Unknown Source)** What can I do now?

musically-ut commented 7 years ago

Yay! Now we are getting somewhere

Running out of memory is common. There are instructions on the postagger page for raising the Java heap limit for a single run, but I remember there being instructions to increase it system wide.

I will search for the instructions when I get to a better computing device.

~ ut


johnnykast commented 7 years ago

Thanks again and again for taking the trouble, Utkarsh. You know I'm always eagerly waiting for your next post... My PC is running Matlab 2010a and has 2GB of RAM, if that is the memory that ran out. Do you think we're going to get it to work? Edit: The postagger page says "...you may need to give java an option like java -mx200m..." Is this what you meant? And where do I enter this command? In my m-file, in PosTaggerM, or somewhere else?

musically-ut commented 7 years ago

The Java options that you need to set are -Xmx and -XX:MaxPermSize, to about 1 GB, as explained here: http://nlp.stanford.edu/software/tagger.shtml and https://plumbr.eu/outofmemoryerror/java-heap-space

This is how we set the Java options while inside MATLAB: ~https://plumbr.eu/outofmemoryerror/java-heap-space~ https://in.mathworks.com/help/matlab/matlab_env/java-opts-file.html

This is how you can find the default Max Heap size on your system: http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined

Whether increasing the heap size helps you or not depends on what the default maximum is. I suspect that there is some juice you can still draw from your old machine.

However, getting a bigger machine never hurts.
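As a rough illustration of checking that default maximum, the JVM can report its own heap ceiling through `Runtime.maxMemory()`. The class name `HeapReport` is made up for this sketch. Inside MATLAB the same method should be reachable as `java.lang.Runtime.getRuntime.maxMemory`, which reports the limit of the JVM embedded in MATLAB; a `java` process started from a command prompt is a separate JVM and reports only its own limit.

```java
final class HeapReport {

    // Maximum heap the running JVM will attempt to use, in MiB.
    // This reflects the effective -Xmx setting of this JVM only.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the reported value is far below the size of the model being loaded, an OutOfMemoryError while reading the .tagger file is the expected outcome.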

johnnykast commented 7 years ago

I'm afraid I didn't make it. Following the links you gave me, I went from the command prompt to the stanford.postagger directory and ran: java -Xmx1024m -Xms1024m -jar -stanford.postagger.jar I also tried an alternative, changing the heap space from Java's control panel, but no, nothing changed. Can you give me any more hints? Please?

musically-ut commented 7 years ago

Sorry, I pasted an incorrect link in my answer. The way to set those options while invoking the JVM from inside MATLAB is this: https://in.mathworks.com/help/matlab/matlab_env/java-opts-file.html I've fixed that error in the original comment as well now.

See if that works.

johnnykast commented 7 years ago

It seems to be working fine, at least with a small sentence. All I have to do now is try to POS-tag the whole file with the reviews. I'll keep you posted. Thank you very much, Utkarsh. You're my hero!

musically-ut commented 7 years ago

Good to know!

Just to reiterate, these are the settings which worked for you:

I would like to add this information to the README file for future reference. Was there anything else that you had to do to make it work?


As a sidenote, the code initializes the tagger object each time you call the function. This is an expensive operation, since it needs to read the entire file from disk. Hence, it would be best to call the function once with all the sentences that you have.

However, if you have to call the function multiple times for a bunch of sentences, since it may not be possible to hold all the sentences in memory at once, then it would be best to create a tagger once and then pass it again to a function which just does PoS tagging (i.e. the for loop).
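The load-once pattern described above can be sketched in isolation. `StubTagger` below is a hypothetical stand-in that only counts how often it is constructed and appends a placeholder tag; it is not the Stanford `MaxentTagger` API, and `tagAll` is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.List;

final class ReuseTagger {

    // Hypothetical stand-in for an expensive-to-construct tagger.
    // NOT the Stanford MaxentTagger API; it only counts constructions.
    static final class StubTagger {
        static int loads = 0;

        StubTagger(String modelPath) {
            loads++; // a real tagger would read the model file from disk here
        }

        // Toy tagging: append a placeholder tag to every whitespace-separated token.
        String tag(String sentence) {
            StringBuilder out = new StringBuilder();
            for (String token : sentence.trim().split("\\s+")) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(token).append("_X");
            }
            return out.toString();
        }
    }

    // Tags every sentence with the tagger passed in, so the model is never reloaded.
    static List<String> tagAll(StubTagger tagger, List<String> sentences) {
        List<String> tagged = new ArrayList<>();
        for (String sentence : sentences) {
            tagged.add(tagger.tag(sentence));
        }
        return tagged;
    }

    public static void main(String[] args) {
        StubTagger tagger = new StubTagger("./english-left3words-distsim.tagger");
        List<String> batch = new ArrayList<>();
        batch.add("This is a very small sample sentence");
        batch.add("Another short review");
        System.out.println(tagAll(tagger, batch));
        System.out.println("model loads: " + StubTagger.loads);
    }
}
```

The design point is simply that construction happens outside the loop: batches that do not fit in memory at once can be fed through `tagAll` one after another with the same tagger object.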

johnnykast commented 7 years ago
  1. Yes, the settings are exactly as you summed up. The only additional step was specifying Java's heap size: Matlab's default value was 128MB and I moved the slider to 450MB, while the slider's maximum position is 767MB.

  2. Another question (edited): In order to POS-tag many sentences from a cell, I moved the command tagger = MaxentTagger('./english-left3words-distsim.tagger'); outside the function PosTaggerM(). I also added a tagger parameter to both the call and the definition of the PosTaggerM() function, so the call looks like this: RevTag{i} = PosTagM(tagger,Reviews{i}). Now I believe it creates the tagger only once, BUT when it finishes tagging, the RevTag cell appears blank in the variable editor (see picture attached), even though it does contain the tagged sentences, as I can see if I display the RevTag cell in the command window.

musically-ut commented 7 years ago

Thanks for the additional details. And, for the record, what you did with the function is exactly what I meant when I said that the tagger should be created only once.

As for RevTag being empty, I suspect it could have something to do with the fact that it contains a java ArrayList instead of a regular MATLAB object. There ought to be a way to convert it to MATLAB cell array of strings (as they do with Doubles here). However, I am not sure what that way is.

Perhaps you could do your analysis without converting them to MATLAB types first?

johnnykast commented 7 years ago

OK, I'll look into it. Either way, your help was more than precious and I thank you again, Utkarsh. I'll keep you posted about any progress with the conversion to a Matlab cell array.

johnnykast commented 7 years ago

The truth is that, for now, I have stopped searching for a way to make that conversion, as I finally got to use an alternative to your script, which tags a whole file instead of a single sentence. BUT if I find some time a bit later, I'll look into it again and let you know about the results. I did not forget to keep you posted, as I had promised... Thanks again for the help and for the reference.

bodorin commented 7 years ago

Hi there,

I recently installed Stanford POS Tagger and I ran into the same issues as described above.

My treatment for "java.lang.OutOfMemoryError: Java heap space" reported above was to adjust the Java heap size (Matlab Home tab >> Environment section >> Preferences >> MATLAB >> General >> Java Heap Memory) to 512. This worked for me (Matlab 2012b, Stanford POS Tagger 3.4.1, JRE SE 7 build 1.7.0_79) .

Kind Regards, Nicolaie Popescu-Bodorin www.lmrec.org/bodorin/