Closed Avi197 closed 4 years ago
If using only VnCoreNLP's word segmenter, you should apply annotators="wseg". You might use the second option (using without a service).
Here is an example of using VnCoreNLP's word segmenter to pre-process raw input texts.
I did try both options, with and without the service, and both return an "Unable to parse form content" error.
I use this line without the service:
with VnCoreNLP(vncorenlp_file, annotators="wseg", max_heap_size='-Xmx4g') as vncorenlp:
Without the service, it returns:
AssertionError: 400: Unable to parse form content
With the service, it returns a more specific error:
org.eclipse.jetty.http.BadMessageException: 400: Unable to parse form content
java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.IllegalStateException: Form too large: 250544 > 200000
I process the data line by line, so my guess is that one line is a bit too big? It works fine until the script reaches that line.
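One way to check that guess is to measure each line the way a form post would carry it. This is only a minimal sketch: it assumes the wrapper sends the text as a URL-encoded form field (the field name `text` and both helper names are hypothetical), and that the 200000 in the error message is the server's form size limit in bytes.

```python
from urllib.parse import urlencode

# Limit taken from the error message "Form too large: 250544 > 200000".
FORM_LIMIT = 200_000


def encoded_form_size(line, field="text"):
    """Size in bytes of `line` once URL-encoded as a single form field.

    The field name is an assumption for illustration only.
    """
    return len(urlencode({field: line}).encode("ascii"))


def find_oversized_lines(lines, limit=FORM_LIMIT):
    """Return (line_number, encoded_size) for every line above the limit."""
    oversized = []
    for number, line in enumerate(lines, start=1):
        size = encoded_form_size(line.rstrip("\n"))
        if size > limit:
            oversized.append((number, size))
    return oversized
```

Accented Vietnamese characters are percent-encoded into several bytes each, so a line can go over the limit with far fewer than 200000 characters.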
Have you downloaded the model and put it in the same folder as the jar file?
Yes. As I said above, it works fine until the script reaches a specific line in the data file, then it returns the error:
java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.IllegalStateException: Form too large: 250544 > 200000
Then you might want to split this file into multiple smaller ones, and concatenate them later after performing word segmentation. Or you can use the original version of RDRSegmenter.
I think the problem is with Jetty and the Python wrapper?
Probably your input text is too long and you should split it into smaller chunks. I refer you to https://github.com/dnanhkhoa/python-vncorenlp if it is a wrapper-related problem.
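A rough sketch of that chunking idea, not the wrapper's own API: split a long line at sentence-ending punctuation and pack the sentences into chunks that stay under a byte budget once URL-encoded. The function name `split_into_chunks` and the 150000-byte default are assumptions chosen to leave headroom below the 200000 limit in the error.

```python
import re
from urllib.parse import quote_plus


def split_into_chunks(text, max_encoded_bytes=150_000):
    """Pack sentences of `text` into chunks under an encoded-size budget.

    Hypothetical helper: a single sentence larger than the budget is
    kept whole as its own chunk rather than being cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_size = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        # +1 accounts for the space that joins sentences (encoded as "+").
        size = len(quote_plus(sentence)) + 1
        if current and current_size + size > max_encoded_bytes:
            chunks.append(" ".join(current))
            current, current_size = [], 0
        current.append(sentence)
        current_size += size
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be sent to the segmenter in place of the full line, and the segmented outputs concatenated in order afterwards. Splitting at sentence boundaries rather than at arbitrary positions should keep multi-syllable words from being cut across two requests.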
Closed! It's because this issue is related to the wrapper, not VnCoreNLP itself.
I got this problem when tokenizing data using the vncorenlp Python wrapper. It works fine until it reaches one particular line in the data file.
org.eclipse.jetty.http.BadMessageException: 400: Unable to parse form content
    at org.eclipse.jetty.server.Request.getParameters(Request.java:380)
    at org.eclipse.jetty.server.Request.getParameter(Request.java:1021)
    at javax.servlet.ServletRequestWrapper.getParameter(ServletRequestWrapper.java:194)
    at spark.Request.queryParams(Request.java:283)
    at spark.http.matching.RequestWrapper.queryParams(RequestWrapper.java:141)
    at vncorenlp.VnCoreNLPServer.handle(VnCoreNLPServer.java:247)
    at vncorenlp.VnCoreNLPServer.lambda$3(VnCoreNLPServer.java:184)
    at spark.ResponseTransformerRouteImpl$1.handle(ResponseTransformerRouteImpl.java:47)
    at spark.http.matching.Routes.execute(Routes.java:61)
    at spark.http.matching.MatcherFilter.doFilter(MatcherFilter.java:130)
    at spark.embeddedserver.jetty.JettyHandler.doHandle(JettyHandler.java:50)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1568)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:530)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Form too large: 250544 > 200000
    at org.eclipse.jetty.server.Request.extractFormParameters(Request.java:523)
    at org.eclipse.jetty.server.Request.extractContentParameters(Request.java:461)
    at org.eclipse.jetty.server.Request.getParameters(Request.java:376)
Best regards