nlpsandbox / phi-annotator-spark-nlp

An implementation of NLP Sandbox PHI Annotator API based on Spark NLP
Apache License 2.0

Fix ConnectionRefusedError #21

Closed tschaffter closed 3 years ago

tschaffter commented 3 years ago

@thomasyu888 reported this issue, which happens (randomly?) after annotating multiple notes (both locally and during benchmarking).

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/py4j/java_gateway.py", line 1115, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
An error occurred while trying to connect to the Java server (127.0.0.1:35241)
[pid: 71|app: 0|req: 46/46] 172.19.0.1 () {54 vars in 957 bytes} [Sat Jul 31 02:47:36 2021] POST /api/v1/textDateAnnotations => generated 143 bytes in 3 msecs (HTTP/1.1 500) 2 headers in 91 bytes (1 switches on core 0)
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:35241)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/py4j/java_gateway.py", line 977, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque
tschaffter commented 3 years ago

@thomasyu888 added:

but it looks like a different error message from what I got from your submission. I can't reproduce this: {'detail': 'An error occurred while calling o17274.load', 'status': 500,

tschaffter commented 3 years ago

Sometimes the Swagger UI shows the following error. However, the tool can continue to be used and returns successful responses, so this error does not crash the tool, and the exponential backoff retries in place should handle it.

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.6</center>
</body>
</html>

I have also seen a 504 error with the same message (Bad Gateway).
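Since the 502/504 responses are transient, the exponential backoff retries mentioned above should absorb them on the client side. As a minimal sketch (illustrative only — the function and parameter names here are made up, not the project's actual retry code):

```python
import random
import time

def call_with_backoff(call, max_attempts=5, base_delay=0.5):
    """Retry `call` on connection errors with exponential backoff.

    Illustrative sketch: the annotator's real retry logic may differ.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionRefusedError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Sleep 0.5s, 1s, 2s, ... plus a small random jitter.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

With this pattern a request that fails twice and then succeeds completes transparently after roughly 1.5s of accumulated backoff.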

tschaffter commented 3 years ago

I generated 500 requests by clicking on the Try button of the Swagger UI. At some point I got the following 500 error, and all subsequent requests generated the same error.

phi-annotator  | An error occurred while calling o106444.load.
phi-annotator  | : java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (8212M) > maxPhysicalBytes (8192M)
phi-annotator  |        at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:695)
phi-annotator  |        at org.tensorflow.internal.c_api.AbstractTF_ImportGraphDefOptions.newImportGraphDefOptions(AbstractTF_ImportGraphDefOptions.java:43)
phi-annotator  |        at org.tensorflow.Graph.importGraphDef(Graph.java:616)
phi-annotator  |        at org.tensorflow.Graph.importGraphDef(Graph.java:201)
phi-annotator  |        at org.tensorflow.Graph.importGraphDef(Graph.java:185)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readGraph(TensorflowWrapper.scala:370)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.unpackWithoutBundle(TensorflowWrapper.scala:297)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:426)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:146)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:121)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel$.readTensorflowModel(NerDLModel.scala:344)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.readNerGraph(NerDLModel.scala:315)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.readNerGraph$(NerDLModel.scala:314)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel$.readNerGraph(NerDLModel.scala:344)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.$anonfun$$init$$1(NerDLModel.scala:322)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.$anonfun$$init$$1$adapted(NerDLModel.scala:322)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:31)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:30)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
phi-annotator  |        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:30)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:41)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:41)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:19)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
phi-annotator  |        at jdk.internal.reflect.GeneratedMethodAccessor124.invoke(Unknown Source)
phi-annotator  |        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
phi-annotator  |        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
phi-annotator  |        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
phi-annotator  |        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
phi-annotator  |        at py4j.Gateway.invoke(Gateway.java:282)
phi-annotator  |        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
phi-annotator  |        at py4j.commands.CallCommand.execute(CallCommand.java:79)
phi-annotator  |        at py4j.GatewayConnection.run(GatewayConnection.java:238)
phi-annotator  |        at java.base/java.lang.Thread.run(Thread.java:829)
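The manual load test above (repeatedly clicking Try in the Swagger UI) can be scripted. A stdlib-only sketch, assuming the annotator is reachable locally (the endpoint path comes from the logs above; the host/port and payload shape are assumptions):

```python
import json
import urllib.error
import urllib.request

def hammer(url: str, n: int, text: str = "Patient seen on 12/26/2020.") -> list[int]:
    """POST the same note `n` times and collect the HTTP status codes.

    Sketch of the manual load test described above; payload shape assumed.
    """
    payload = json.dumps({"note": {"text": text}}).encode()
    statuses = []
    for _ in range(n):
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                statuses.append(resp.status)
        except urllib.error.HTTPError as err:
            statuses.append(err.code)  # e.g. 500 once the JVM hits its memory cap
    return statuses

# Example (host/port assumed):
# hammer("http://localhost:8080/api/v1/textDateAnnotations", 500)
```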
tschaffter commented 3 years ago

Comment from a member of the Spark NLP team regarding memory requirement: https://github.com/JohnSnowLabs/spark-nlp/issues/977#issuecomment-666186630

tschaffter commented 3 years ago

Tried again: same error around the 300th request. Here I had set JAVA_MAX_MEM=4G, but the TensorFlow error still mentions maxPhysicalBytes (8192M).

phi-annotator  | [pid: 92|app: 0|req: 290/290] 192.168.240.3 () {62 vars in 947 bytes} [Sun Aug  1 20:41:40 2021] POST /api/v1/textContactAnnotations => generated 172 bytes in 1145 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
phi-annotator  | [pid: 92|app: 0|req: 291/291] 192.168.240.3 () {62 vars in 947 bytes} [Sun Aug  1 20:41:41 2021] POST /api/v1/textContactAnnotations => generated 172 bytes in 1803 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
Exception in thread "JavaCPP Deallocator" java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (8193M) > maxPhysicalBytes (8192M)
phi-annotator  |        at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:695)
phi-annotator  |        at org.tensorflow.internal.c_api.AbstractTF_Status.newStatus(AbstractTF_Status.java:70)
phi-annotator  |        at org.tensorflow.internal.c_api.AbstractTF_Session$DeleteDeallocator.deallocate(AbstractTF_Session.java:38)
phi-annotator  |        at org.bytedeco.javacpp.Pointer$DeallocatorReference.deallocate(Pointer.java:337)
phi-annotator  |        at org.bytedeco.javacpp.Pointer$DeallocatorReference.clear(Pointer.java:331)
phi-annotator  |        at org.bytedeco.javacpp.Pointer$DeallocatorThread.run(Pointer.java:379)
[pid: 92|app: 0|req: 292/292] 192.168.240.3 () {62 vars in 947 bytes} [Sun Aug  1 20:41:43 2021] POST /api/v1/textContactAnnotations => generated 172 bytes in 3007 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
phi-annotator  | An error occurred while calling o102482.load.
phi-annotator  | : java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (8195M) > maxPhysicalBytes (8192M)
phi-annotator  |        at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:695)
phi-annotator  |        at org.tensorflow.internal.c_api.AbstractTF_ImportGraphDefOptions.newImportGraphDefOptions(AbstractTF_ImportGraphDefOptions.java:43)
phi-annotator  |        at org.tensorflow.Graph.importGraphDef(Graph.java:616)
phi-annotator  |        at org.tensorflow.Graph.importGraphDef(Graph.java:201)
phi-annotator  |        at org.tensorflow.Graph.importGraphDef(Graph.java:185)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readGraph(TensorflowWrapper.scala:370)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.unpackWithoutBundle(TensorflowWrapper.scala:297)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:426)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:146)
phi-annotator  |        at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:121)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel$.readTensorflowModel(NerDLModel.scala:344)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.readNerGraph(NerDLModel.scala:315)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.readNerGraph$(NerDLModel.scala:314)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel$.readNerGraph(NerDLModel.scala:344)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.$anonfun$$init$$1(NerDLModel.scala:322)
phi-annotator  |        at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph.$anonfun$$init$$1$adapted(NerDLModel.scala:322)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:31)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:30)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
phi-annotator  |        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:30)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:41)
phi-annotator  |        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:41)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:19)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
phi-annotator  |        at jdk.internal.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
phi-annotator  |        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
phi-annotator  |        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
phi-annotator  |        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
phi-annotator  |        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
phi-annotator  |        at py4j.Gateway.invoke(Gateway.java:282)
phi-annotator  |        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
phi-annotator  |        at py4j.commands.CallCommand.execute(CallCommand.java:79)
phi-annotator  |        at py4j.GatewayConnection.run(GatewayConnection.java:238)
phi-annotator  |        at java.base/java.lang.Thread.run(Thread.java:829)
tschaffter commented 3 years ago

static final long maxPhysicalBytes

Maximum amount of memory reported by physicalBytes() before forcing call to System.gc(). Set via "org.bytedeco.javacpp.maxPhysicalBytes" system property, defaults to maxBytes > 0 ? maxBytes + Runtime.maxMemory() : 0. If maxBytes is also not set, this is equivalent to a default of 2 * Runtime.maxMemory(). The value is parsed with parseBytes(String, long) where relativeMultiple = Runtime.maxMemory(). We can use a value of 0 or less to prevent any explicit call to the garbage collector.

Source
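Per the javadoc above, maxPhysicalBytes is controlled via the "org.bytedeco.javacpp.maxPhysicalBytes" system property, which can be passed to the JVM through JAVA_TOOL_OPTIONS before Spark/py4j launches it. A sketch, with illustrative values (the 12G cap and maxBytes=0 here are assumptions to experiment with, not confirmed settings):

```python
import os

# Hypothetical sketch: cap the JVM heap and set JavaCPP's physical-memory
# limit via system properties before the Spark/py4j JVM starts. The JVM
# picks up JAVA_TOOL_OPTIONS automatically; the property values below are
# illustrative, not the project's confirmed configuration.
os.environ["JAVA_TOOL_OPTIONS"] = (
    "-Xmx4G "
    "-Dorg.bytedeco.javacpp.maxBytes=0 "
    "-Dorg.bytedeco.javacpp.maxPhysicalBytes=12G"
)
```

This must run before the Spark session is created, since the JVM only reads the variable at startup.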

tschaffter commented 3 years ago

Setting container max memory to 4G and JAVA_TOOL_OPTIONS="-Xmx4G"

Startup logs show:

phi-annotator | Picked up JAVA_TOOL_OPTIONS: "-Xmx4G"
phi-annotator | Picked up JAVA_TOOL_OPTIONS: "-Xmx4G"

The tool seems to become slower but continues to use more than 4G of memory and then crashes.

I confirmed that the container only had access to 4G with docker stats {container_name}. The memory used by the container quickly reached just below 4G and stayed there until around the 200th request, when it gradually locked at 4G. Soon after, the tool generated the following error, which no longer mentions maxPhysicalBytes. The issue is still clearly memory-related: exit code 137 corresponds to the process being killed with SIGKILL (128 + 9), so rather than a permission problem it looks like the OS command spawned by Hadoop was killed, likely because the container hit its memory limit.

phi-annotator  | [pid: 89|app: 0|req: 234/234] 192.168.240.3 () {62 vars in 947 bytes} [Sun Aug  1 21:26:46 2021] POST /api/v1/textContactAnnotations => generated 172 bytes in 1625 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
[pid: 89|app: 0|req: 235/235] 192.168.240.3 () {62 vars in 947 bytes} [Sun Aug  1 21:26:48 2021] POST /api/v1/textContactAnnotations => generated 172 bytes in 54956 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
phi-annotator  | An error occurred while calling o82076.load.
phi-annotator  | : java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=137:
phi-annotator  |        at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
phi-annotator  |        at org.apache.hadoop.util.Shell.run(Shell.java:901)
phi-annotator  |        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
phi-annotator  |        at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
phi-annotator  |        at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
phi-annotator  |        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1350)
phi-annotator  |        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNonNativeIO(RawLocalFileSystem.java:751)
phi-annotator  |        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:742)
phi-annotator  |        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:703)
phi-annotator  |        at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
phi-annotator  |        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2091)
phi-annotator  |        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2071)
phi-annotator  |        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:280)
phi-annotator  |        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
phi-annotator  |        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
phi-annotator  |        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
phi-annotator  |        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
phi-annotator  |        at scala.Option.getOrElse(Option.scala:189)
phi-annotator  |        at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
phi-annotator  |        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
phi-annotator  |        at scala.Option.getOrElse(Option.scala:189)
phi-annotator  |        at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1428)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
phi-annotator  |        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
phi-annotator  |        at org.apache.spark.rdd.RDD.take(RDD.scala:1422)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$first$1(RDD.scala:1463)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
phi-annotator  |        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
phi-annotator  |        at org.apache.spark.rdd.RDD.first(RDD.scala:1463)
phi-annotator  |        at com.johnsnowlabs.nlp.serialization.StructFeature.deserializeObject(Feature.scala:120)
phi-annotator  |        at com.johnsnowlabs.nlp.serialization.Feature.deserialize(Feature.scala:47)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.$anonfun$load$1(ParamsAndFeaturesReadable.scala:15)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
phi-annotator  |        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:14)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
phi-annotator  |        at jdk.internal.reflect.GeneratedMethodAccessor124.invoke(Unknown Source)
phi-annotator  |        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
phi-annotator  |        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
phi-annotator  |        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
phi-annotator  |        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
phi-annotator  |        at py4j.Gateway.invoke(Gateway.java:282)
phi-annotator  |        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
phi-annotator  |        at py4j.commands.CallCommand.execute(CallCommand.java:79)
phi-annotator  |        at py4j.GatewayConnection.run(GatewayConnection.java:238)
phi-annotator  |        at java.base/java.lang.Thread.run(Thread.java:829)
phi-annotator  |
phi-annotator  |        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNonNativeIO(RawLocalFileSystem.java:791)
phi-annotator  |        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:742)
phi-annotator  |        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:703)
phi-annotator  |        at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
phi-annotator  |        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2091)
phi-annotator  |        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2071)
phi-annotator  |        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:280)
phi-annotator  |        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
phi-annotator  |        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
phi-annotator  |        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
phi-annotator  |        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
phi-annotator  |        at scala.Option.getOrElse(Option.scala:189)
phi-annotator  |        at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
phi-annotator  |        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
phi-annotator  |        at scala.Option.getOrElse(Option.scala:189)
phi-annotator  |        at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1428)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
phi-annotator  |        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
phi-annotator  |        at org.apache.spark.rdd.RDD.take(RDD.scala:1422)
phi-annotator  |        at org.apache.spark.rdd.RDD.$anonfun$first$1(RDD.scala:1463)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
phi-annotator  |        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
phi-annotator  |        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
phi-annotator  |        at org.apache.spark.rdd.RDD.first(RDD.scala:1463)
phi-annotator  |        at com.johnsnowlabs.nlp.serialization.StructFeature.deserializeObject(Feature.scala:120)
phi-annotator  |        at com.johnsnowlabs.nlp.serialization.Feature.deserialize(Feature.scala:47)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.$anonfun$load$1(ParamsAndFeaturesReadable.scala:15)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
phi-annotator  |        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
phi-annotator  |        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:14)
phi-annotator  |        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
phi-annotator  |        at jdk.internal.reflect.GeneratedMethodAccessor124.invoke(Unknown Source)
phi-annotator  |        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
phi-annotator  |        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
phi-annotator  |        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
phi-annotator  |        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
phi-annotator  |        at py4j.Gateway.invoke(Gateway.java:282)
phi-annotator  |        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
phi-annotator  |        at py4j.commands.CallCommand.execute(CallCommand.java:79)
phi-annotator  |        at py4j.GatewayConnection.run(GatewayConnection.java:238)
phi-annotator  |        at java.base/java.lang.Thread.run(Thread.java:829)
tschaffter commented 3 years ago

I was able to annotate more than 500 notes with the second argument in JAVA_TOOL_OPTIONS="-Xmx4G -Dorg.bytedeco.javacpp.maxBytes=0". Yet the memory used kept increasing, so -Xmx4G may not be applied globally or is ignored by the memory-consuming processes.

EDIT: Shortly after, Python crashed with a new error /o\

[pid: 91|app: 0|req: 501/501] 172.18.0.3 () {62 vars in 938 bytes} [Mon Aug  2 02:03:12 2021] POST /api/v1/textContactAnnotations => generated 172 bytes in 140652 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 0)
21/08/02 02:05:35 ERROR Executor: Exception in task 1.0 in stage 2975.0 (TID 17353)
phi-annotator  | org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
phi-annotator  |        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:550)
phi-annotator  |        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:539)
phi-annotator  |        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
phi-annotator  |        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:657)
phi-annotator  |        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:635)
phi-annotator  |        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:470)
phi-annotator  |        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
phi-annotator  |        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
phi-annotator  |        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
phi-annotator  |        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
phi-annotator  |        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
tschaffter commented 3 years ago

Initial memory used before creating the Spark session: 584 MB
After creating the Spark session: 1.31 GB
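Numbers like these can be collected with a small probe around the expensive initialization step. The sketch below uses the stdlib `resource` module; in the annotator the bracketed step would be creating the Spark session (e.g. `sparknlp.start()`), but a plain allocation stands in here so the snippet runs anywhere:

```python
import resource

def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB (ru_maxrss is in KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

before = peak_rss_mb()
# In the annotator this step would be creating the Spark session;
# a 200 MB allocation stands in so the sketch is self-contained.
blob = bytearray(200 * 1024 ** 2)
after = peak_rss_mb()
print(f"before: {before:.0f} MB, after: {after:.0f} MB")
```

Note that `ru_maxrss` is a high-water mark, so it only captures growth, not memory later released.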