Hi @zjdx1998, I think the error message that makes more sense is the last one:
Caused by: java.lang.AssertionError: assertion failed: curTarget 14 is out of range 1 to 5
at scala.Predef$.assert(Predef.scala:170)
at com.intel.analytics.bigdl.nn.ClassNLLCriterion$$anonfun$updateOutput$5.apply(ClassNLLCriterion.scala:132)
at com.intel.analytics.bigdl.nn.ClassNLLCriterion$$anonfun$updateOutput$5.apply(ClassNLLCriterion.scala:130)
at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$invoke$2.apply(ThreadPool.scala:194)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
... 18 more
In our example on the MovieLens dataset, the ratings (i.e. the labels) range from 1 to 5. It seems you now have label 14, but when you create the model you are still specifying the number of classes as 5. Could you please check?
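For example, a minimal sketch of such a check (ratings_df and the column name "label" are placeholders for whatever DataFrame feeds your Sample RDD, not the exact notebook code):
from pyspark.sql import functions as F

# Quick sanity check: ClassNLLCriterion expects 1-based labels in [1, class_num],
# so the class_num passed to the model must cover the largest label.
row = ratings_df.agg(F.min("label").alias("lo"), F.max("label").alias("hi")).collect()[0]
print("labels range from", row["lo"], "to", row["hi"])

class_num = int(row["hi"])  # e.g. 5 for 1-5 ratings; larger if labels go higher
assert row["lo"] >= 1, "labels must start at 1"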
Thanks @hkvision. By the way, I would like to know how to increase driver memory on Google Colab, because I am now hitting a new error. I tried the following solutions, but none of them worked:
1.
memory = '32g'
pyspark_submit_args = ' --driver-memory ' + memory + ' pyspark-shell'
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
2.
!export SPARK_DRIVER_MEMORY = 32g
3.
sc.stop()
_tconf = sc.getConf()
_tconf.set('spark.driver.memory', '32g')
sc = init_nncontext(_tconf)  # this reports an error that two SparkContexts exist
Could you please help me with this? Thanks very much!
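(For anyone who lands here: a minimal sketch of the pattern the attempts above are aiming at, assuming Spark runs in local mode as it does on Colab. The driver heap can only be set before the JVM is started, and a !export cell runs in a subshell so it never reaches the Python process. The variable names below are the same ones used in the attempts; whether they take effect depends on the pyspark/zoo version.)
import os
from zoo.common.nncontext import init_nncontext

# Must run in a fresh runtime, before any SparkContext (and hence the JVM) exists;
# changing spark.driver.memory afterwards has no effect in local mode.
os.environ["SPARK_DRIVER_MEMORY"] = "12g"  # expected to be read by the zoo/BigDL launch code (version dependent)
os.environ["PYSPARK_SUBMIT_ARGS"] = "--driver-memory 12g pyspark-shell"  # read when pyspark starts the JVM

sc = init_nncontext("WideAndDeep example")
print(sc.getConf().get("spark.driver.memory"))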
This is the error info:
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:34893)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 1067, in start
self.socket.connect((self.address, self.port))
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Py4JNetworkErrorTraceback (most recent call last)
<ipython-input-59-0a9cdafefb39> in <module>()
----> 1 get_ipython().run_cell_magic(u'time', u'', u'# Boot training process\nwide_n_deep.fit(train_data,\n batch_size = 294,\n nb_epoch = 8,\n validation_data = test_data\n )\nprint("Optimization Done.")')
5 frames
</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-60> in time(self, line, cell, local_ns)
<timed exec> in <module>()
/usr/local/lib/python2.7/dist-packages/bigdl/util/common.pyc in callBigDlFunc(bigdl_type, name, *args)
587 error = e
588 if "does not exist" not in str(e):
--> 589 raise e
590 else:
591 return result
Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:34893)
And the bigdl.log:
*********
2019-08-19 02:42:10 INFO DistriOptimizer$:181 - [Epoch 7 588/295][Iteration 14][Wall Clock 4.787957025s] Top1Accuracy is Accuracy(correct: 12, count: 103, accuracy: 0.11650485436893204)
2019-08-19 02:42:10 INFO DistriOptimizer$:408 - [Epoch 8 294/295][Iteration 15][Wall Clock 5.095077726s] Trained 294 records in 0.256534343 seconds. Throughput is 1146.0454 records/second. Loss is 16.97959.
2019-08-19 02:42:11 INFO DistriOptimizer$:408 - [Epoch 8 588/295][Iteration 16][Wall Clock 5.339590646s] Trained 294 records in 0.24451292 seconds. Throughput is 1202.3905 records/second. Loss is 16.97959.
2019-08-19 02:42:11 INFO DistriOptimizer$:452 - [Epoch 8 588/295][Iteration 16][Wall Clock 5.339590646s] Epoch finished. Wall clock time is 5391.449468 ms
2019-08-19 02:42:11 INFO DistriOptimizer$:111 - [Epoch 8 588/295][Iteration 16][Wall Clock 5.339590646s] Validate model...
2019-08-19 02:42:11 INFO DistriOptimizer$:178 - [Epoch 8 588/295][Iteration 16][Wall Clock 5.339590646s] validate model throughput is 48.515327 records/second
2019-08-19 02:42:11 INFO DistriOptimizer$:181 - [Epoch 8 588/295][Iteration 16][Wall Clock 5.339590646s] Loss is (Loss: 32.541798, count: 2, Average Loss: 16.270899)
2019-08-19 02:42:11 INFO DistriOptimizer$:181 - [Epoch 8 588/295][Iteration 16][Wall Clock 5.339590646s] Top1Accuracy is Accuracy(correct: 12, count: 103, accuracy: 0.11650485436893204)
2019-08-19 02:42:12 ERROR Executor:91 - Exception in task 0.0 in stage 149.0 (TID 59)
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:260)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:517)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-08-19 02:42:12 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[Executor task launch worker for task 59,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:260)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:517)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-08-19 02:42:12 WARN TaskSetManager:66 - Lost task 0.0 in stage 149.0 (TID 59, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:260)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:517)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-08-19 02:42:12 ERROR TaskSetManager:70 - Task 0 in stage 149.0 failed 1 times; aborting job
Actually, I set the spark config like this:
sc.getConf().getAll()
sc.stop()
_conf = sc.getConf().set('spark.driver.memory','32g')
_conf.set('spark.executor.memory','32g')
_conf.set('spark.driver.maxResultSize','32g')
_conf.getAll()
sc = init_nncontext(conf=_conf)
sc.getConf().getAll()
And got the new config:
[(u'spark.executorEnv.OMP_NUM_THREADS', u'1'),
(u'spark.serializer', u'org.apache.spark.serializer.JavaSerializer'),
(u'spark.driver.memory', u'32g'),
(u'spark.driver.port', u'33559'),
(u'spark.driver.maxResultSize', u'32g'),
(u'spark.shuffle.reduceLocality.enabled', u'false'),
(u'spark.executor.id', u'driver'),
(u'spark.shuffle.blockTransferService', u'nio'),
(u'spark.executorEnv.KMP_BLOCKTIME', u'0'),
(u'spark.driver.extraClassPath',
u'/usr/local/lib/python2.7/dist-packages/bigdl/share/lib/bigdl-0.8.0-jar-with-dependencies.jar:/usr/local/lib/python2.7/dist-packages/zoo/share/lib/analytics-zoo-bigdl_0.8.0-spark_2.4.3-0.5.1-jar-with-dependencies.jar'),
(u'spark.executorEnv.KMP_AFFINITY', u'granularity=fine,compact,1,0'),
(u'spark.app.name', u'WideAndDeep JobRecommendation'),
(u'spark.executor.memory', u'32g'),
(u'spark.app.id', u'local-1566201703722'),
(u'spark.driver.host', u'11a156303bf1'),
(u'spark.rdd.compress', u'True'),
(u'spark.speculation', u'false'),
(u'spark.serializer.objectStreamReset', u'100'),
(u'spark.master', u'local[*]'),
(u'spark.scheduler.minRegisteredResourcesRatio', u'1.0'),
(u'spark.submit.deployMode', u'client'),
(u'spark.ui.showConsoleProgress', u'true'),
(u'spark.executorEnv.KMP_SETTINGS', u'1')]
But it still didn't work at the last step, wide_n_deep.recommend_for_user(), which reports the errors in my last comment. I checked the JVM heap size and got this:
Attaching to process ID 453, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.212-b03
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1073741824 (1024.0MB) ***watch out here***
NewSize = 138412032 (132.0MB)
MaxNewSize = 357564416 (341.0MB)
OldSize = 276824064 (264.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.tools.jmap.JMap.runTool(JMap.java:201)
at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: java.lang.RuntimeException: unknown CollectedHeap type : class sun.jvm.hotspot.gc_interface.CollectedHeap
at sun.jvm.hotspot.tools.HeapSummary.run(HeapSummary.java:144)
at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:49)
... 6 more
It frustrated me.
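(A quick cross-check, as a minimal sketch: the driver's actual maximum heap can also be read from the notebook through the py4j gateway, assuming a live SparkContext named sc.)
# If spark.driver.memory had taken effect, this would be close to the requested
# 32g instead of the ~1 GB default that jmap reports above.
max_heap_bytes = sc._jvm.java.lang.Runtime.getRuntime().maxMemory()
print("driver max heap: %.1f MB" % (max_heap_bytes / (1024.0 * 1024.0)))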
So you have successfully trained the model, and the error only happens when you call recommend_for_user?
Yes, but when I changed the value of spark.driver|executor.memory|maxResultSize, the training process failed again. With the config in my last comment I successfully trained the model, but it failed when I called recommend_for_user. Thanks for your reply!
By "change the value of spark.driver.memory", do you mean you increased the memory but the training process still failed? I noticed from your log that you only have 200+ training records. What about the test data that is fed into recommend_for_user? (Maybe you can further reduce the data size to investigate what happens.)
Also, at the same time, can you run our notebook on WideAndDeep here: https://github.com/intel-analytics/analytics-zoo/tree/master/apps/recommendation-wide-n-deep ?
Thanks for your reply.
- Yes, I expanded the memory to 40g but training still failed.
- I used to have more than 100,000 records, but now I only keep 200 for testing. Following your suggestion I reduced the data size to 150 and ran twice, but it still failed both times. One error is
java.lang.OutOfMemoryError: Java heap space
and the other is
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
- I can run the example notebook on my own computer (with !export SPARK_DRIVER_MEMORY=32g) but I can't run it on Google Colab (with spark.driver.memory set to 32g).
Then we suppose there is something wrong with the Google Colab configuration? Could you check the memory it actually allocates for you? Or you could ask them for help.
it seems you can check the memory here
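(For reference, a minimal sketch of checking the memory the Colab VM actually provides; psutil is an assumption here, and something like !free -h works just as well.)
import psutil

# Physical memory of the Colab VM; if this is far below the requested 32g,
# the driver heap can never reach that size no matter how Spark is configured.
mem = psutil.virtual_memory()
print("total: %.1f GB, available: %.1f GB" % (mem.total / 1e9, mem.available / 1e9))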
I will first close this issue. Feel free to reopen it if there are further problems. @zjdx1998
I changed the data in the example and modified the column_info. The other settings are basically the same as the Wide-And-Deep example. Why is a Boxed Error appearing? The basic environment versions follow:
And I run the code in Google Colab. When training the model below, the error was thrown:
Here is the pasted log: