intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

submit-examples-on-k8s: autograd examples - Exception: dimension exceeds input dimensionsdimension 1, input dimension 0 #266

Open zzti-bsj opened 3 years ago

zzti-bsj commented 3 years ago

Exception: dimension exceeds input dimensionsdimension 1, input dimension 0

https://github.com/intel-analytics/analytics-zoo/blob/master/docker/hyperzoo/submit-examples-on-k8s.md

When I run the Python examples from the guide above, two of them — autograd custom and autograd customloss — throw the same exception. (The missing space in "dimensionsdimension" is in the error message itself.)
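For context, BigDL's message "dimension exceeds input dimensions: dimension 1, input dimension 0" is a dimension-out-of-bounds check in a Sum/Mean reduction: the loss reduces over dimension 1, but the tensor it receives has no dimensions. The same class of failure can be reproduced with plain NumPy (an illustration only, not analytics-zoo code):

```python
import numpy as np

# Illustration only (not analytics-zoo code): reducing an array over
# an axis it does not have fails the same way BigDL's Sum layer does
# when asked to reduce dimension 1 of a lower-dimensional tensor.
x = np.ones(3)          # 1-D array; axis 1 does not exist
try:
    np.sum(x, axis=1)   # raises AxisError (a subclass of IndexError)
    err_msg = ""
except IndexError as e:
    err_msg = str(e)
print("reduction failed:", err_msg)
```

The analogous fix on the BigDL side is making sure the loss sees a batched (at least 1-D) tensor before reducing over an inner dimension.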

autograd custom command

/opt/spark/bin/spark-submit \
   --master k8s://https://127.0.0.1:8443 \
   --deploy-mode cluster \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   --name analytics-zoo \
   --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
   --conf spark.executor.instances=1 \
   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/zoo \
   --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
   --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/zoo \
   --conf spark.kubernetes.driver.label.az=true \
   --conf spark.kubernetes.executor.label.az=true \
   --conf spark.kubernetes.node.selector.spark=true \
   --executor-cores 16 \
   --executor-memory 100g \
   --total-executor-cores 64 \
   --driver-cores 4 \
   --driver-memory 20g \
   --properties-file /opt/analytics-zoo-0.11.0-SNAPSHOT/conf/spark-analytics-zoo.conf \
   --py-files /opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip,/opt/analytics-zoo-examples/python/autograd/custom.py \
   --conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
   --conf spark.sql.catalogImplementation='in-memory' \
   --conf spark.driver.extraClassPath=/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
   --conf spark.executor.extraClassPath=/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
   file:///opt/analytics-zoo-examples/python/autograd/custom.py \
   --nb_epoch 2

autograd customloss command

/opt/spark/bin/spark-submit \
   --master k8s://https://127.0.0.1:8443 \
   --deploy-mode cluster \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   --name analytics-zoo \
   --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
   --conf spark.executor.instances=1 \
   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/zoo \
   --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
   --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/zoo \
   --conf spark.kubernetes.driver.label.az=true \
   --conf spark.kubernetes.executor.label.az=true \
   --conf spark.kubernetes.node.selector.spark=true \
   --executor-cores 16 \
   --executor-memory 100g \
   --total-executor-cores 64 \
   --driver-cores 4 \
   --driver-memory 20g \
   --properties-file /opt/analytics-zoo-0.11.0-SNAPSHOT/conf/spark-analytics-zoo.conf \
   --py-files /opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip,/opt/analytics-zoo-examples/python/autograd/customloss.py \
   --conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
   --conf spark.sql.catalogImplementation='in-memory' \
   --conf spark.driver.extraClassPath=/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
   --conf spark.executor.extraClassPath=/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
   file:///opt/analytics-zoo-examples/python/autograd/customloss.py

Exception info

+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.30.100.10 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner file:///opt/analytics-zoo-examples/python/autograd/custom.py --nb_epoch 2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2021-05-19 07:57:37 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
pyspark_submit_args is:  --driver-class-path /opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-jar-with-dependencies.jar pyspark-shell
2021-05-19 07:57:38 INFO  SparkContext:54 - Running Spark version 2.4.3
2021-05-19 07:57:38 INFO  SparkContext:54 - Submitted application: custom example
2021-05-19 07:57:39 INFO  SecurityManager:54 - Changing view acls to: root
2021-05-19 07:57:39 INFO  SecurityManager:54 - Changing modify acls to: root
2021-05-19 07:57:39 INFO  SecurityManager:54 - Changing view acls groups to:
2021-05-19 07:57:39 INFO  SecurityManager:54 - Changing modify acls groups to:
2021-05-19 07:57:39 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2021-05-19 07:57:39 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 7078.
2021-05-19 07:57:39 INFO  SparkEnv:54 - Registering MapOutputTracker
2021-05-19 07:57:39 INFO  SparkEnv:54 - Registering BlockManagerMaster
2021-05-19 07:57:39 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2021-05-19 07:57:39 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2021-05-19 07:57:39 INFO  DiskBlockManager:54 - Created local directory at /var/data/spark-0634ed74-1bc9-4819-9b3d-05478f9b4080/blockmgr-a76abb88-3b7f-4118-9dd5-ebad16a9dfa2
2021-05-19 07:57:39 INFO  MemoryStore:54 - MemoryStore started with capacity 10.5 GB
2021-05-19 07:57:39 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2021-05-19 07:57:39 INFO  log:192 - Logging initialized @3094ms
2021-05-19 07:57:39 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2021-05-19 07:57:39 INFO  Server:419 - Started @3204ms
2021-05-19 07:57:39 INFO  AbstractConnector:278 - Started ServerConnector@7055c2dc{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2021-05-19 07:57:39 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29dbffdd{/jobs,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3587db4e{/jobs/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@f0ebf9b{/jobs/job,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6d986a19{/jobs/job/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@664e914f{/stages,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1c46187a{/stages/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@19543673{/stages/stage,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@9cd3dbc{/stages/stage/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31126558{/stages/pool,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@14fa61c7{/stages/pool/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@123d568e{/storage,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@72fdb9a8{/storage/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50e72a0a{/storage/rdd,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@17d48d88{/storage/rdd/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f2530da{/environment,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@708ae46f{/environment/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1954270e{/executors,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2adf5161{/executors/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6cba1700{/executors/threadDump,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b343f64{/executors/threadDump/json,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@73e01d5b{/static,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ba79c5c{/,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@30e28f59{/api,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3de5b3b7{/jobs/job/kill,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@303dc698{/stages/stage/kill,null,AVAILABLE,@Spark}
2021-05-19 07:57:39 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://analytics-zoo-1621411071291-driver-svc.default.svc:4040
2021-05-19 07:57:39 INFO  SparkContext:54 - Added file file:///opt/analytics-zoo-examples/python/autograd/custom.py at spark://analytics-zoo-1621411071291-driver-svc.default.svc:7078/files/custom.py with timestamp 1621411059796
2021-05-19 07:57:39 INFO  Utils:54 - Copying /opt/analytics-zoo-examples/python/autograd/custom.py to /var/data/spark-0634ed74-1bc9-4819-9b3d-05478f9b4080/spark-fdcf92c9-29d1-4f43-b386-0e2d52a2e685/userFiles-7148df08-5338-4c0e-a3e3-5b3aeddb8fff/custom.py
2021-05-19 07:57:39 INFO  SparkContext:54 - Added file file:///opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip at spark://analytics-zoo-1621411071291-driver-svc.default.svc:7078/files/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip with timestamp 1621411059811
2021-05-19 07:57:39 INFO  Utils:54 - Copying /opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip to /var/data/spark-0634ed74-1bc9-4819-9b3d-05478f9b4080/spark-fdcf92c9-29d1-4f43-b386-0e2d52a2e685/userFiles-7148df08-5338-4c0e-a3e3-5b3aeddb8fff/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip
2021-05-19 07:57:39 WARN  SparkContext:66 - The path file:///opt/analytics-zoo-examples/python/autograd/custom.py has been added already. Overwriting of added paths is not supported in the current version.
2021-05-19 07:57:40 INFO  ExecutorPodsAllocator:54 - Going to request 1 executors from Kubernetes.
2021-05-19 07:57:41 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
2021-05-19 07:57:41 INFO  NettyBlockTransferService:54 - Server created on analytics-zoo-1621411071291-driver-svc.default.svc:7079
2021-05-19 07:57:41 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2021-05-19 07:57:41 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, analytics-zoo-1621411071291-driver-svc.default.svc, 7079, None)
2021-05-19 07:57:41 INFO  BlockManagerMasterEndpoint:54 - Registering block manager analytics-zoo-1621411071291-driver-svc.default.svc:7079 with 10.5 GB RAM, BlockManagerId(driver, analytics-zoo-1621411071291-driver-svc.default.svc, 7079, None)
2021-05-19 07:57:41 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, analytics-zoo-1621411071291-driver-svc.default.svc, 7079, None)
2021-05-19 07:57:41 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, analytics-zoo-1621411071291-driver-svc.default.svc, 7079, None)
2021-05-19 07:57:41 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2687ceb5{/metrics/json,null,AVAILABLE,@Spark}
2021-05-19 08:02:02 INFO  KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.30.14.4:49728) with ID 1
2021-05-19 08:02:02 INFO  KubernetesClusterSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 1.0
cls.getname: com.intel.analytics.bigdl.python.api.Sample
BigDLBasePickler registering: bigdl.util.common  Sample
cls.getname: com.intel.analytics.bigdl.python.api.EvaluatedResult
BigDLBasePickler registering: bigdl.util.common  EvaluatedResult
cls.getname: com.intel.analytics.bigdl.python.api.JTensor
BigDLBasePickler registering: bigdl.util.common  JTensor
cls.getname: com.intel.analytics.bigdl.python.api.JActivity
BigDLBasePickler registering: bigdl.util.common  JActivity
creating: createZooKerasInput
creating: createZooKerasDense
creating: createZooKerasVariable
creating: createZooKerasLambdaLayer
creating: createZooKerasModel
creating: createDefault
creating: createSGD
creating: createZooKerasVariable
creating: createZooKerasVariable
creating: createZooKerasCustomLoss
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 1 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 2 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 3 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 4 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 5 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 6 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 7 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 8 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 9 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 10 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 11 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 12 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 13 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 14 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 15 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:69 - Clone 16 model...
2021-05-19 08:02:05 INFO  LocalOptimizer$:119 - model thread pool size is 1
2021-05-19 08:02:05 ERROR ThreadPool$:136 - Error: Layer info: Model[a3699fac]/KerasLayerWrapper[Meanb402f032_wrapper]
java.lang.IllegalArgumentException: requirement failed: dimension exceeds input dimensionsdimension 1, input dimension 0
        at scala.Predef$.require(Predef.scala:224)
        at com.intel.analytics.bigdl.nn.Sum.getPositiveDimension(Sum.scala:64)
        at com.intel.analytics.bigdl.nn.Sum.updateOutput(Sum.scala:75)
        at com.intel.analytics.bigdl.nn.Sum.updateOutput(Sum.scala:44)
        at com.intel.analytics.bigdl.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
        at com.intel.analytics.bigdl.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
        at com.intel.analytics.bigdl.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply$mcD$sp(LocalOptimizer.scala:149)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$invokeAndWait$1$$anonfun$apply$3.apply(ThreadPool.scala:133)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:288)
        at com.intel.analytics.bigdl.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
        at com.intel.analytics.bigdl.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply$mcD$sp(LocalOptimizer.scala:149)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$invokeAndWait$1$$anonfun$apply$3.apply(ThreadPool.scala:133)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Traceback (most recent call last):
  File "/opt/analytics-zoo-examples/python/autograd/custom.py", line 59, in <module>
    distributed=False)
  File "/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip/zoo/pipeline/api/keras/engine/topology.py", line 253, in fit
  File "/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip/zoo/common/utils.py", line 135, in callZooFunc
  File "/opt/analytics-zoo-0.11.0-SNAPSHOT/lib/analytics-zoo-bigdl_0.12.2-spark_2.4.3-0.11.0-SNAPSHOT-python-api.zip/zoo/common/utils.py", line 129, in callZooFunc
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o75.zooFit.
: Layer info: Model[a3699fac]/KerasLayerWrapper[Meanb402f032_wrapper]
java.lang.IllegalArgumentException: requirement failed: dimension exceeds input dimensionsdimension 1, input dimension 0
        at scala.Predef$.require(Predef.scala:224)
        at com.intel.analytics.bigdl.nn.Sum.getPositiveDimension(Sum.scala:64)
        at com.intel.analytics.bigdl.nn.Sum.updateOutput(Sum.scala:75)
        at com.intel.analytics.bigdl.nn.Sum.updateOutput(Sum.scala:44)
        at com.intel.analytics.bigdl.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
        at com.intel.analytics.bigdl.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
        at com.intel.analytics.bigdl.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply$mcD$sp(LocalOptimizer.scala:149)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$invokeAndWait$1$$anonfun$apply$3.apply(ThreadPool.scala:133)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:288)
        at com.intel.analytics.bigdl.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
        at com.intel.analytics.bigdl.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
        at com.intel.analytics.zoo.pipeline.api.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
        at com.intel.analytics.bigdl.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply$mcD$sp(LocalOptimizer.scala:149)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.optim.LocalOptimizer$$anonfun$7$$anonfun$apply$1.apply(LocalOptimizer.scala:141)
        at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$invokeAndWait$1$$anonfun$apply$3.apply(ThreadPool.scala:133)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Le-Zheng commented 3 years ago

This example runs with LocalOptimizer because it calls fit(..., distributed=False) (see https://github.com/intel-analytics/analytics-zoo/blob/master/pyzoo/zoo/examples/autograd/custom.py#L59). You may need to add a parameter to pass --distributed True so the example runs in distributed mode.
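Following that suggestion, the change would be an extra application argument after the primary resource in the spark-submit command. A sketch (hypothetical: whether custom.py actually parses a --distributed flag needs to be checked in its argument parser):

```shell
# Hypothetical sketch: in spark-submit, everything after the primary
# resource (the .py file) is passed to the script itself, not to Spark.
# "--distributed True" is assumed to be an argument that custom.py
# defines; verify this against the script before using it.
APP_ARGS="--nb_epoch 2 --distributed True"
echo "file:///opt/analytics-zoo-examples/python/autograd/custom.py ${APP_ARGS}"
```

The same tail would apply to the customloss command, which currently passes no application arguments at all.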