CUDA_VISIBLE_DEVICES=5,6,7,8 singularity exec --nv -B /home /home/software/singularity/base.simg:latest python3 mlpf/launcher.py --model-spec parameters/cms-gnn-skipconn-v2.yaml --action train
...
Traceback (most recent call last):
  File "mlpf/launcher.py", line 26, in <module>
    main(args, yaml_path, config)
  File "/home/joosep/particleflow/mlpf/tfmodel/model_setup.py", line 755, in main
    fit_result = model.fit(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 5 root error(s) found.
  (0) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
     [[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
     [[replica_2/pf_net/sparse_hashed_nn_distance/map/while/body/_1664/replica_2/pf_net/sparse_hashed_nn_distance/map/while/map/while/cond/_8452/replica_2/pf_net/sparse_hashed_nn_distance/map/while/map/while/Less_1/_823]]
  (1) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
     [[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
     [[replica_1/pf_net/sparse_hashed_nn_distance/SparseTensor/dense_shape/_864]]
  (2) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
     [[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
     [[replica_1/pf_net/sparse_hashed_nn_distance/SparseTensor/dense_shape/_863]]
  (3) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
     [[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
     [[pf_net/gnn_reg/StatefulPartitionedCall/conv_reg1/map/while/body/_3801/conv_reg1/map/while/SparseReshape/_2974]]
  (4) Invalid argument: Tried to stack elements of an empty list with non-fully-defined element_shape: [?,512]
     [[{{node replica_3/pf_net/gnn_id/StatefulPartitionedCall/conv_id0/map/TensorArrayV2Stack/TensorListStack}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_335632]
Function call stack:
train_function -> train_function -> train_function -> train_function -> train_function
This works fine:
and so does this:
while this doesn't: