talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Error at the end of training process (centroid model) #534

Closed · Xiaoyu-Tong closed this issue 2 years ago

Xiaoyu-Tong commented 3 years ago

Hi,

When I was training a new centroid model (the top-down training went smoothly), the training process seemed fine at first, but when early stopping was triggered I got the error below (early stopping had worked fine in all previous runs). This is the first time I've gotten an error of this type after training has finished. The best_model.h5 was actually generated, so I'm not sure whether this error affects performance or not.

Thanks for your help in advance!

The error message:

2021-03-29 14:29:13.451927: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
INFO:sleap.nn.training:Versions:
SLEAP: 1.1.3
TensorFlow: 2.3.1
Numpy: 1.18.5
Python: 3.7.10
OS: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
INFO:sleap.nn.training:Training labels file: BWv0.pkg.slp
INFO:sleap.nn.training:Training profile: /usr/local/lib/python3.7/dist-packages/sleap/training_profiles/baseline.centroid.json
INFO:sleap.nn.training:
INFO:sleap.nn.training:Arguments:
INFO:sleap.nn.training:{
    "training_job_path": "baseline.centroid.json",
    "labels_path": "BWv0.pkg.slp",
    "video_paths": "",
    "val_labels": null,
    "test_labels": null,
    "tensorboard": false,
    "save_viz": false,
    "zmq": false,
    "run_name": "BWv0.centroid",
    "prefix": "",
    "suffix": ""
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Training job:
INFO:sleap.nn.training:{
    "data": {
        "labels": {
            "training_labels": null,
            "validation_labels": null,
            "validation_fraction": 0.1,
            "test_labels": null,
            "split_by_inds": false,
            "training_inds": null,
            "validation_inds": null,
            "test_inds": null,
            "search_path_hints": [],
            "skeletons": []
        },
        "preprocessing": {
            "ensure_rgb": false,
            "ensure_grayscale": false,
            "imagenet_mode": null,
            "input_scaling": 0.5,
            "pad_to_stride": null,
            "resize_and_pad_to_target": true,
            "target_height": null,
            "target_width": null
        },
        "instance_cropping": {
            "center_on_part": null,
            "crop_size": null,
            "crop_size_detection_padding": 16
        }
    },
    "model": {
        "backbone": {
            "leap": null,
            "unet": {
                "stem_stride": null,
                "max_stride": 16,
                "output_stride": 2,
                "filters": 16,
                "filters_rate": 2.0,
                "middle_block": true,
                "up_interpolate": true,
                "stacks": 1
            },
            "hourglass": null,
            "resnet": null,
            "pretrained_encoder": null
        },
        "heads": {
            "single_instance": null,
            "centroid": {
                "anchor_part": null,
                "sigma": 5.0,
                "output_stride": 2,
                "offset_refinement": false
            },
            "centered_instance": null,
            "multi_instance": null
        }
    },
    "optimization": {
        "preload_data": true,
        "augmentation_config": {
            "rotate": true,
            "rotation_min_angle": -180.0,
            "rotation_max_angle": 180.0,
            "translate": false,
            "translate_min": -5,
            "translate_max": 5,
            "scale": false,
            "scale_min": 0.9,
            "scale_max": 1.1,
            "uniform_noise": false,
            "uniform_noise_min_val": 0.0,
            "uniform_noise_max_val": 10.0,
            "gaussian_noise": false,
            "gaussian_noise_mean": 5.0,
            "gaussian_noise_stddev": 1.0,
            "contrast": false,
            "contrast_min_gamma": 0.5,
            "contrast_max_gamma": 2.0,
            "brightness": false,
            "brightness_min_val": 0.0,
            "brightness_max_val": 10.0,
            "random_crop": false,
            "random_crop_height": 256,
            "random_crop_width": 256,
            "random_flip": false,
            "flip_horizontal": true
        },
        "online_shuffling": true,
        "shuffle_buffer_size": 128,
        "prefetch": true,
        "batch_size": 4,
        "batches_per_epoch": null,
        "min_batches_per_epoch": 200,
        "val_batches_per_epoch": null,
        "min_val_batches_per_epoch": 10,
        "epochs": 200,
        "optimizer": "adam",
        "initial_learning_rate": 0.0001,
        "learning_rate_schedule": {
            "reduce_on_plateau": true,
            "reduction_factor": 0.5,
            "plateau_min_delta": 1e-06,
            "plateau_patience": 5,
            "plateau_cooldown": 3,
            "min_learning_rate": 1e-08
        },
        "hard_keypoint_mining": {
            "online_mining": false,
            "hard_to_easy_ratio": 2.0,
            "min_hard_keypoints": 2,
            "max_hard_keypoints": null,
            "loss_scale": 5.0
        },
        "early_stopping": {
            "stop_training_on_plateau": true,
            "plateau_min_delta": 1e-06,
            "plateau_patience": 10
        }
    },
    "outputs": {
        "save_outputs": true,
        "run_name": "BWv0.centroid",
        "run_name_prefix": "",
        "run_name_suffix": null,
        "runs_folder": "models",
        "tags": [],
        "save_visualizations": true,
        "delete_viz_images": true,
        "zip_outputs": false,
        "log_to_csv": true,
        "checkpointing": {
            "initial_model": false,
            "best_model": true,
            "every_epoch": false,
            "latest_model": false,
            "final_model": false
        },
        "tensorboard": {
            "write_logs": false,
            "loss_frequency": "epoch",
            "architecture_graph": false,
            "profile_graph": false,
            "visualizations": true
        },
        "zmq": {
            "subscribe_to_controller": false,
            "controller_address": "tcp://127.0.0.1:9000",
            "controller_polling_timeout": 10,
            "publish_updates": false,
            "publish_address": "tcp://127.0.0.1:9001"
        }
    },
    "name": "",
    "description": "",
    "sleap_version": "1.1.3",
    "filename": "/usr/local/lib/python3.7/dist-packages/sleap/training_profiles/baseline.centroid.json"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:System:
2021-03-29 14:29:14.983033: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-03-29 14:29:14.992265: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:14.992859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-29 14:29:14.992915: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-29 14:29:14.995437: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-03-29 14:29:14.997403: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-03-29 14:29:14.998038: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-03-29 14:29:15.000446: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-03-29 14:29:15.001494: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-03-29 14:29:15.005884: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-03-29 14:29:15.006017: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.006621: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.007136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
         Initalized: False
         Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: BWv0.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 360 / Validation = 40.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2021-03-29 14:29:15.114610: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-29 14:29:15.119018: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2000140000 Hz
2021-03-29 14:29:15.119215: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5590d8040300 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-29 14:29:15.119244: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-03-29 14:29:15.205080: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.205897: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5590d8041100 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-29 14:29:15.205931: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2021-03-29 14:29:15.206152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.206721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-29 14:29:15.206816: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-29 14:29:15.206860: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-03-29 14:29:15.206896: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-03-29 14:29:15.206928: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-03-29 14:29:15.206955: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-03-29 14:29:15.206984: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-03-29 14:29:15.207015: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-03-29 14:29:15.207102: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.207718: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.208223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-03-29 14:29:15.208304: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-29 14:29:15.697413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-29 14:29:15.697473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-03-29 14:29:15.697489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-03-29 14:29:15.697738: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.698416: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 14:29:15.699006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14338 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)
INFO:sleap.nn.training:Loaded test example. [2.186s]
INFO:sleap.nn.training:  Input shape: (256, 336, 3)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training:  Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False)
INFO:sleap.nn.training:  Max stride: 16
INFO:sleap.nn.training:  Parameters: 1,953,393
INFO:sleap.nn.training:  Heads:
INFO:sleap.nn.training:    [0] = CentroidConfmapsHead(anchor_part=None, sigma=5.0, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training:  Outputs:
INFO:sleap.nn.training:    [0] = Tensor("CentroidConfmapsHead_0/BiasAdd:0", shape=(None, 128, 168, 1), dtype=float32)
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 360
INFO:sleap.nn.training:Validation set: n = 40
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training:  Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training:  Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-06, plateau_patience=10)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.training:Created run path: models/BWv0.centroid
INFO:sleap.nn.training:Setting up visualization...
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
INFO:sleap.nn.training:Finished trainer set up. [5.7s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [6.5s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/200
2021-03-29 14:29:28.750137: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-03-29 14:29:30.021905: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
200/200 - 11s - loss: 0.0028 - val_loss: 0.0024
Epoch 2/200
200/200 - 11s - loss: 0.0018 - val_loss: 0.0022
Epoch 3/200
200/200 - 12s - loss: 0.0014 - val_loss: 0.0015
Epoch 4/200
200/200 - 11s - loss: 0.0014 - val_loss: 0.0016
Epoch 5/200
200/200 - 12s - loss: 0.0013 - val_loss: 0.0015
Epoch 6/200
200/200 - 10s - loss: 0.0013 - val_loss: 0.0017
Epoch 7/200
200/200 - 10s - loss: 0.0012 - val_loss: 0.0015
Epoch 8/200
200/200 - 11s - loss: 0.0012 - val_loss: 0.0015
Epoch 9/200
200/200 - 10s - loss: 0.0012 - val_loss: 0.0015
Epoch 10/200
200/200 - 10s - loss: 0.0011 - val_loss: 0.0016
Epoch 11/200
200/200 - 12s - loss: 0.0011 - val_loss: 0.0014
Epoch 12/200
200/200 - 22s - loss: 0.0011 - val_loss: 0.0014
Epoch 13/200
200/200 - 10s - loss: 0.0010 - val_loss: 0.0014
Epoch 14/200
200/200 - 10s - loss: 0.0011 - val_loss: 0.0015
Epoch 15/200
200/200 - 10s - loss: 0.0011 - val_loss: 0.0015
Epoch 16/200
200/200 - 10s - loss: 0.0010 - val_loss: 0.0014
Epoch 17/200

Epoch 00017: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
200/200 - 11s - loss: 0.0011 - val_loss: 0.0014
Epoch 18/200
200/200 - 11s - loss: 9.5435e-04 - val_loss: 0.0013
Epoch 19/200
200/200 - 11s - loss: 9.6762e-04 - val_loss: 0.0013
Epoch 20/200
200/200 - 10s - loss: 9.5586e-04 - val_loss: 0.0014
Epoch 21/200
200/200 - 10s - loss: 9.3800e-04 - val_loss: 0.0013
Epoch 22/200
200/200 - 10s - loss: 9.4187e-04 - val_loss: 0.0013
Epoch 23/200
200/200 - 10s - loss: 9.2374e-04 - val_loss: 0.0013
Epoch 24/200

Epoch 00024: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
200/200 - 10s - loss: 9.5109e-04 - val_loss: 0.0014
Epoch 25/200
200/200 - 10s - loss: 9.1295e-04 - val_loss: 0.0013
Epoch 26/200
200/200 - 11s - loss: 9.0279e-04 - val_loss: 0.0012
Epoch 27/200
200/200 - 10s - loss: 9.0027e-04 - val_loss: 0.0013
Epoch 28/200
200/200 - 10s - loss: 9.0850e-04 - val_loss: 0.0012
Epoch 29/200
200/200 - 10s - loss: 8.7100e-04 - val_loss: 0.0012
Epoch 30/200
200/200 - 10s - loss: 8.8868e-04 - val_loss: 0.0013
Epoch 31/200

Epoch 00031: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
200/200 - 11s - loss: 8.7444e-04 - val_loss: 0.0012
Epoch 32/200
200/200 - 10s - loss: 8.6447e-04 - val_loss: 0.0013
Epoch 33/200
200/200 - 10s - loss: 8.7328e-04 - val_loss: 0.0013
Epoch 34/200
200/200 - 11s - loss: 8.5597e-04 - val_loss: 0.0012
Epoch 35/200
200/200 - 10s - loss: 8.4266e-04 - val_loss: 0.0012
Epoch 36/200
200/200 - 12s - loss: 8.5793e-04 - val_loss: 0.0011
Epoch 37/200
200/200 - 10s - loss: 8.5152e-04 - val_loss: 0.0012
Epoch 38/200
200/200 - 10s - loss: 8.3114e-04 - val_loss: 0.0013
Epoch 39/200
200/200 - 10s - loss: 8.2601e-04 - val_loss: 0.0012
Epoch 40/200
200/200 - 10s - loss: 8.7106e-04 - val_loss: 0.0012
Epoch 41/200

Epoch 00041: ReduceLROnPlateau reducing learning rate to 6.24999984211172e-06.
200/200 - 11s - loss: 8.2656e-04 - val_loss: 0.0012
Epoch 42/200
200/200 - 10s - loss: 8.2745e-04 - val_loss: 0.0013
Epoch 43/200
200/200 - 10s - loss: 8.3784e-04 - val_loss: 0.0012
Epoch 44/200
200/200 - 10s - loss: 8.4922e-04 - val_loss: 0.0012
Epoch 45/200
200/200 - 10s - loss: 8.1336e-04 - val_loss: 0.0012
Epoch 46/200
200/200 - 11s - loss: 8.2872e-04 - val_loss: 0.0013
Epoch 00046: early stopping
INFO:sleap.nn.training:Finished training loop. [8.4 min]
INFO:sleap.nn.training:Deleting visualization directory: models/BWv0.centroid/viz
INFO:sleap.nn.training:Saving evaluation metrics to model folder...
Predicting... ━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9% ETA: 0:00:15 22.1 FPS
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1296, in truediv
    return _truediv_python3(x, y, name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1221, in _truediv_python3
    x = ops.convert_to_tensor(x, name="x")
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 1499, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py", line 338, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py", line 264, in constant
    allow_broadcast=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py", line 275, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: TypeError: object of type 'RaggedTensor' has no len()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/sleap-train", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/training.py", line 1582, in main
    trainer.train()
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/training.py", line 904, in train
    self.evaluate()
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/training.py", line 917, in evaluate
    split_name="train",
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/evals.py", line 699, in evaluate_model
    labels_pr = predictor.predict(labels_reader, make_labels=True)
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/inference.py", line 390, in predict
    self._make_labeled_frames_from_generator(generator, data)
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/inference.py", line 2010, in _make_labeled_frames_from_generator
    for ex in generator:
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/inference.py", line 300, in _predict_generator
    ex = process_batch(ex)
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/inference.py", line 279, in process_batch
    np.expand_dims(ex["scale"], axis=1), axis=1
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 205, in wrapper
    result = dispatch(wrapper, args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 118, in dispatch
    result = dispatcher.handle(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/ragged/ragged_dispatch.py", line 219, in handle
    ragged_tensor_shape.RaggedTensorDynamicShape.from_tensor(y))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/ragged/ragged_tensor_shape.py", line 470, in broadcast_dynamic_shape
    shape_x = shape_x.broadcast_dimension(axis, shape_y.dimension_size(axis))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/ragged/ragged_tensor_shape.py", line 351, in broadcast_dimension
    condition, data=broadcast_err, summarize=10)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 247, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs),
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 158, in Assert
    (condition, "\n".join(data_str)))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data:
b'Unable to broadcast: dimension size mismatch in dimension'
0
b'lengths='
4
b'dim_size='
2
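For reference, this class of failure can be reproduced outside of SLEAP. The sketch below is not SLEAP's actual inference code; the names, values, and shapes are hypothetical, and the idea that the scale array is sized to the padded batch (4) while the final batch holds fewer frames (2) is only a guess that matches the lengths=4 / dim_size=2 in the error. It just shows the same two chained exceptions: a failed dense conversion of a RaggedTensor, then a ragged broadcast mismatch in dimension 0.

import numpy as np
import tensorflow as tf

# Ragged predictions: 2 frames, each with a variable number of detections.
# ragged_rank=1 means only the per-frame axis is ragged. (Hypothetical values.)
pred_peaks = tf.ragged.constant(
    [[[10.0, 20.0]], [[30.0, 40.0], [50.0, 60.0]]], ragged_rank=1
)

# Per-frame scales for 4 frames instead of the 2 actually present, reshaped
# to (4, 1, 1) so it lines up against the ragged batch axis.
scale = np.full((4,), 0.5, dtype=np.float32)

# TF first tries to convert the RaggedTensor to a dense tensor, raising
# "object of type 'RaggedTensor' has no len()"; the ragged dispatcher then
# retries the division and fails to broadcast dimension 0 (4 vs. 2), raising
# "Unable to broadcast: dimension size mismatch in dimension 0".
pred_peaks / scale[:, np.newaxis, np.newaxis]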
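As a side note on the training run itself: the learning-rate drops at epochs 17, 24, 31, and 41 and the stop at epoch 46 follow directly from the learning_rate_schedule and early_stopping blocks in the config above. In plain Keras terms, those settings correspond roughly to the following (a sketch with values copied from the logged profile, not SLEAP's internal code):

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Values taken from the training profile logged above.
callbacks = [
    ReduceLROnPlateau(
        monitor="val_loss",
        factor=0.5,      # reduction_factor
        min_delta=1e-6,  # plateau_min_delta
        patience=5,      # plateau_patience
        cooldown=3,      # plateau_cooldown
        min_lr=1e-8,     # min_learning_rate
    ),
    EarlyStopping(monitor="val_loss", min_delta=1e-6, patience=10),
]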

talmo commented 3 years ago

Yeah, weird bug, but it shouldn't affect anything. I'll leave this open until we get a chance to look into the root cause; for now you can just ignore it and use the trained model as-is (see the sketch below).
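A minimal sketch of using the saved checkpoint for inference, assuming a SLEAP build that exposes the high-level sleap.load_model API; the centered-instance model path and the video filename below are hypothetical placeholders:

import sleap

# Load the top-down pair: the centroid model from this run plus a
# centered-instance model (hypothetical path for the one trained earlier).
predictor = sleap.load_model([
    "models/BWv0.centroid",
    "models/BWv0.centered_instance",
])

# Run inference on a video and save the predictions.
video = sleap.load_video("session.mp4")  # hypothetical video file
labels = predictor.predict(video)
labels.save("predictions.slp")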

Xiaoyu-Tong commented 3 years ago

OK, thanks!

talmo commented 2 years ago

Closing this issue due to inactivity but please feel free to comment again if you're still having issues and we'll reopen it.