`Shape of tensor EagerPyFunc [X,Y,1] is not compatible with expected shape [X,Y,3].`

rhansen123 commented 1 year ago

When I try to run a training session, I get this error before training actually starts. Here is what the terminal prints:

Traceback

```
2023-10-09 15:35:16.790895: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\sleap\gui\learning\dialog.py", line 680, in view_datagen
    datagen.show_datagen_preview(self.labels, config_info_list)
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\sleap\gui\learning\datagen.py", line 63, in show_datagen_preview
    results = make_datagen_results(labels_reader, cfg_info.config)
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\sleap\gui\learning\datagen.py", line 160, in make_datagen_results
    for example in ds:
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 800, in __next__
    return self._next_internal()
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 786, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2844, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py", line 7107, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape of tensor EagerPyFunc [600,800,3] is not compatible with expected shape [600,800,1].
         [[{{node EnsureShape}}]] [Op:IteratorGetNext]
Resetting monitor window.
Polling: Z:/Golden_Lab_Users/Hansen_Rachael\models\TEST231009_153519.single_instance.n=21\viz\validation.*.png Start training single_instance... ['sleap-train', 'C:\\Users\\NAPE_C~1\\AppData\\Local\\Temp\\tmphifq5cmf\\231009_153520_training_job.json', 'Z:/Golden_Lab_Users/Hansen_Rachael/Test.slp', '--zmq', '--save_viz'] INFO:sleap.nn.training:Versions: SLEAP: 1.3.3 TensorFlow: 2.7.0 Numpy: 1.21.6 Python: 3.7.12 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: Z:/Golden_Lab_Users/Hansen_Rachael/Test.slp INFO:sleap.nn.training:Training profile: C:\Users\NAPE_C~1\AppData\Local\Temp\tmphifq5cmf\231009_153520_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\\Users\\NAPE_C~1\\AppData\\Local\\Temp\\tmphifq5cmf\\231009_153520_training_job.json", "labels_path": "Z:/Golden_Lab_Users/Hansen_Rachael/Test.slp", "video_paths": [ "" ], "val_labels": null, "test_labels": null, "base_checkpoint": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": "auto" } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 1.0, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": null, "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, 
"up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": { "part_names": null, "sigma": 5.0, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "centroid": null, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": null }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "231009_153519.single_instance.n=21", "run_name_prefix": "TEST", "run_name_suffix": "", "runs_folder": 
"Z:/Golden_Lab_Users/Hansen_Rachael\\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": false, "every_epoch": false, "latest_model": true, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.3.3", "filename": "C:\\Users\\NAPE_C~1\\AppData\\Local\\Temp\\tmphifq5cmf\\231009_153520_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Auto-selected GPU 0 with 10634 MiB of free memory. INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: Z:/Golden_Lab_Users/Hansen_Rachael/Test.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 19 / Validation = 2. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2023-10-09 15:35:26.213670: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2023-10-09 15:35:26.664881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 8962 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5 INFO:sleap.nn.training:Loaded test example. [2.203s] INFO:sleap.nn.training: Input shape: (608, 800, 3) INFO:sleap.nn.training:Created Keras model. INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,591 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = SingleInstanceConfmapsHead(part_names=['Left Ear', 'Right Ear', 'Nose', 'Left Side', 'Right Side', 'Tail Base', 'Center'], sigma=5.0, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 304, 400, 7), dtype=tf.float32, name=None), name='SingleInstanceConfmapsHead/BiasAdd:0', description="created by layer 'SingleInstanceConfmapsHead'") INFO:sleap.nn.training:Training from scratch INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 19 INFO:sleap.nn.training:Validation set: n = 2 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... 
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: )
INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000
INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set
INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001
INFO:sleap.nn.training:Created run path: Z:/Golden_Lab_Users/Hansen_Rachael\models\TEST231009_153519.single_instance.n=21
INFO:sleap.nn.training:Setting up visualization...
INFO:sleap.nn.training:Finished trainer set up. [2.9s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
Traceback (most recent call last):
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\Scripts\sleap-train-script.py", line 33, in
    sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 928, in train
    training_ds = self.training_pipeline.make_dataset()
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\sleap\nn\data\pipelines.py", line 287, in make_dataset
    ds = transformer.transform_dataset(ds)
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\sleap\nn\data\dataset_ops.py", line 318, in transform_dataset
    self.examples = list(iter(ds))
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 800, in __next__
    return self._next_internal()
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 786, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2844, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\Nape_Computer_1\.conda\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py", line 7107, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape of tensor EagerPyFunc [600,800,1] is not compatible with expected shape [600,800,3].
         [[{{node EnsureShape}}]] [Op:IteratorGetNext]
2023-10-09 15:35:29.446978: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Run Path: Z:/Golden_Lab_Users/Hansen_Rachael\models\TEST231009_153519.single_instance.n=21
```

I get the same issue with another computer in the lab. This is the first time we have downloaded the program, and a number of us cannot figure out what exactly is wrong. Help?

roomrys commented 1 year ago

Hi,

We have seen this error

tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape of tensor EagerPyFunc [600,800,1] is not compatible with expected shape [600,800,3].

appear when there is a mix of color and grayscale videos in your project (further discussed here).
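For anyone reading the message for the first time: the last number in each bracketed shape is the channel count, so `[600,800,1]` is a grayscale frame and `[600,800,3]` is a color frame. Below is a minimal sketch that pulls the two shapes out of the error text, assuming the message keeps the format quoted above. `parse_shape_error` and `describe_channels` are hypothetical helpers written for illustration, not part of SLEAP or TensorFlow.

```python
import re

# Matches the two bracketed shapes in the error text quoted in this thread.
SHAPE_ERROR = re.compile(
    r"Shape of tensor \S+ \[([\d,]+)\] is not compatible "
    r"with expected shape \[([\d,]+)\]"
)


def parse_shape_error(message):
    """Return (actual_shape, expected_shape) as tuples of ints, or None."""
    match = SHAPE_ERROR.search(message)
    if match is None:
        return None
    actual = tuple(int(n) for n in match.group(1).split(","))
    expected = tuple(int(n) for n in match.group(2).split(","))
    return actual, expected


def describe_channels(n):
    # Hypothetical helper: names the two channel counts seen in this thread.
    return {1: "grayscale", 3: "color"}.get(n, f"{n}-channel")


message = (
    "Shape of tensor EagerPyFunc [600,800,1] is not compatible "
    "with expected shape [600,800,3]."
)
actual, expected = parse_shape_error(message)
# The last axis is the channel count; height and width already agree here.
print(f"got {describe_channels(actual[-1])}, expected {describe_channels(expected[-1])}")
# prints: got grayscale, expected color
```

If the first two numbers also differed, the frame sizes would be mismatched as well, which is a different problem from the one described here.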

There is a "Toggle Grayscale" button in the videos tab that toggles the grayscale/color of all videos to be the same. Saving the project and rerunning training should do the trick.
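If you want to confirm which videos disagree before toggling, the check amounts to grouping videos by their channel count. The sketch below is plain Python with made-up filenames and shapes. How you read the shapes out of your own project is left open, and `group_by_channels` is a hypothetical helper, not a SLEAP function.

```python
from collections import defaultdict


def group_by_channels(video_shapes):
    """Group video names by channel count (the last entry of each shape)."""
    groups = defaultdict(list)
    for name, shape in video_shapes.items():
        groups[shape[-1]].append(name)
    return dict(groups)


# Illustrative data only: (frames, height, width, channels) per video.
video_shapes = {
    "cage1.mp4": (1200, 600, 800, 3),
    "cage2.mp4": (1500, 600, 800, 1),
    "cage3.mp4": (900, 600, 800, 3),
}

groups = group_by_channels(video_shapes)
if len(groups) > 1:
    # More than one channel count means color and grayscale are mixed.
    print("Mixed channel counts found:")
    for channels, names in sorted(groups.items()):
        print(f"  {channels} channel(s): {', '.join(names)}")
```

A project where every video lands in the same group should not hit this particular shape error.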

Thanks, Liezl

roomrys commented 1 year ago

I'm closing this issue because I am fairly sure that was the problem (and we are tracking this in another issue #759), but please let us know if it is still unresolved.

rhansen123 commented 1 year ago

Hello Liezl,

Apologies for not responding sooner! The toggle grayscale worked. Thank you!

Best, Rachael


-- Rachael Hansen University of Washington - Heshmati/Golden Lab Lab Technician @.***

talmolab / sleap

`Shape of tensor EagerPyFunc [X,Y,1] is not compatible with expected shape [X,Y,3].` #1538