BenjaminBo opened 1 week ago
Hi!
I think I'm having a similar issue (?).
I wanted to train a top-down model for a multi-animal project. Training the centroid model worked without problems, but with the centered-instance model I also seem to run into this layer shape issue:
Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 64, 64, 3), found shape=(1, 8, 8, 3)
(I am also using auto-crop. I tried changing that and scaling the input, and it still didn't work.)
Software versions:
SLEAP: 1.3.4
TensorFlow: 2.7.0
Numpy: 1.21.6
Python: 3.7.12
OS: Windows-10-10.0.22621-SP0
...
INFO:sleap.nn.training:Finished trainer set up. [3.2s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [6.2s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/200
Traceback (most recent call last):
  File "C:\anaconda\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.3.4', 'console_scripts', 'sleap-train')())
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 941, in train
    verbose=2,
  File "C:\anaconda\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\callbacks.py", line 280, in on_epoch_end
    figure = self.plot_fn()
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1346, in <lambda>
    viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1326, in visualize_example
    preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
  File "C:\anaconda\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 2088, in call
    out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).
Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 64, 64, 3), found shape=(1, 8, 8, 3)
Call arguments received:
  • inputs=tf.Tensor(shape=(1, 64, 64, 3), dtype=float32)
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Hi @BenjaminBo,
We are working to fix this. In the meantime, what happens when you use input scaling only for the centroid model?
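Below is a minimal sketch of that experiment. It assumes the training profiles are the JSON files SLEAP exports and that the scale sits under `data.preprocessing.input_scaling`, so check the key path against your own exported profiles. The idea is to keep the downscaling in the centroid profile and set the centered-instance profile back to 1.0. The fragment and the helper `with_input_scaling` are hypothetical.

```python
import json

# Hypothetical, trimmed-down fragment of an exported training profile. Only the
# key path data -> preprocessing -> input_scaling is assumed to match SLEAP's
# profiles; every other setting of a real profile is omitted here.
downscaled_fragment = {"data": {"preprocessing": {"input_scaling": 0.4}}}


def with_input_scaling(profile: dict, scale: float) -> dict:
    """Return a copy of `profile` with data.preprocessing.input_scaling set to `scale`."""
    updated = json.loads(json.dumps(profile))  # deep copy of a JSON-compatible dict
    updated.setdefault("data", {}).setdefault("preprocessing", {})["input_scaling"] = scale
    return updated


# Keep the downscaling for the centroid model, which sees whole 4K frames ...
centroid_profile = with_input_scaling(downscaled_fragment, 0.4)
# ... but let the centered-instance model work on unscaled crops.
centered_instance_profile = with_input_scaling(downscaled_fragment, 1.0)

print(json.dumps(centroid_profile))           # {"data": {"preprocessing": {"input_scaling": 0.4}}}
print(json.dumps(centered_instance_profile))  # {"data": {"preprocessing": {"input_scaling": 1.0}}}
```

Copying before editing leaves the original fragment intact, so both profiles can be derived from the same starting point.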
You wrote: "I want to scale the input data down. I do not want to crop it (which works with an input_scale of 1.0)."
With the top-down approach, the centroid model's predictions are always used to crop around each animal instance, so the centered-instance model only ever sees those crops. Are the features on your animal very small compared to the animal itself? That would be the only case in which the centered-instance model needs an input scaling < 1.
Best,
Elizabeth
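The cropping point can be made concrete with a toy calculation of the image sizes each top-down stage would see. This is not SLEAP code, and the helper names are hypothetical. It assumes crops are cut from the full-resolution frame and then resized by the centered-instance model's own input scale, which may differ from the exact order SLEAP uses internally. The value `crop_size=256` is arbitrary.

```python
from typing import Tuple

HW = Tuple[int, int]  # (height, width) in pixels


def scaled_hw(hw: HW, scale: float) -> HW:
    """Height/width after resizing by `scale`, rounded to the nearest pixel."""
    return round(hw[0] * scale), round(hw[1] * scale)


def topdown_shapes(frame_hw: HW, centroid_scale: float, crop_size: int,
                   instance_scale: float) -> Tuple[HW, HW]:
    """Toy model of the image sizes seen by the two top-down stages (hypothetical)."""
    # Stage 1: the centroid model sees the whole frame, downscaled.
    centroid_input = scaled_hw(frame_hw, centroid_scale)
    # Stage 2: a square crop around each predicted centroid (assumed to be cut
    # at full resolution), then resized by the instance model's own scale.
    instance_input = scaled_hw((crop_size, crop_size), instance_scale)
    return centroid_input, instance_input


# 4K frames from the issue below; crop_size=256 is an illustrative value.
print(topdown_shapes((2160, 3840), centroid_scale=0.4, crop_size=256, instance_scale=1.0))
# ((864, 1536), (256, 256))
```

Under these assumptions, the crops keep their full resolution no matter how much the centroid stage is downscaled. Only the centroid model loses detail, and it just has to locate each animal.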
Bug description
I am training the top-down model remotely on a cluster server, since I don't have a local GPU to properly train a SLEAP model on. My videos have a high resolution (2160, 3840, 3), which does not allow me to train at their original size. This brings me to the input_size parameter: for the centroid model everything works fine. I can set the input scale to < 1.0 and it runs without issues. Running the centered_instance model, though, is when I get the following error (here with input_scale = 0.4):

This is interesting, because the non-scaled input size, with auto_crop on, is 3840.

Assumption 1
What seems to be happening is that input_scaling is applied twice, since 3840 × 0.4 = 1536 and 1536 × 0.4 = 614 (rounded). So I tried to find the section where it is applied repeatedly. Following the error message above, I looked at the following code section in sleap/nn/training.py, lines 1315-1340:

More specifically:
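The numbers in Assumption 1 can be checked with a few lines of arithmetic. The sketch below applies a scale factor once and then twice, rounding to whole pixels each time. It only tests the arithmetic of the double-scaling hypothesis and does not reproduce SLEAP's resizing code. The helper `apply_scale` is hypothetical.

```python
def apply_scale(size: int, scale: float, times: int = 1) -> int:
    """Resize a side length `times` times by `scale`, rounding to whole pixels each time."""
    for _ in range(times):
        size = round(size * scale)
    return size


print(apply_scale(3840, 0.4, times=1))  # 1536: scaled once, as expected
print(apply_scale(3840, 0.4, times=2))  # 614: scaled twice (1536 * 0.4 = 614.4)

# In the traceback from the earlier comment, a 64x64 image enters
# find_instance_peaks but an 8x8 one reaches the model. The ratio 8 / 64 = 0.125
# would fit one extra application of an input scale of 0.125 (an inference only).
print(8 / 64)                  # 0.125
print(apply_scale(64, 0.125))  # 8
```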
Assumption 2
This might be the section that applies the scaling factor repeatedly. It also seems to be "less relevant", since I understand it to handle visualization only. I adjusted line 1318 in the following way:
Unlike before, the training now runs through. But I get the following error next:
This is where I get confused. Does this suggest that the section before was not only handling visualization? Or does the problem run deeper? But the error above only happens after training, not between epochs. By that point, images of size 1536 from both the training and validation sets must have run through the model, unless I misunderstand something. Why did I not get errors then? Also: are the assumptions I made above incorrect?
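The idea behind the line 1318 change in Assumption 2 is to avoid resizing an image that has already been resized. That idea can be expressed as a guard. The sketch below is hypothetical: `maybe_rescale` and its arguments are illustrative names, and images are represented only by their (height, width). It is not the actual patch, which is not shown in this issue.

```python
from typing import Tuple

Shape = Tuple[int, int]  # (height, width) in pixels


def maybe_rescale(image_hw: Shape, model_input_hw: Shape, scale: float) -> Shape:
    """Return the (height, width) to feed the model, resizing only if needed.

    Hypothetical helper: if the image already matches the model's expected
    input size, it is assumed to have been scaled upstream and is passed through.
    """
    if image_hw == model_input_hw:
        return image_hw  # already scaled once; scaling again would shrink it
    resized = (round(image_hw[0] * scale), round(image_hw[1] * scale))
    if resized != model_input_hw:
        raise ValueError(f"expected {model_input_hw}, got {resized} after scaling")
    return resized


# An unscaled 160x160 crop is scaled to the 64x64 model input (assumed values).
print(maybe_rescale((160, 160), (64, 64), 0.4))  # (64, 64)
# An already-scaled 64x64 crop is passed through instead of becoming 26x26.
print(maybe_rescale((64, 64), (64, 64), 0.4))  # (64, 64)
```

The explicit `ValueError` raises a size mismatch at the point of scaling, instead of leaving it to surface later as a Keras input-shape error.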
Scaling
Also, at this point I am not sure whether I am addressing my core issue. I want to scale the input data down. I do not want to crop it (which works with an input_scale of 1.0), because I need all of the information in the frames.
... input_scaling. I will try to understand the error above further and maybe adjust your code to make it work for me, but I am not sure I can. Can you help me with the issues I described above?

... Labels()-object. My problem is that I am a bit overwhelmed and not quite sure where to start, since the object looks like this:

Meaning, there are quite a few attributes in this object, some containing frames, instances and bounding_boxes. I am just not sure which ones I'd have to scale and which I shouldn't touch. Do you have a function or a simpler way for me to scale the dataset?
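Which parts of a labels object need scaling? The minimal sketch below uses a hypothetical plain-Python representation, not SLEAP's `Labels` class. Pixel-space quantities (point coordinates, image size) are multiplied by the same factor as the frames. Identifiers such as frame indices and track names stay unchanged, and bounding boxes are recomputed from the scaled points rather than stored. All class and function names are illustrative assumptions. Applying this to a real `Labels` object would also require resizing the stored video frames, which is not shown.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Point = Optional[Tuple[float, float]]  # (x, y) in pixels; None marks a missing node


@dataclass(frozen=True)
class Instance:
    """Hypothetical stand-in for one labeled animal."""
    track: str           # an identifier: never scaled
    points: List[Point]  # one entry per skeleton node


@dataclass(frozen=True)
class LabeledFrame:
    """Hypothetical stand-in for one labeled video frame."""
    frame_idx: int                # an index, not a pixel quantity: never scaled
    image_hw: Tuple[int, int]     # frame size in pixels: scaled
    instances: List[Instance]


def bounding_box(inst: Instance) -> Tuple[float, float, float, float]:
    """(x_min, y_min, x_max, y_max) of the present points, derived on demand."""
    xs = [p[0] for p in inst.points if p is not None]
    ys = [p[1] for p in inst.points if p is not None]
    return min(xs), min(ys), max(xs), max(ys)


def scale_frame(lf: LabeledFrame, s: float) -> LabeledFrame:
    """Scale every pixel-space quantity by `s`; leave identifiers untouched."""
    scaled_instances = [
        replace(inst, points=[None if p is None else (p[0] * s, p[1] * s) for p in inst.points])
        for inst in lf.instances
    ]
    h, w = lf.image_hw
    return replace(lf, image_hw=(round(h * s), round(w * s)), instances=scaled_instances)


frame = LabeledFrame(
    frame_idx=10,
    image_hw=(2160, 3840),
    instances=[Instance(track="animal_1", points=[(1000.0, 500.0), (1100.0, 600.0), None])],
)
small = scale_frame(frame, 0.4)
print(small.image_hw)                    # (864, 1536)
print(bounding_box(small.instances[0]))  # (400.0, 200.0, 440.0, 240.0)
```

The bounding box is derived from the scaled points, so it cannot drift out of sync with them.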