umyelab / LabGym

Quantify user-defined behaviors.
GNU General Public License v3.0

Training stopping early #209

Closed: meghanflanigan closed this issue 1 month ago

meghanflanigan commented 3 months ago

Hi,

We are having issues with our categorizer training stopping after only ~10-13 iterations. We are seeing warnings like:

"Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 108). These functions will not be directly callable after loading."

We see these after some of the early iterations, but not all. We were previously able to train a categorizer on this dataset with good results--however, we now want to improve it with the addition of extra behavioral examples.

Any idea what could be going on here?

yujiahu415 commented 3 months ago

Hi,

The training does not look good: the val_loss is pretty high, and the Categorizer struggles with some categories, such as left/right turn. Would it be possible to share your sorted behavior dataset with me so that I can take a look and provide specific suggestions? If so, you can share a Dropbox/Google Drive folder with me: henryhu@umich.edu. Also, please let me know your settings for training the Categorizer, such as the Categorizer type, complexity level, and input size, as well as which augmentation methods you used.

meghanflanigan commented 3 months ago

Thanks for your quick response. I definitely see that those values are not great. I'm just wondering why the training would stop after so few iterations when the val_loss is still so high. When we used this same dataset before (minus the few extra examples we added later), it trained through thousands of iterations and we were able to get F1 values above 0.90 for all behaviors.

We can definitely share the dataset with you along with all of that info. Thanks for taking a look for us!

yujiahu415 commented 2 months ago

Hi,

I pasted my suggestions here so they may be helpful to other users. Thank you for allowing me to do this!

First, I think LabGym is a perfect fit for this type of behavior analysis. But you need to regenerate the behavior examples and re-sort them. Here are my suggestions:

  1. When generating behavior examples, you can generate them without background unless including the background in the animations helps identify the behaviors. For this type of behavioral analysis, the positions/motion of the ears may help distinguish grooming from rearing, so you can include body parts in the pattern images with the 'STD' value set to 50.

  2. If you don't need to distinguish left turn vs. right turn, you can combine them into one category: turn.

  3. When you sort the behavior examples, make sure:

a. Examples in different categories look different. For example, I saw many examples in 'left/right turn' that looked like this: [image: walk sorted as turn]

They look identical to examples in 'walk', like this: [image: walk]

This would make the Categorizer very confused during training.

So I suggest you move all examples in 'left/right turn' that actually contain walking (or look identical to walking) to 'walk', and define 'turn' as turning in place without long-distance walking, like this: [image: turn]

Similarly, the examples in 'inactive' are largely identical to those in 'freeze'. If you don't want to combine these two categories into one, you can sort all examples that are completely immobile as 'freeze', like this: [image: 56-6_mouse_0_8835_len15]

and all examples with small movements as 'inactive', like this: [image: inactive]

b. Avoid repeated examples. When several examples look almost the same, like these three:

[image: repeats]

You only need to keep one. Too many repeats would make the training less efficient. Instead, you can include different behavior variants or scenarios in one category to increase the diversity of the dataset.
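If there are many files to reorganize for suggestions 2 and 3a (merging 'left turn' and 'right turn' into 'turn', or moving walk-like examples into 'walk'), a small script can save some clicking. This is a minimal sketch, assuming (not confirmed in this thread) that the sorted dataset keeps one subfolder per behavior category and that an animation and its pattern image share the same file-name stem. `merge_categories` and `move_pair` are hypothetical helpers, not part of LabGym; back up the dataset before trying anything like this.

```python
import shutil
from pathlib import Path


def merge_categories(dataset_dir, sources, target):
    """Move every file from the source category folders into the target folder.

    Hypothetical layout: one subfolder per behavior category. Checks for
    file-name clashes before moving anything, then removes emptied source
    folders. Returns the number of files moved.
    """
    dataset_dir = Path(dataset_dir)
    target_dir = dataset_dir / target
    files = []
    for name in sources:
        source_dir = dataset_dir / name
        if source_dir.is_dir():
            files.extend(sorted(p for p in source_dir.iterdir() if p.is_file()))

    # Abort before any move if two files would end up with the same name.
    seen = set()
    for f in files:
        if f.name in seen or (target_dir / f.name).exists():
            raise FileExistsError(f"name clash for {f.name} in {target_dir}")
        seen.add(f.name)

    target_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.move(str(f), str(target_dir / f.name))

    for name in sources:
        source_dir = dataset_dir / name
        # Only delete a source folder if nothing is left in it.
        if source_dir.is_dir() and not any(source_dir.iterdir()):
            source_dir.rmdir()
    return len(files)


def move_pair(dataset_dir, stem, source, target):
    """Move one example (all files whose stem equals `stem`) to another category.

    Returns the moved file names; raises if nothing matched or a name clashes.
    """
    source_dir = Path(dataset_dir) / source
    target_dir = Path(dataset_dir) / target
    files = sorted(p for p in source_dir.iterdir() if p.is_file() and p.stem == stem)
    if not files:
        raise FileNotFoundError(f"no files with stem {stem!r} in {source_dir}")
    for f in files:
        if (target_dir / f.name).exists():
            raise FileExistsError(f"{f.name} already exists in {target_dir}")
    target_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.move(str(f), str(target_dir / f.name))
    return [f.name for f in files]
```

For example, `merge_categories("sorted_examples", ["left turn", "right turn"], "turn")`, where `sorted_examples` is a placeholder for your dataset folder. Moving both halves of a pair together keeps each animation next to its pattern image.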

You don't need many examples: 200~300 well-selected pairs (one pair contains one animation and its paired pattern image) would already be sufficient for training a Categorizer with good accuracy.
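To see how many complete pairs each category currently holds (to compare against the 200~300 pairs mentioned above), here is a minimal sketch under the same hypothetical layout: one subfolder per category, with `.avi`/`.mp4` animations and `.jpg`/`.png` pattern images that share a stem. These extensions are assumptions; adjust them to whatever your files actually use. The sketch also lists stems that have only one half of a pair.

```python
from pathlib import Path

# Assumed file extensions; change these to match your dataset.
ANIMATION_EXTS = {".avi", ".mp4"}
PATTERN_EXTS = {".jpg", ".png"}


def summarize_dataset(dataset_dir):
    """Return {category: (number_of_complete_pairs, sorted_unpaired_stems)}."""
    summary = {}
    for category in sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir()):
        animations, patterns = set(), set()
        for f in category.iterdir():
            ext = f.suffix.lower()
            if ext in ANIMATION_EXTS:
                animations.add(f.stem)
            elif ext in PATTERN_EXTS:
                patterns.add(f.stem)
        # A pair is complete only when both halves exist with the same stem.
        complete = animations & patterns
        # Stems present on one side only are missing their partner file.
        unpaired = sorted(animations ^ patterns)
        summary[category.name] = (len(complete), unpaired)
    return summary


def format_summary(summary):
    """Render per-category pair counts plus the overall total as plain text."""
    lines = []
    for name, (count, unpaired) in summary.items():
        line = f"{name}: {count} pair(s)"
        if unpaired:
            line += f", unpaired: {', '.join(unpaired)}"
        lines.append(line)
    total = sum(count for count, _ in summary.values())
    lines.append(f"total: {total} pair(s)")
    return "\n".join(lines)
```

It can be run as `print(format_summary(summarize_dataset("sorted_examples")))`, with `sorted_examples` standing in for your dataset folder.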

For Categorizer settings, you can try Animation Analyzer level 3-4 with input size 32 or 64, and Pattern Recognizer level 3-4 with input size 64. For augmentation methods, you can choose all of them, and always augment the validation dataset as well.

A well-selected and well-sorted dataset is the most important thing in using LabGym. This may take some effort and trial and error at the beginning, but it will eventually save you tons of time in later steps.

Thanks again!