umyelab / LabGym

Quantify user-defined behaviors.
GNU General Public License v3.0

"train categorizer" killed #83

Closed: vzimmern closed 7 months ago

vzimmern commented 8 months ago

Good evening,

Thanks again for the excellent code. I now have multiple example pairs for all my behaviors of interest. I generated the prepared behavior examples without any issues and then moved on to the next step, training a categorizer.

Here's the terminal output:

Training Categorizer with both Animation Analyzer and Pattern Recognizer using the behavior examples in: /home/minassian/Documents/LabGym/EPM1-Full/EPM1-S1153-11-8-23/EPM1-S1153-11-8-23-prepared-behavior
Found behavior names: ['grooming' 'locomotion' 'myoclonus' 'orientation' 'still']
Perform augmentation for the behavior examples...
This might take hours or days, depending on the capacity of your computer.
2023-12-13 18:48:36.010182
Start to augment training examples...
The augmented example amount: 10000
2023-12-13 18:49:11.544536
The augmented example amount: 20000
2023-12-13 18:49:47.204302
The augmented example amount: 30000
2023-12-13 18:50:22.845424
The augmented example amount: 40000
2023-12-13 18:50:58.387763
The augmented example amount: 50000
2023-12-13 18:51:34.150707
The augmented example amount: 60000
2023-12-13 18:52:09.800629
The augmented example amount: 70000
2023-12-13 18:52:45.624165
The augmented example amount: 80000
2023-12-13 18:53:21.330311
Start to augment validation examples...
Killed

I have NVIDIA GeForce RTX 3090 GPU with CUDA version 12.3 installed for Ubuntu 22.04 LTS.

Any thoughts on what could have caused the process to be killed? Thanks for your help!

yujiahu415 commented 8 months ago

Hi,

This happened simply because your computer ran out of memory (RAM). The ideal solution is to make more memory available: add RAM, close other programs that consume a lot of RAM, or set up virtual memory (swap) on a drive that is not the system drive and has plenty of free space.
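To see how much headroom is left before starting a long augmentation run, something like the following could be used on Linux. It is only a sketch: it reads the standard `MemAvailable` and `SwapFree` fields from `/proc/meminfo`, and the helper names `parse_meminfo` and `headroom_gib` are made up for illustration and are not part of LabGym.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Most fields look like "MemAvailable:   123456 kB"; a few have no unit.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def headroom_gib(info):
    """Available RAM plus free swap, in GiB (both fields are reported in kB)."""
    kib = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return kib / (1024 ** 2)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        info = parse_meminfo(meminfo.read_text())
        print(f"Available RAM + free swap: {headroom_gib(info):.1f} GiB")
    else:
        print("/proc/meminfo not found; this check only works on Linux.")
```

A bare `Killed` on Linux is typically printed when the kernel's out-of-memory killer terminates the process, so the kernel log (for example via `dmesg`) is another place to confirm the diagnosis.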

Alternatively, you can reduce the number of training examples or use fewer augmentation methods; either makes the total amount of training data smaller. You can also reduce the input shape of the Categorizer.
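A rough back-of-the-envelope calculation shows why each of these options helps. The sketch below assumes the augmented examples are held in memory as one dense array of 4-byte floats; that layout, the function name `estimate_gib`, and all the numbers are assumptions for illustration, not LabGym's actual internals or this dataset's real settings.

```python
def estimate_gib(n_examples, frames, height, width, channels, bytes_per_value=4):
    """Memory needed to hold all examples as one dense array, in GiB."""
    total_bytes = n_examples * frames * height * width * channels * bytes_per_value
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Hypothetical settings: 80,000 augmented examples, 15 frames each,
    # 32x32 grayscale frames.
    baseline = estimate_gib(80_000, 15, 32, 32, 1)
    fewer_examples = estimate_gib(40_000, 15, 32, 32, 1)  # half the examples
    smaller_input = estimate_gib(80_000, 15, 16, 16, 1)   # half the frame side

    print(f"baseline:       {baseline:.2f} GiB")
    print(f"fewer examples: {fewer_examples:.2f} GiB")
    print(f"smaller input:  {smaller_input:.2f} GiB")
```

Under these assumptions the footprint scales linearly with the number of examples (and therefore with the number of augmentation methods) and quadratically with the frame side length, so halving the input side cuts memory to a quarter.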

By the way, I saw that you have CUDA 12.3 installed. I suggest downgrading CUDA to 11.7, because the two Python deep learning libraries, PyTorch and TensorFlow, may not be compatible with version 12.3.
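One way to check what the installed libraries were actually built against, and whether they can see the GPU, is sketched below. The function name `framework_cuda_report` is made up for illustration; the calls it relies on (`torch.version.cuda`, `torch.cuda.is_available()`, `tf.sysconfig.get_build_info()`, `tf.config.list_physical_devices`) are standard PyTorch and TensorFlow APIs, and a library that is not installed is simply reported as `None`.

```python
import importlib.util


def framework_cuda_report():
    """Report the CUDA build version and GPU visibility for each installed framework."""
    report = {"torch": None, "tensorflow": None}

    if importlib.util.find_spec("torch") is not None:
        import torch
        report["torch"] = {
            "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
            "gpu_visible": torch.cuda.is_available(),
        }

    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        build = tf.sysconfig.get_build_info()
        report["tensorflow"] = {
            "built_for_cuda": build.get("cuda_version"),  # absent on CPU-only builds
            "gpu_visible": bool(tf.config.list_physical_devices("GPU")),
        }

    return report


if __name__ == "__main__":
    for name, info in framework_cuda_report().items():
        print(name, "->", info if info is not None else "not installed")
```

If a library reports a CUDA build version but no visible GPU, a mismatch between the system CUDA or driver and the library build is one possible cause worth ruling out.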