Closed · Levaru closed this issue 1 year ago
Thank you for your feedback!
What did you do when you ran your second training? If it's okay for the first training, it should be fine for the rest. Did you change your dataset or modify any hyperparameters?
Kinda. The training process with labelstudio goes like this: the fit(...) function from the class RotBBoxModelApi(LabelStudioMLBase) is executed. In the fit(...) function all of the finished tasks are downloaded; in this case a task consists of an image and the annotations/labels for that image. These images and annotations are then combined into a dataset (with optional augmentations) and then used to train the model for a small number of epochs.
I turned off most of the augmentations like mosaic and multiscale (especially multiscale, because I get an OutOfMemory error even when training normally; I only have one 8 GB graphics card). This didn't really help with the labelstudio training.
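The flow described above could be sketched roughly like this. This is only a sketch, not the actual backend: LabelStudioMLBase is replaced by a stub so it runs standalone, and download_task, build_dataset, and train_epochs are hypothetical helper names standing in for the real code.

```python
# Minimal sketch of the training flow described above (assumed structure).
# LabelStudioMLBase is a stand-in for label_studio_ml.model.LabelStudioMLBase;
# the helper methods are hypothetical placeholders, not the real implementation.

class LabelStudioMLBase:  # stub so the sketch is self-contained
    pass


class RotBBoxModelApi(LabelStudioMLBase):
    def fit(self, completions, use_augmentations=False, epochs=3, **kwargs):
        # 1. Download every finished task: an image plus its annotations/labels.
        samples = [self.download_task(c) for c in completions]
        # 2. Combine images and labels into a dataset, optionally augmented.
        dataset = self.build_dataset(samples, augment=use_augmentations)
        # 3. Fine-tune the model for a small number of epochs.
        self.train_epochs(dataset, epochs=epochs)
        return {"checkpoint": "latest"}

    def download_task(self, completion):
        # placeholder: fetch the image, parse labels into R-Yolov4 format
        return completion

    def build_dataset(self, samples, augment=False):
        return samples  # placeholder: would apply augmentations here

    def train_epochs(self, dataset, epochs):
        pass  # placeholder for the actual training loop
```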
I also made sure that the labels from labelstudio are correctly parsed into the format that R-Yolov4 requires.
I wasn't sure whether I was right or not. I was guessing that the following description of the training process with LabelStudio means that it has a child process running in the background, which would cause the CUDA memory error. Did the author of LabelStudio suggest how much memory your graphics card should have?
These images and annotations are then combined into a dataset (with optional augmentations) and then used to train the model with a small number of epochs.
Hi! I'm trying to implement your project as an ML backend for label-studio and I'm having some trouble. Predicting labels works without any problems, and even training works the first time. But when I try to train a second time I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.79 GiB total capacity; 2.62 GiB already allocated; 37.62 MiB free; 2.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
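The error message itself suggests trying max_split_size_mb if reserved memory is much larger than allocated memory (i.e. the allocator's cache is fragmented). One way to set it is through the environment variable before PyTorch touches CUDA; the value 128 below is just an example, not a recommendation:

```python
import os

# Must be set before the first CUDA allocation (ideally before importing torch).
# 128 MiB is only an example split size; tune it for your card and workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```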
This is my implementation of the ML backend:
This is basically just your code, combined from detect.py and train.py. The testing is performed with the trash dataset and a model that was also trained on it. I'm not really familiar with pytorch and don't know if I implemented it correctly for this kind of application. I guess that the out-of-memory error is caused by reloading the model without first clearing some old variables? I have no idea which ones, though.
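One thing worth trying, matching that suspicion: explicitly drop the old model before reloading so the allocator can release its cached blocks. release_gpu_memory below is a hypothetical helper, not part of either project, and torch is imported lazily so the sketch runs even without it installed:

```python
import gc


def release_gpu_memory(model=None):
    """Drop a reference to the old model and ask PyTorch to release cached blocks.

    Note: the caller must also drop its own references (e.g. self.model = None),
    otherwise the weights stay reachable regardless of what happens here.
    """
    del model  # drop this function's local reference
    gc.collect()  # collect unreachable Python objects that may hold tensors
    try:
        import torch  # lazy import: keeps the sketch runnable without torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    except ImportError:
        pass
```

In the backend this would go at the start of fit(...), something like release_gpu_memory(self.model); self.model = None, before constructing the new model for retraining. It is a guess at the cause, not a confirmed fix.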
Could you please take a look at it if you have the time? Maybe I'm just loading the model the wrong way.