Open · j99ca opened this issue 11 months ago
Hello. I believe the problem is related to the issue you linked, but this PR describes the main reason: https://github.com/openvinotoolkit/anomalib/pull/1340. If I understand correctly, you are on the latest code, so the memory issue shouldn't be as apparent, but I believe the increase still happens because the implementation loads the training and validation dataloaders at the same time.
Upon further inspection, this happens slowly over the epochs, so I don't think it's due to the issue linked above. I don't think it necessarily even happens during validation; it could be at the start of the epoch.
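To narrow down whether the growth happens during validation or at the start of an epoch, a lightweight Lightning callback can log CUDA memory at epoch boundaries. This is a minimal diagnostic sketch, not part of the original report, assuming the `lightning.pytorch` import path used by recent anomalib versions:

```python
# Diagnostic sketch: log allocated/reserved CUDA memory at epoch boundaries to see
# where the growth occurs. Not from the original report.
import torch
from lightning.pytorch.callbacks import Callback


class GpuMemoryLogger(Callback):
    """Print allocated/reserved CUDA memory at train/validation epoch boundaries."""

    def _report(self, tag: str) -> None:
        if torch.cuda.is_available():
            allocated = torch.cuda.memory_allocated() / 1024**2
            reserved = torch.cuda.memory_reserved() / 1024**2
            print(f"[{tag}] allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")

    def on_train_epoch_start(self, trainer, pl_module):
        self._report("train epoch start")

    def on_validation_epoch_start(self, trainer, pl_module):
        self._report("validation epoch start")

    def on_validation_epoch_end(self, trainer, pl_module):
        self._report("validation epoch end")
```

The callback should be usable by passing it through the trainer's `callbacks` argument; comparing the printed values across epochs would show whether the increase is tied to validation or to epoch start.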
Describe the bug
I am working with the EfficientAD model and have been training it in AWS SageMaker. I have noticed that GPU memory usage explodes during validation. I was wondering if this is related to this issue involving the mean and standard deviation calculations?
I am using the current version of anomalib (not the release version, which does not include the above fix). I have attached screenshots showing epoch progress (captured via SageMaker metrics/regex) and the GPU memory usage.
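For context, the general pattern behind that kind of fix can be illustrated with a hedged sketch: instead of stacking every teacher output on the GPU to compute channel-wise mean and standard deviation, running sums are accumulated batch by batch. This is not anomalib's actual implementation; `teacher`, `dataloader`, and the `"image"` batch key are placeholders.

```python
# Illustration of computing channel-wise mean/std over a dataloader with running
# sums, so outputs are never accumulated in GPU memory. Placeholder names throughout.
import torch


@torch.no_grad()
def channel_mean_std(teacher: torch.nn.Module, dataloader, device: str = "cuda"):
    total = 0
    channel_sum = None
    channel_sq_sum = None
    for batch in dataloader:
        images = batch["image"].to(device)   # assumes batches are dicts with an "image" key
        out = teacher(images)                # assumed shape: (N, C, H, W)
        total += out.shape[0] * out.shape[2] * out.shape[3]
        sums = out.sum(dim=(0, 2, 3))
        sq_sums = (out ** 2).sum(dim=(0, 2, 3))
        channel_sum = sums if channel_sum is None else channel_sum + sums
        channel_sq_sum = sq_sums if channel_sq_sum is None else channel_sq_sum + sq_sums
    mean = channel_sum / total
    std = torch.sqrt((channel_sq_sum / total - mean ** 2).clamp(min=0.0))
    return mean, std
```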
I am using a custom dataloader that subclasses the Folder one, since my data needs special decoding, but aside from that it should be quite similar in terms of operations.
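Purely for illustration, a customization like the one described above might look like the following torch `Dataset` wrapper. The real code subclasses anomalib's Folder classes; `decode_fn` and `CustomDecodingDataset` are invented names standing in for the special decoding step.

```python
# Hypothetical sketch: a dataset that routes every image through a user-supplied
# decode function, e.g. for a non-standard image format.
from collections.abc import Callable
from pathlib import Path

import torch
from torch.utils.data import Dataset


class CustomDecodingDataset(Dataset):
    """Wraps a list of image paths and a user-supplied decode function."""

    def __init__(self, image_paths: list[Path], decode_fn: Callable[[Path], torch.Tensor]):
        self.image_paths = image_paths
        self.decode_fn = decode_fn

    def __len__(self) -> int:
        return len(self.image_paths)

    def __getitem__(self, index: int) -> dict:
        image = self.decode_fn(self.image_paths[index])
        return {"image": image, "image_path": str(self.image_paths[index])}
```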
Dataset
Folder
Model
Other (please specify in the field below)
Steps to reproduce the behavior
Run the EfficientAD model in the cloud (a minimal training sketch follows below).
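A minimal reproduction sketch, assuming the anomalib v1 Python API (the exact commit in the report may expose a different interface); dataset paths and category names are placeholders, and the SageMaker setup and custom dataloader from the report are not reproduced here.

```python
# Hedged reproduction sketch using the anomalib v1 Python API. Paths are placeholders.
from anomalib.data import Folder
from anomalib.engine import Engine
from anomalib.models import EfficientAd

datamodule = Folder(
    name="custom",
    root="./datasets/custom",
    normal_dir="good",
    abnormal_dir="defect",
    train_batch_size=1,  # EfficientAD is typically trained with batch size 1
)
model = EfficientAd()
engine = Engine(max_epochs=10)
engine.fit(model=model, datamodule=datamodule)
```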
OS information
Expected behavior
Memory usage is stable from the start of training.
Screenshots
Pip/GitHub
GitHub
What version/branch did you use?
commit 1f50c95
Configuration YAML
Logs
Code of Conduct