Opened by JustChenk 2 months ago
Hey @JustChenk. Why does the video memory usage reach about 1 GB when the model file is only tens of MB? GPU memory usage includes not just the model parameters but also the CUDA context, framework buffers, and intermediate activations, some of which are expanded and stored at higher precision on the GPU.
Is it possible to reduce the amount of video memory used to load the YOLO model? Yes, techniques such as model quantization and lower-precision data types (e.g., float16) can reduce it.
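As a rough illustration of the precision point, the sketch below estimates parameter memory at float32 versus float16. The parameter count is a hypothetical figure in the range of a small YOLO model, not taken from your checkpoint, and the estimate ignores the CUDA context and activations, which usually dominate:

```python
def param_memory_mb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just for the parameters, in MiB."""
    return num_params * bytes_per_param / 2**20

# Hypothetical parameter count, roughly the size of a small YOLO model.
n = 3_200_000
fp32 = param_memory_mb(n, 4)   # float32: 4 bytes per parameter
fp16 = param_memory_mb(n, 2)   # float16: 2 bytes per parameter
print(f"fp32: {fp32:.1f} MiB, fp16: {fp16:.1f} MiB")
```

Halving the bytes per parameter halves the parameter footprint, which is why float16 inference is a common first step.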
Does the model file itself contain gradients and optimizer parameters? No, the model file typically contains only the parameters (weights and biases); gradients and optimizer states are computed and stored in GPU memory only during training.
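To make the training-versus-checkpoint distinction concrete, here is a back-of-the-envelope breakdown assuming fp32 training with Adam (which keeps two extra state tensors per parameter); the parameter count is again a hypothetical example:

```python
def training_memory_mb(num_params: int) -> dict:
    """Per-component GPU memory during fp32 training with Adam, in MiB."""
    bytes_fp32 = 4
    return {
        "weights": num_params * bytes_fp32 / 2**20,
        "gradients": num_params * bytes_fp32 / 2**20,        # one per weight
        "adam_states": num_params * bytes_fp32 * 2 / 2**20,  # exp_avg + exp_avg_sq
    }

print(training_memory_mb(3_200_000))
```

Only the "weights" component ends up in the saved model file; the other components exist only while training, which is why the checkpoint on disk is so much smaller than training-time GPU usage.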
What is the relationship between memory usage and the size of the YOLO model? The on-disk size reflects only the (possibly compressed or low-precision) parameters, while the in-memory size includes the expanded parameters, intermediate activations, and framework overhead such as the CUDA context.
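The following sketch shows why a ~25 MB model on disk can plausibly show ~1 GB of GPU usage. All three numbers are assumptions for illustration (the CUDA context alone is often several hundred MiB, varying with GPU, driver, and framework version), not measurements from this issue:

```python
weights_mb = 25           # parameters as stored on disk (example size)
activations_mb = 200      # intermediate feature maps (rough assumption)
cuda_context_mb = 600     # CUDA context + cuDNN/cuBLAS kernels (varies widely)

total_mb = weights_mb + activations_mb + cuda_context_mb
print(f"estimated total: {total_mb} MiB")
```

Under these assumptions the fixed framework overhead, not the model file, accounts for most of the footprint.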
Can video memory usage be reduced in some way, even if performance may be affected? Yes, mixed-precision inference, model pruning, reducing the batch size or input resolution, or using a memory-efficient architecture can all reduce memory usage, though they may impact accuracy or speed.
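Of these options, batch size is the simplest lever because activation memory scales roughly linearly with it. A minimal sketch, where the per-image activation cost is an assumed figure rather than a measured one:

```python
def activation_memory_mb(batch_size: int, per_image_mb: float = 50.0) -> float:
    """Rough activation cost; per_image_mb is an assumed figure, not measured."""
    return batch_size * per_image_mb

for b in (16, 8, 1):
    print(f"batch {b:2d}: ~{activation_memory_mb(b):.0f} MiB of activations")
```

Halving the batch roughly halves that part of the footprint, at the cost of lower throughput; the parameter and CUDA-context costs are unaffected.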
Suggestion for exporting the model: to further reduce memory usage and optimize performance, consider exporting the model to TensorRT, which performs optimizations such as precision calibration and kernel fusion, leading to a reduced memory footprint and faster inference. Here is the documentation.
Question
Do other users see the same amount of video memory usage when running the model?
Additional
The following is the code used to invoke the model.