nv-morpheus / Morpheus

GPU Memory Overflow Issues During Inference with Large Input Data #1663

Open ylnhari opened 2 months ago

ylnhari commented 2 months ago

I am encountering memory overflow issues when processing large data batches with the inference module of the digital fingerprinting (DFP) pipeline, specifically while generating predictions on the GPU for data of approximately 10 million rows by 10 columns. The problem arises during inference after the model is loaded: when the get_results() method of the autoencoder class is invoked, PyTorch appears to load both the model graph and the input data tensors onto the GPU at the same time. As a result, once the input exceeds around 7 million rows, it consumes 77 GB of the available 81 GB of GPU memory, leading to out-of-memory errors.

A potential solution would be to add a batch size setting to the DFP inference pipeline or to the autoencoder's predict functions. This setting would let the system process predictions in batches, regardless of the total number of rows passed to the predict function, thereby preventing memory overflow errors. See the sketch below for the kind of wrapper I have in mind.
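
For illustration, here is a minimal sketch of such a batching wrapper. The function name `predict_in_batches` and the `batch_size` default are hypothetical, and it assumes `get_results()` accepts a DataFrame slice and returns per-row results that can be concatenated; the actual signature may differ:

```python
import pandas as pd
import torch


def predict_in_batches(model, df, batch_size=500_000):
    """Run autoencoder inference in fixed-size chunks to bound peak GPU memory.

    Assumes `model.get_results(chunk)` accepts a DataFrame slice and returns
    per-row results as a DataFrame; adjust to the real get_results() signature.
    """
    results = []
    for start in range(0, len(df), batch_size):
        # Slice out one batch of rows (pandas and cuDF both support iloc).
        chunk = df.iloc[start:start + batch_size]
        results.append(model.get_results(chunk))
        # Release cached allocator blocks between batches so peak GPU usage
        # scales with batch_size rather than the full 10M-row input.
        torch.cuda.empty_cache()
    return pd.concat(results, ignore_index=True)
```

With a fixed batch size, peak GPU memory is determined by the size of one batch rather than the total row count, so an input like the 10M-row case above would no longer exhaust the 81 GB card.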