xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

GPU memory leak. #37

Closed nivibilla closed 9 months ago

nivibilla commented 1 year ago

I am doing batch inference over a very large dataset, and I see that GPU memory slowly grows until I run out of memory (OOM), even though I delete every variable I assign. Here is the code:

import gc

import pandas as pd
import torch
from tqdm import tqdm

for batch_number in tqdm(batch_numbers):
  ids = []
  inputs = []
  for x in sampled_input_batched.filter(col('batch') == batch_number).collect():
    ids.append(x[0])
    inputs.append(x[1])

  output = instructor_model.encode(inputs, batch_size=BATCH_SIZE)

  temp_df = spark.createDataFrame(pd.DataFrame({
    'reviewid': ids,
    'instructor_embeddings': output
  }))

  temp_df.write.parquet(f"{inference_folder}/temp_inference_output_batch_{batch_number}.parquet")

  with torch.no_grad():
    del ids
    del inputs
    del output
    del temp_df
    torch.cuda.empty_cache()
    gc.collect()

But after each iteration of the loop, some residual memory is retained on the GPU. The model itself takes 5685 MB, and this number increases slightly after every loop, so after enough iterations I run OOM. Could you tell me where the memory leak may be?
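For reference, here is the small helper I use to track the growth per iteration. It logs allocated vs. reserved memory from PyTorch's caching allocator; the function name and tags are just illustrative:

```python
import torch

def log_gpu_memory(tag: str) -> None:
    """Print allocated vs. reserved GPU memory in MiB (no-op without CUDA)."""
    if not torch.cuda.is_available():
        return
    alloc = torch.cuda.memory_allocated() / 2**20      # bytes held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # bytes reserved by the allocator
    print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")

# Called at the top and bottom of each loop iteration, e.g.:
#   log_gpu_memory(f"batch {batch_number} after encode")
#   log_gpu_memory(f"batch {batch_number} after cleanup")
```

If `allocated` keeps climbing after cleanup, some tensor from the previous iteration is still referenced somewhere.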

hongjin-su commented 1 year ago

Hi, thanks a lot for your interest in the INSTRUCTOR model!

You may try switching the model to evaluation mode (`model.eval()`) for inference. You may also wrap the encoding call in `with torch.no_grad():` so that no gradient graph is built during the computation.
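A minimal sketch of both suggestions, using a stand-in `torch.nn.Linear` rather than the INSTRUCTOR model itself (the module and tensor shapes here are placeholders):

```python
import torch

# Stand-in module; in the real code this would be the loaded embedding model.
model = torch.nn.Linear(8, 4)
model.eval()  # evaluation mode: fixes dropout / batch-norm behaviour

x = torch.randn(2, 8)

# Outside no_grad, the output keeps a grad_fn, and the autograd graph
# (including saved activations) stays alive as long as the output does.
y_with_graph = model(x)

# Inside no_grad, no graph is recorded, so nothing extra is retained.
with torch.no_grad():
    y_no_graph = model(x)
```

Here `y_with_graph.grad_fn` is set while `y_no_graph.grad_fn` is `None`, which is why inference inside `no_grad` keeps memory usage flat.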

Feel free to add any further questions or comments!

nivibilla commented 1 year ago

I see. Let me try it out and update you on the results.

arjunram07 commented 1 year ago

Same issue, did you solve this @nivibilla ?

nivibilla commented 1 year ago

No, but I think it's an issue with Jupyter notebooks and PyTorch rather than with this repo. Not sure.

hongjin-su commented 9 months ago

Feel free to re-open this issue if you have any further questions or comments!