Open mshakirDr opened 1 month ago
I have found a workaround: ingest 5 PDFs at a time, clear the torch CUDA cache, and restart the process (`pipeline` mode, `mock` profile, `huggingface` embedding model). It is slow, but it works, and memory is reset after every batch. Writing the results to the database takes time and the GPU sits idle in the meantime, but this is the most efficient approach I could find on my hardware. I added the following at the end of my code, adapted from `ingest_folder.py`:
```python
import gc

import torch

# Drop references to the heavy objects so they become collectable
del worker
del settings
del ingest_service

with torch.no_grad():
    torch.cuda.empty_cache()  # return cached GPU memory to the driver
gc.collect()
```
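For reference, the per-batch teardown above can be wrapped in a small helper. This is a minimal sketch, not PGPT code: the `batched` and `free_gpu_memory` names are mine, and the torch calls are guarded so the helper also runs on machines without CUDA.

```python
import gc

try:
    import torch
    HAVE_TORCH = True
except ImportError:  # let the helper run where torch is not installed
    HAVE_TORCH = False


def batched(items, size):
    """Yield successive fixed-size batches from a list (last may be short)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def free_gpu_memory():
    """Collect Python garbage, then release cached CUDA blocks if possible."""
    gc.collect()
    if HAVE_TORCH and torch.cuda.is_available():
        with torch.no_grad():
            torch.cuda.empty_cache()
```

With this in place, the ingestion driver just iterates `batched(pdf_paths, 5)` and calls `free_gpu_memory()` after each slice.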
Question
I have been trying to ingest about 1000 PDFs through PGPT. After testing, I found that `pipeline` mode with 1 worker is the fastest option on my system (any more workers hinder the speed). However, the 8 GB of VRAM and 32 GB (out of 64 GB) of shared memory quickly fill up even if I try to ingest only 10 PDFs at a time. I tried to circumvent the memory hogging by restarting the pipeline every time; see above how I built a chunking solution using `LocalIngestWorker` from `ingest_folder.py`. However, this does not release the memory at the end of the `for` loop and the same problem persists (I even tried `del` with no luck). I searched for known memory leaks in the `huggingface` text-embeddings solution and found this memory leak issue. Is it just me, or is anyone else facing the same issue with ingest mode `pipeline`, `huggingface`, on an NVIDIA GPU? I would appreciate any solutions or suggestions.
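The batch-and-restart loop described above can be sketched generically. Here `ingest_batch` and `cleanup` are hypothetical stand-ins for the real per-batch `LocalIngestWorker` setup and the teardown/cache-clearing code; they are not part of the PGPT API.

```python
import gc


def ingest_in_batches(paths, batch_size, ingest_batch, cleanup):
    """Run ingest_batch over fixed-size slices of paths, tearing down between.

    ingest_batch: callable taking a list of paths; e.g. it would build a fresh
        LocalIngestWorker, ingest the files, and return per-file results
        (hypothetical signature).
    cleanup: callable that releases memory between batches, e.g. deleting the
        worker and emptying the CUDA cache.
    """
    results = []
    for i in range(0, len(paths), batch_size):
        results.extend(ingest_batch(paths[i:i + batch_size]))
        cleanup()      # reset GPU/host memory before the next batch
        gc.collect()
    return results
```

The point of the structure is that every object created inside `ingest_batch` goes out of scope before `cleanup()` runs, so `gc.collect()` plus a cache flush has a chance to actually reclaim the memory.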