zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

[QUESTION] Is there a memory leak in huggingface embedding with pipeline mode #2054

Open mshakirDr opened 1 month ago

mshakirDr commented 1 month ago

Question

I have been trying to ingest about 1000 PDFs through PGPT. After testing, I found that pipeline mode with 1 worker is the fastest option on my system (adding more workers actually slows it down). However, the 8 GB of VRAM and 32 GB (out of 64 GB) of shared memory on my system quickly fill up even if I only ingest 10 PDFs at a time. I tried to work around the memory hogging by restarting the pipeline for every batch. See below how I built a chunking solution using LocalIngestWorker from ingest_folder.py.

    # Imports mirrored from scripts/ingest_folder.py, which also defines LocalIngestWorker
    from pathlib import Path
    from private_gpt.di import global_injector
    from private_gpt.server.ingest.ingest_service import IngestService
    from private_gpt.settings.settings import Settings

    files = get_list_of_combined_files(folders)
    print(len(files))

    # Split the file list into batches of 10 PDFs each
    split_into_chunks = lambda lst, n: [lst[i:i + n] for i in range(0, len(lst), n)]
    chunks = split_into_chunks(files, 10)

    ignored: list[str] = []  # no ignore patterns, as in ingest_folder.py
    for index, chunk in enumerate(chunks):
        print("Chunk number", index, "of", len(chunks))
        destination = r"\Temp\\"
        copy_new_files(destination, chunk)

        # Build a fresh worker for every batch, hoping its memory is released afterwards
        ingest_service = global_injector.get(IngestService)
        settings = global_injector.get(Settings)
        worker = LocalIngestWorker(ingest_service, settings)
        worker.ingest_folder(Path(destination), ignored)

        del worker
        del ingest_service
        del settings
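
For context, the two helpers used above just gather the PDF paths and copy each batch into the temporary folder. A minimal sketch, not the exact code:

    import shutil
    from pathlib import Path

    def get_list_of_combined_files(folders: list[str]) -> list[Path]:
        # Collect all PDF paths from the given folders into a single list.
        files: list[Path] = []
        for folder in folders:
            files.extend(sorted(Path(folder).glob("*.pdf")))
        return files

    def copy_new_files(destination: str, chunk: list[Path]) -> None:
        # Copy the current batch of PDFs into the temporary ingest folder.
        dest = Path(destination)
        dest.mkdir(parents=True, exist_ok=True)
        for file in chunk:
            shutil.copy2(file, dest / file.name)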

However, this does not release the memory at the end of the for loop, and the same problem persists (the explicit del calls do not help either). I searched around for potential memory leak issues with the huggingface text embeddings backend and found this memory leak issue. Is it just me, or is anyone else facing the same issue with the pipeline ingest mode and huggingface embeddings on an NVIDIA GPU? I would appreciate any solutions or suggestions.
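One way to confirm whether the CUDA allocator actually frees anything between batches is to print its statistics after each chunk, for example (a minimal sketch using PyTorch's standard memory queries; the helper name is arbitrary):

    import torch

    def log_cuda_memory(tag: str) -> None:
        # Report how much CUDA memory is allocated by live tensors and how much
        # the caching allocator has reserved from the driver.
        if torch.cuda.is_available():
            allocated = torch.cuda.memory_allocated() / 1024 ** 2
            reserved = torch.cuda.memory_reserved() / 1024 ** 2
            print(f"{tag}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

    # e.g. call log_cuda_memory(f"after chunk {index}") at the end of each loop iteration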

mshakirDr commented 1 month ago

I have found a workaround: ingest 5 PDFs at a time, clear the torch CUDA cache, and then start the next batch (pipeline mode, mock profile, huggingface embedding model). It is slow, but it works, and the memory is reset after every batch. Writing the results to the database takes time and the GPU sits idle in the meantime, but this is the most efficient approach I could find for my hardware. I added the following at the end of my code, adapted from ingest_folder.py.

    import gc
    import torch

    # Drop this batch's references so the embedding objects can be collected
    del worker
    del settings
    del ingest_service
    # Return cached CUDA blocks to the driver and force a garbage collection pass
    with torch.no_grad():
        torch.cuda.empty_cache()
        gc.collect()
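
An alternative to clearing the cache in-process is to run each batch in a freshly spawned process, so that all GPU memory is returned to the driver when the child exits. A rough sketch, assuming LocalIngestWorker can be imported from the adapted ingest_folder.py and reusing chunks, destination and copy_new_files from the snippet above:

    import multiprocessing as mp
    from pathlib import Path

    def ingest_batch(destination: str) -> None:
        # Import inside the child so CUDA and the embedding model are
        # initialised (and torn down) entirely within this process.
        from private_gpt.di import global_injector
        from private_gpt.server.ingest.ingest_service import IngestService
        from private_gpt.settings.settings import Settings
        from ingest_folder import LocalIngestWorker

        ingest_service = global_injector.get(IngestService)
        settings = global_injector.get(Settings)
        worker = LocalIngestWorker(ingest_service, settings)
        worker.ingest_folder(Path(destination), [])  # no ignore patterns

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")  # fresh interpreter, no inherited CUDA state
        for index, chunk in enumerate(chunks):
            print("Chunk number", index, "of", len(chunks))
            copy_new_files(destination, chunk)
            p = ctx.Process(target=ingest_batch, args=(destination,))
            p.start()
            p.join()  # all GPU memory held by the child is released on exit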