zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

[QUESTION] Is there a memory leak in huggingface embedding with pipeline mode #2054

Open mshakirDr opened 1 month ago

mshakirDr commented 1 month ago

Question

I have been trying to ingest about 1000 PDFs through PGPT. After testing, I found that pipeline mode with 1 worker is the fastest option on my system (adding more workers actually slows it down). However, the 8 GB of VRAM and 32 GB (out of 64 GB) of shared memory on my system quickly fill up even if I only ingest 10 PDFs at a time. I tried to work around the memory hogging by restarting the pipeline for every batch. See below how I built a chunking solution using LocalIngestWorker from ingest_folder.py.

    # Imports mirrored from scripts/ingest_folder.py, which also defines LocalIngestWorker
    from pathlib import Path
    from private_gpt.di import global_injector
    from private_gpt.server.ingest.ingest_service import IngestService
    from private_gpt.settings.settings import Settings

    files = get_list_of_combined_files(folders)
    print(len(files))

    # Split the file list into batches of 10 PDFs each
    split_into_chunks = lambda lst, n: [lst[i:i + n] for i in range(0, len(lst), n)]
    chunks = split_into_chunks(files, 10)

    ignored: list[str] = []  # no ignore patterns, as in ingest_folder.py
    for index, chunk in enumerate(chunks):
        print("Chunk number", index, "of", len(chunks))
        destination = r"\Temp\\"
        copy_new_files(destination, chunk)

        # Build a fresh worker for every batch, hoping its memory is released afterwards
        ingest_service = global_injector.get(IngestService)
        settings = global_injector.get(Settings)
        worker = LocalIngestWorker(ingest_service, settings)
        worker.ingest_folder(Path(destination), ignored)

        del worker
        del ingest_service
        del settings
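
For context, the two helpers used above just gather the PDF paths and copy each batch into the temporary folder. A minimal sketch, not the exact code:

    import shutil
    from pathlib import Path

    def get_list_of_combined_files(folders: list[str]) -> list[Path]:
        # Collect all PDF paths from the given folders into a single list.
        files: list[Path] = []
        for folder in folders:
            files.extend(sorted(Path(folder).glob("*.pdf")))
        return files

    def copy_new_files(destination: str, chunk: list[Path]) -> None:
        # Copy the current batch of PDFs into the temporary ingest folder.
        dest = Path(destination)
        dest.mkdir(parents=True, exist_ok=True)
        for file in chunk:
            shutil.copy2(file, dest / file.name)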

However, this does not release the memory at the end of the for loop, and the same problem persists (the explicit del calls do not help either). I searched around for potential memory leak issues with the huggingface text embeddings backend and found this memory leak issue. Is it just me, or is anyone else facing the same issue with the pipeline ingest mode and huggingface embeddings on an NVIDIA GPU? I would appreciate any solutions or suggestions.
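One way to confirm whether the CUDA allocator actually frees anything between batches is to print its statistics after each chunk, for example (a minimal sketch using PyTorch's standard memory queries; the helper name is arbitrary):

    import torch

    def log_cuda_memory(tag: str) -> None:
        # Report how much CUDA memory is allocated by live tensors and how much
        # the caching allocator has reserved from the driver.
        if torch.cuda.is_available():
            allocated = torch.cuda.memory_allocated() / 1024 ** 2
            reserved = torch.cuda.memory_reserved() / 1024 ** 2
            print(f"{tag}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

    # e.g. call log_cuda_memory(f"after chunk {index}") at the end of each loop iteration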

mshakirDr commented 1 month ago

I have found a workaround: ingest 5 PDFs at a time, clear the torch CUDA cache, and then start the next batch (pipeline mode, mock profile, huggingface embedding model). It is slow, but it works, and the memory is reset after every batch. Writing the results to the database takes time and the GPU sits idle in the meantime, but this is the most efficient approach I could find for my hardware. I added the following at the end of my code, adapted from ingest_folder.py.

    import gc
    import torch

    # Drop this batch's references so the embedding objects can be collected
    del worker
    del settings
    del ingest_service
    # Return cached CUDA blocks to the driver and force a garbage collection pass
    with torch.no_grad():
        torch.cuda.empty_cache()
        gc.collect()
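
An alternative to clearing the cache in-process is to run each batch in a freshly spawned process, so that all GPU memory is returned to the driver when the child exits. A rough sketch, assuming LocalIngestWorker can be imported from the adapted ingest_folder.py and reusing chunks, destination and copy_new_files from the snippet above:

    import multiprocessing as mp
    from pathlib import Path

    def ingest_batch(destination: str) -> None:
        # Import inside the child so CUDA and the embedding model are
        # initialised (and torn down) entirely within this process.
        from private_gpt.di import global_injector
        from private_gpt.server.ingest.ingest_service import IngestService
        from private_gpt.settings.settings import Settings
        from ingest_folder import LocalIngestWorker

        ingest_service = global_injector.get(IngestService)
        settings = global_injector.get(Settings)
        worker = LocalIngestWorker(ingest_service, settings)
        worker.ingest_folder(Path(destination), [])  # no ignore patterns

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")  # fresh interpreter, no inherited CUDA state
        for index, chunk in enumerate(chunks):
            print("Chunk number", index, "of", len(chunks))
            copy_new_files(destination, chunk)
            p = ctx.Process(target=ingest_batch, args=(destination,))
            p.start()
            p.join()  # all GPU memory held by the child is released on exit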