su77ungr / CASALIOY

♾️ toolkit for air-gapped LLMs on consumer-grade hardware
Apache License 2.0

Parallelize ingestion #85

Closed: hippalectryon-0 closed this 1 year ago

hippalectryon-0 commented 1 year ago

Inspired by https://github.com/imartinez/privateGPT/pull/255 (but better :P)

Fully parallelize the ingestion. The only non-parallelized part is writing to the db. Very significant speed improvements going from 1 to 2 threads.

On my big library (2300 documents, several GB on disk):

- 1 thread => ETA = 20 min
- 2 threads => ETA = 12 min (while one thread is writing to the db, the other can compute; the limiting factor shifts towards writing to the db)
- 3 threads => ETA = 8 min
- 4 threads => ETA = 8 min (can't do more, it's with CUDA and I don't have a lot of VRAM)
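For reference, a minimal sketch of the pattern described above (not the actual PR code): embeddings are computed on several worker threads while a lock keeps the db write step sequential. `load_and_embed` and `db_write` are hypothetical stand-ins for the real loading/embedding and vector-store code.

```python
# Sketch only: parallel ingestion with a serialized db-write step.
import threading
from concurrent.futures import ThreadPoolExecutor

db_lock = threading.Lock()

def load_and_embed(path: str) -> list[float]:
    """Hypothetical: load one document and compute its embedding."""
    ...

def db_write(path: str, embedding: list[float]) -> None:
    """Hypothetical: append one embedding to the vector store."""
    ...

def ingest_one(path: str) -> None:
    embedding = load_and_embed(path)  # runs in parallel across threads
    with db_lock:                     # db writes stay sequential
        db_write(path, embedding)

def ingest_all(paths: list[str], n_threads: int = 4) -> None:
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces completion and re-raises any worker exceptions
        list(pool.map(ingest_one, paths))
```

With 2+ threads, one thread can hold the lock and write while the others keep embedding, which matches the scaling seen above until the db write itself becomes the bottleneck.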

Major:

Minor: