StrongMonkey closed this 3 months ago
@iwilltry42 Also, we need to have this PR merged first: https://github.com/iwilltry42/langchaingo/pull/1. Not sure how you are planning to get that merged upstream.
@StrongMonkey once I'm back, I'll create another branch and change bases, so I can have my upstream PR and both of our changes in another branch 👍
Or faster - let's use your fork for now 🤔
Ok... once you approve, I will change go.mod to point to my branch (at least for now).
We should run more goroutines to parse documents and create embeddings for them, to speed up ingestion. Most of these operations are not CPU-bound. In particular, when creating embeddings for documents, we don't need to be constrained by the number of cores the system has, because we are mostly waiting on OpenAI API calls and are not spending CPU/IO resources locally.
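For illustration, here is a minimal sketch of the idea, not the actual change in this PR. It assumes a hypothetical `embedFunc` standing in for the remote embedding call, and a hypothetical helper `embedConcurrently` that caps in-flight requests with a buffered channel instead of tying the limit to the core count. The names and the worker count of 8 are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// embedFunc is a hypothetical stand-in for a remote embedding call
// (for example, a request to an embeddings API). It is not a langchaingo type.
type embedFunc func(doc string) ([]float32, error)

// embedConcurrently embeds all docs using at most `workers` goroutines at a time.
// Because each call mostly waits on the network, `workers` can reasonably be
// larger than the number of CPU cores. Results keep the order of docs.
func embedConcurrently(docs []string, workers int, embed embedFunc) ([][]float32, error) {
	if workers < 1 {
		workers = 1
	}
	results := make([][]float32, len(docs))
	sem := make(chan struct{}, workers) // counting semaphore bounding in-flight calls

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for i, doc := range docs {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` calls are already running
		go func(i int, doc string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call finishes
			vec, err := embed(doc)
			if err != nil {
				once.Do(func() { firstErr = err }) // remember only the first error
				return
			}
			results[i] = vec // each goroutine writes its own index, so no lock is needed
		}(i, doc)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return results, nil
}

func main() {
	docs := []string{"page 1", "page 2", "page 3", "page 4"}
	// fakeEmbed simulates network latency instead of calling a real API.
	fakeEmbed := func(doc string) ([]float32, error) {
		time.Sleep(50 * time.Millisecond)
		return []float32{float32(len(doc))}, nil
	}
	start := time.Now()
	vecs, err := embedConcurrently(docs, 8, fakeEmbed)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("embedded %d documents in %v\n", len(vecs), time.Since(start).Round(time.Millisecond))
}
```

The semaphore keeps the number of simultaneous API calls bounded, which also helps with provider rate limits. This sketch does not cancel outstanding calls after the first error; a real implementation would probably want a context for that.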
This also relies on https://github.com/iwilltry42/langchaingo/pull/1.
Tested with the new changes: ingestion time for a 3000-page PDF dropped from 400 seconds to 38 seconds.
Will re-run the e2e tests, but this should not impact our ingestion quality.