Open eric-gardyn opened 2 months ago
hi eric, long time no talk 😄
currently, there is no support for that, though you could write some custom routes/endpoints that wrap the ingest logic in the mongodb-rag-ingest package.
relatedly, we're moving some of that logic to the mongodb-rag-core library soon - https://github.com/mongodb/chatbot/pull/455 (though you'll still be able to consume it from the mongodb-rag-ingest lib)
FWIW, I now have the 'ingest' running as an endpoint on an Azure function app (serverless function). Just had to tweak the 'loadConfig' method in WithConfig.ts (I am running the repo's TypeScript files for 'rag-ingest') to correctly load the config. Otherwise, it works; it even helped me find a "bug" in my config object ;)
Next step is wrapper code that takes the source of the modified content (in my case, an external CMS) and calls the serverless endpoint accordingly.
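For anyone planning a similar wrapper, here is a minimal TypeScript sketch of that step. Everything in it is hypothetical: the CmsEvent payload shape, the content-type-to-source mapping, and the assumption that the endpoint accepts a JSON array of source names in the request body. The HTTP call is injected as a function so the logic can be tried without a network.

```typescript
// Hypothetical CMS webhook payload; real CMS payloads will differ.
type CmsEvent = {
  operation: "create" | "update" | "delete";
  contentType: string;
};

// Map changed content to the unique data source names that need re-ingesting.
// Content types with no entry in the mapping are skipped.
function sourcesToIngest(
  events: CmsEvent[],
  sourceByContentType: Record<string, string>,
): string[] {
  const names: string[] = [];
  for (const event of events) {
    if (!Object.prototype.hasOwnProperty.call(sourceByContentType, event.contentType)) {
      continue;
    }
    const name = sourceByContentType[event.contentType];
    if (names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

// Call the ingest endpoint once per batch of CMS events.
// `post` is injected and must resolve to the HTTP status code.
async function triggerIngest(
  endpointUrl: string,
  events: CmsEvent[],
  sourceByContentType: Record<string, string>,
  post: (url: string, jsonBody: string) => Promise<number>,
): Promise<string[]> {
  const names = sourcesToIngest(events, sourceByContentType);
  if (names.length === 0) return names; // nothing relevant changed
  const status = await post(endpointUrl, JSON.stringify(names));
  if (status < 200 || status >= 300) {
    throw new Error(`Ingest endpoint returned HTTP ${status}`);
  }
  return names;
}

// Possible `post` implementation with the global fetch (Node 18+):
// const post = (url: string, body: string) =>
//   fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body })
//     .then((res) => res.status);
```

Create, update, and delete events are all treated the same way here: the affected source is simply re-ingested, and a burst of events for one source collapses into a single call.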
FWIW, I now have the 'ingest' running as an endpoint on an Azure function app (serverless function).
nice! just to clarify what you mean, did you create an endpoint that's like POST /ingest to trigger the ingestion process?
did you make separate pages/embed endpoints? are there path parameters to do it by data source, i.e. POST /ingest/pages/:sourceName?
somewhat related, i think it would be really neat to have embedding occur as an event-based process whenever a page is updated. would be pretty straightforward with MongoDB change streams. you'd just need to build some basic event queue to process the page creation/change/deletion events to take into account rate limit issues with the embedding models.
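A rough TypeScript sketch of the queueing idea, under stated assumptions: the PageEvent shape is hypothetical (loosely modeled on the operationType field of MongoDB change events), the handle callback stands in for whatever embeds or removes a page, and the rate limit is a simple "N actions per interval" rule. It does not open a change stream itself; events would be pushed in from one.

```typescript
// Hypothetical event shape; `url` is assumed to identify a page uniquely.
type PageEvent = {
  operationType: "insert" | "update" | "replace" | "delete";
  url: string;
};

type PageAction = { url: string; action: "embed" | "remove" };

// Collapse a burst of events so each page is handled once, using its latest event.
function coalesce(events: PageEvent[]): PageAction[] {
  const latest = new Map<string, PageAction>();
  for (const event of events) {
    const action = event.operationType === "delete" ? "remove" : "embed";
    latest.set(event.url, { url: event.url, action });
  }
  return Array.from(latest.values());
}

// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process at most `maxPerInterval` actions at a time, then wait `intervalMs`
// before the next batch so the embedding API's rate limit is respected.
// Returns the number of actions handled.
async function processEvents(
  events: PageEvent[],
  handle: (action: PageAction) => Promise<void>,
  maxPerInterval: number,
  intervalMs: number,
): Promise<number> {
  const batches = toBatches(coalesce(events), maxPerInterval);
  for (let i = 0; i < batches.length; i++) {
    if (i > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
    await Promise.all(batches[i].map((action) => handle(action)));
  }
  return batches.reduce((count, batch) => count + batch.length, 0);
}
```

Coalescing first means a page that is edited several times in quick succession is embedded only once, which matters more for rate limits than the batching itself.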
yes, POST /ingest takes an array of strings (the source names) in the request body, and basically just uses withConfig like so:
const resp = await withConfig(doAllCommand, { doPagesCommand, config, sourceNames })
changed doAllCommand args to
type DoAllCommandArgs = {
  doPagesCommand: typeof standardDoPagesCommand
  sourceNames?: string[]
}
and updated doAllCommand to call
await doPagesCommand(config, { source: sourceNames })
await doEmbedCommand(config, {
  since: lastSuccessfulRunDate ?? new Date('2023-01-01'),
  source: sourceNames,
})
doPagesCommand and doEmbedCommand already took 'source' as string[]
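For readers who want to wire up something similar, here is a framework-agnostic TypeScript sketch of a POST /ingest handler. It is not the code running in the function app described above. The IngestResponse shape and the RunIngest callback are hypothetical; RunIngest stands in for the real ingest call (for example the withConfig(doAllCommand, ...) call shown earlier), and the request body is assumed to be a bare JSON array of source names.

```typescript
// Hypothetical response shape; adapt it to your framework (Express, Azure Functions, ...).
type IngestResponse = {
  status: number;
  body: { ok: boolean; message: string };
};

// Stand-in for the real ingest call.
type RunIngest = (sourceNames: string[]) => Promise<void>;

// Return the trimmed source names, or null unless the body is a
// non-empty array of non-empty strings.
function parseSourceNames(body: unknown): string[] | null {
  if (!Array.isArray(body) || body.length === 0) return null;
  const names: string[] = [];
  for (const item of body) {
    if (typeof item !== "string" || item.trim() === "") return null;
    names.push(item.trim());
  }
  return names;
}

// Validate the body, run the ingest, and translate the outcome into a response.
async function handleIngest(
  body: unknown,
  runIngest: RunIngest,
): Promise<IngestResponse> {
  const sourceNames = parseSourceNames(body);
  if (sourceNames === null) {
    return {
      status: 400,
      body: { ok: false, message: "Expected a non-empty JSON array of source names" },
    };
  }
  try {
    await runIngest(sourceNames);
    return {
      status: 200,
      body: { ok: true, message: `Ingested ${sourceNames.length} source(s)` },
    };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { status: 500, body: { ok: false, message: `Ingest failed: ${reason}` } };
  }
}
```

Keeping validation in its own function makes it easy to reuse if per-source routes such as POST /ingest/pages/:sourceName are added later.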
nice. this is great feedback. realistically, i don't think we'll create an ingest API anytime soon since we don't have a need for it on our end. however, i would like to cleanly expose the ingestion methods so you or others can do something like what you've done w/o having to do anything hacky. like a "MongoDB RAG Ingest SDK".
Hi,
Is there a way to use the Ingest package in a more "real-time", API-driven way? Use case: we have an FAQ that is updated quite often in a CMS. The goal would be to trigger an ingestion of the content on every Create/Update/Delete operation in the CMS.
Is this possible with a little effort?