Instead of uploading a document, the user should be able to enter a URL.
Archyve needs to:
find and scrape the main document content
ideally show the user a preview of the text scraped from the page before actually ingesting the content
once the user confirms, chunk, embed, and graph the content just like it was a document
Along the way, this will require:
updating Document so that it works for a web source instead of just an uploaded attachment (preferred to creating a new model and having Collection#documents be polymorphic)
updating the Collection view so it shows web sources along with Documents
Since the web page has now been scraped, an enhancement now could be for the chat augmentation to include the URL(and maybe date scraped) that a chunk was derived from. Does that make sense?
Instead of uploading a document, the user should be able to enter a URL.
Archyve needs to:
Along the way, this will require:
Collection#documents
be polymorphic)