Open humphd opened 9 months ago
I'd like to work on this feature.
Adding my 2 cents to get on the watchers list for this feature.
@rjwignar that would be awesome. @mingming-ma has already done a bunch of the ground work for this in https://github.com/tarasglek/chatcraft.org/pull/286, so I think we should:
Longer term, we should unify this with the /import feature and allow it to do transforms. E.g., a simple transform that we should support in /import is readability.
For example: I am currently enjoying summarizing YouTube videos with my summary prompt, but it would be nicer to be able to /import a YouTube URL and have that strip things down to the subtitles.
Another example: when we load a PDF, it would be nice to parse that PDF to text using Mathpix.
@rjwignar you could actually tackle this from the bottom-up, as @tarasglek suggests, if you want to get started.
We have "slash commands" defined in ChatCraft. See:
The /import command he's referring to runs in two parts: first we proxy the URL to import through a server-side function on CloudFlare:
https://github.com/tarasglek/chatcraft.org/blob/main/src/lib/commands/ImportCommand.ts#L51
This lets us overcome CORS issues, but could also provide a way to do some processing if necessary. For example, we currently transform GitHub URLs to get the raw text:
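The project's actual transform is in the repo and isn't reproduced here. As a rough illustration of the idea only, a GitHub "blob" page URL can be rewritten to its raw.githubusercontent.com equivalent along these lines. The function name `toRawGitHubUrl` is hypothetical, and this sketch only handles the `/blob/` URL shape:

```typescript
// Hypothetical helper (not the project's code): rewrite a GitHub "blob" page
// URL to its raw-content URL. Any other URL is returned unchanged.
function toRawGitHubUrl(input: string): string {
  const url = new URL(input);
  if (url.hostname !== "github.com") {
    return input;
  }
  // Expected path shape: /<owner>/<repo>/blob/<ref>/<path...>
  const [owner, repo, kind, ...rest] = url.pathname.split("/").filter(Boolean);
  if (!owner || !repo || kind !== "blob" || rest.length < 2) {
    return input;
  }
  // Query string and fragment (e.g. "#L51") are dropped on purpose.
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest.join("/")}`;
}
```

Doing this on the server side means the client never has to know which hosts need special handling.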
Next, we decide if the format of the content is something we can ingest, and if so, we get it as raw text.
You could add a Readability parsing step here, do PDF parsing, etc.
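To make that a bit more concrete, here is a minimal sketch of a "can we ingest this, and how?" step keyed on content type. Everything here (`Transform`, `transforms`, `ingest`, `stripHtml`) is a hypothetical name, not what ImportCommand.ts does today, and the HTML stripping is a naive stand-in for a real Readability pass:

```typescript
// Hypothetical transform signature: fetched body in, plain text out.
type Transform = (body: string) => string;

// Naive stand-in for a Readability-style step: drop script/style blocks and
// tags. A real implementation would use a proper HTML parser.
const stripHtml: Transform = (body) =>
  body
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

const passThrough: Transform = (body) => body;

// Content types we are willing to ingest, and the transform applied to each.
const transforms = new Map<string, Transform>([
  ["text/plain", passThrough],
  ["text/markdown", passThrough],
  ["application/json", passThrough],
  ["text/html", stripHtml],
]);

// Returns plain text for supported content types, or null when the format
// is not something we can ingest.
function ingest(contentType: string, body: string): string | null {
  // Ignore parameters such as "; charset=utf-8".
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const transform = transforms.get(mime);
  return transform ? transform(body) : null;
}
```

Binary formats such as PDF would need a transform that takes bytes rather than a string, so they are left out of this sketch.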
This same functionality could then be hooked into the front-end, visually.
Would be cool to make our server-side scraper use https://www.npmjs.com/package/youtube-captions-scraper for YouTube links.
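For the YouTube case, the two pieces around the scraper itself are spotting a YouTube URL and flattening captions into text. A rough sketch follows. The `Caption` shape is an assumption about what a captions scraper returns and should be checked against the package's docs, and both function names are hypothetical:

```typescript
// Assumed shape of one caption entry; verify against the scraper's real output.
interface Caption {
  start: string;
  dur: string;
  text: string;
}

// Extract the video ID from common YouTube URL forms, or null if the input
// is not a recognizable YouTube link.
function youTubeVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null;
  }
  const host = url.hostname.replace(/^www\./, "");
  if (host === "youtu.be") {
    // Short links look like https://youtu.be/<id>
    return url.pathname.split("/")[1] || null;
  }
  if (host === "youtube.com" || host === "m.youtube.com") {
    // Watch links look like https://www.youtube.com/watch?v=<id>
    return url.searchParams.get("v");
  }
  return null;
}

// Join caption fragments into one block of text for the prompt.
function captionsToText(captions: Caption[]): string {
  return captions
    .map((c) => c.text.trim())
    .filter(Boolean)
    .join(" ");
}
```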
filed #360
https://tools.simonwillison.net/ocr is a great example of text extraction from attachments in browser, some of which we could copy.
Unassigning myself for now but this is something I'd like to try working on shortly after the semester.
Similar to the work that's happening in #286, where we can attach image files, I'd like to be able to attach arbitrary file types and be able to then include them in the context of the chat. Eventually, this could include:
We don't need to worry about all of these at first, and can file follow-up issues to do them once the basics are included.
To begin, we should probably add a paperclip icon to the prompt area which, when clicked, allows you to select one or more local files. We'd then show each file as an icon in the chat messages, but behind the scenes extract the text to include as part of the message's content.
We should consider how to handle the database. We could add a separate table for storing files as blobs, or perhaps we decide to only store the content in the message? I suspect we'll want the files, but I'm not sure.
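To help compare the two options, here is a minimal sketch of the "separate table" variant. The record shapes and names (`FileRecord`, `MessageRecord`, `buildContent`) are hypothetical, and an in-memory `Map` stands in for whatever database table would actually be used:

```typescript
// Files live in their own table; messages reference them by id.
interface FileRecord {
  id: string;
  name: string;
  type: string; // MIME type
  bytes: Uint8Array; // stand-in for a blob column
  text: string; // extracted text, cached so extraction happens once
}

interface MessageRecord {
  id: string;
  text: string; // what the user typed
  fileIds: string[]; // attachments, resolved when the content is built
}

// In-memory stand-in for the files table.
const files = new Map<string, FileRecord>();

// Build the content sent to the model: the user's text followed by the
// extracted text of each attached file. Unknown ids are skipped.
function buildContent(message: MessageRecord): string {
  const parts = [message.text];
  for (const id of message.fileIds) {
    const file = files.get(id);
    if (file) {
      parts.push(`File: ${file.name}\n${file.text}`);
    }
  }
  return parts.join("\n\n");
}
```

Keeping the original bytes leaves room to re-run extraction later with a better transform. Storing only the content in the message would be simpler, but the original file could not be recovered.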