tarasglek / chatcraft.org

Developer-oriented ChatGPT clone
https://chatcraft.org/
MIT License

Ability to attach file(s) to a chat #325

Open humphd opened 9 months ago

humphd commented 9 months ago

Similar to the work that's happening in #286, where we can attach image files, I'd like to be able to attach arbitrary file types and be able to then include them in the context of the chat. Eventually, this could include:

We don't need to worry about all of these at first, and can file follow-up issues to do them once the basics are included.

To begin, we should probably add a paperclip icon to the prompt area which, when clicked, allows you to select one or more local files. We'd then show each file as an icon in the chat messages, but behind the scenes extract the text to include as part of the message's content.
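A minimal sketch of the "extract the text behind the scenes" step, assuming the files come from a standard `<input type="file">`. The names `FileLike`, `AttachedFile`, `isTextLike`, `extractText`, and `formatAttachment` are hypothetical and are not part of ChatCraft; the set of MIME types treated as text is also an assumption.

```typescript
// Minimal structural type: browser File objects satisfy it,
// and so do simple test doubles.
interface FileLike {
  name: string;
  type: string;
  text(): Promise<string>;
}

// Hypothetical shape for an attachment after text extraction.
interface AttachedFile {
  name: string;
  type: string;
  text: string;
}

// Assumption: only text/* and JSON are read directly for now.
// Other formats (PDF, Word, ...) would need their own extractors.
function isTextLike(mimeType: string): boolean {
  const [base = ""] = mimeType.split(";");
  const normalized = base.trim().toLowerCase();
  return normalized.startsWith("text/") || normalized === "application/json";
}

// Read every text-like file and skip the rest.
async function extractText(files: FileLike[]): Promise<AttachedFile[]> {
  const results: AttachedFile[] = [];
  for (const file of files) {
    if (!isTextLike(file.type)) continue;
    results.push({ name: file.name, type: file.type, text: await file.text() });
  }
  return results;
}

// Turn an attachment into text that can be appended to the message content.
function formatAttachment(attachment: AttachedFile): string {
  return `--- ${attachment.name} ---\n${attachment.text}`;
}
```

In the browser this could be called as `extractText(Array.from(input.files ?? []))`, with the UI showing only the file icon while the formatted text goes into the message content.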

We should consider how to handle the database. We could add a separate table for storing files as blobs, or perhaps we decide to only store the content in the message? I suspect we'll want the files, but I'm not sure.
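To make the two storage options concrete, here is a rough sketch. Every type and function name below is hypothetical, and an in-memory `Map` stands in for a real database table; it does not reflect ChatCraft's actual schema.

```typescript
// Option A: files live in their own table and messages reference them by id.
interface StoredFile {
  id: string;
  name: string;
  type: string;
  data: Uint8Array; // the original bytes (the "blob")
  text: string; // text extracted from the file
}

interface MessageWithFileRefs {
  id: string;
  text: string;
  fileIds: string[];
}

// Option B: only the extracted text is kept, inlined into the message.
interface MessageWithInlineText {
  id: string;
  text: string;
}

// In-memory stand-in for a separate files table (assumption, not a real DB).
const filesTable = new Map<string, StoredFile>();

function saveWithFileRefs(
  messageId: string,
  prompt: string,
  files: StoredFile[]
): MessageWithFileRefs {
  for (const file of files) filesTable.set(file.id, file);
  return { id: messageId, text: prompt, fileIds: files.map((f) => f.id) };
}

function saveInline(
  messageId: string,
  prompt: string,
  files: StoredFile[]
): MessageWithInlineText {
  const attached = files.map((f) => `--- ${f.name} ---\n${f.text}`).join("\n\n");
  return { id: messageId, text: attached ? `${prompt}\n\n${attached}` : prompt };
}
```

Option A keeps the original bytes around (for re-download or re-processing later) at the cost of an extra table; Option B is simpler but loses the file itself.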

rjwignar commented 9 months ago

> To begin, we should probably add a paperclip icon to the prompt area which, when clicked, allows you to select one or more local files. We'd then show each file as an icon in the chat messages, but behind the scenes extract the text to include as part of the message's content.

I'd like to work on this feature.

kosty commented 9 months ago

Adding my 2 cents to get on the watchers list for this feature.

humphd commented 9 months ago

@rjwignar that would be awesome. @mingming-ma has already done a bunch of the groundwork for this in https://github.com/tarasglek/chatcraft.org/pull/286, so I think we should:

  1. help @mingming-ma get #286 reviewed and merged ASAP (that feature is too good to leave on the table)
  2. adapt the "attach image" feature it gives us to allow for arbitrary files to be included
  3. decide on which document types we want to include beyond text (e.g., PDF, Word, ...?) and file separate issues to figure out how to handle those in the browser or on Cloudflare

tarasglek commented 9 months ago

Longer term, we should unify this with the /import feature and allow it to do transforms. E.g., a simple transform that we should support in /import is Readability.

For example: I am currently enjoying summarizing YouTube videos with my summary prompt, but it would be nicer to be able to /import youtube/url and have that strip things down to the subtitles.

Another example: when we load a PDF, it would be nice to parse that PDF to text using Mathpix.

humphd commented 9 months ago

@rjwignar you could actually tackle this from the bottom-up, as @tarasglek suggests, if you want to get started.

We have "slash commands" defined in ChatCraft. See:

The /import command he's referring to runs in two parts. First, we proxy the URL to import through a server-side function on Cloudflare:

https://github.com/tarasglek/chatcraft.org/blob/main/src/lib/commands/ImportCommand.ts#L51

This lets us overcome CORS issues, but could also provide a way to do some processing if necessary. For example, we currently transform GitHub URLs to get the raw text:

https://github.com/tarasglek/chatcraft.org/blob/34fbb43a595a5adcce9e85e84a3b5dea4d898222/functions/api/proxy.ts#L7
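As an illustration of the kind of URL rewriting described above, here is a hedged sketch that maps a GitHub "blob" URL to its `raw.githubusercontent.com` equivalent. The function name `toRawGitHubUrl` is hypothetical, and this is not necessarily how `functions/api/proxy.ts` does it.

```typescript
// Rewrite https://github.com/<owner>/<repo>/blob/<ref>/<path>
// to      https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.
// Any other URL is returned unchanged.
function toRawGitHubUrl(input: string): string {
  const url = new URL(input);
  if (url.hostname !== "github.com") return input;

  // pathname looks like /owner/repo/blob/ref/path/to/file
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts.length < 5 || parts[2] !== "blob") return input;

  // Skip the "blob" segment; the rest is the ref followed by the file path.
  const [owner, repo, , ...refAndPath] = parts;
  // The #L51-style fragment lives in url.hash, so it is dropped here.
  return `https://raw.githubusercontent.com/${owner}/${repo}/${refAndPath.join("/")}`;
}
```

For example, the `ImportCommand.ts#L51` link above would map to `https://raw.githubusercontent.com/tarasglek/chatcraft.org/main/src/lib/commands/ImportCommand.ts`, while non-GitHub URLs pass through untouched.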

Next, we decide if the format of the content is something we can ingest, and if so, we get it as raw text.

You could add a Readability parsing step here, do PDF parsing, etc.
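One way to picture this decision step is a small classifier over the response's `Content-Type` header. The names `IngestDecision` and `classifyContentType`, and the `"readability"` and `"pdf"` transform labels, are hypothetical placeholders for the steps suggested above, not existing ChatCraft code.

```typescript
// What to do with a fetched resource, based on its Content-Type.
type IngestDecision =
  | { kind: "text" } // usable as raw text already
  | { kind: "needs-transform"; transform: "readability" | "pdf" }
  | { kind: "unsupported" };

function classifyContentType(contentType: string | null): IngestDecision {
  if (!contentType) return { kind: "unsupported" };

  // Drop parameters such as "; charset=utf-8" and normalize case.
  const [base = ""] = contentType.split(";");
  const mime = base.trim().toLowerCase();

  // HTML is text, but a Readability-style pass would strip page chrome first.
  if (mime === "text/html") return { kind: "needs-transform", transform: "readability" };
  // PDFs need a parsing step before they become text.
  if (mime === "application/pdf") return { kind: "needs-transform", transform: "pdf" };
  // Assumption: everything else under text/*, plus JSON, is ingested as-is.
  if (mime.startsWith("text/") || mime === "application/json") return { kind: "text" };

  return { kind: "unsupported" };
}
```

A caller would typically pass `response.headers.get("content-type")` from the proxied `fetch` and branch on the returned `kind`.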

This same functionality could then be hooked into the front-end, visually.

tarasglek commented 9 months ago

Would be cool to make our server-side scraper use https://www.npmjs.com/package/youtube-captions-scraper for YouTube links.

humphd commented 9 months ago

> Would be cool to make our server-side scraper use https://www.npmjs.com/package/youtube-captions-scraper for YouTube links.

filed #360

humphd commented 7 months ago

https://tools.simonwillison.net/ocr is a great example of text extraction from attachments in the browser, some of which we could copy.

rjwignar commented 6 months ago

Unassigning myself for now, but this is something I'd like to try working on shortly after the semester.