nasa-petal / bidara-deep-chat

BIDARA is a GPT-4o chatbot that was instructed to help scientists and engineers understand, learn from, and emulate the strategies used by living things to create sustainable designs and technologies using the Biomimicry Institute's step-by-step design process.
https://bit.ly/bidara-ai

File Citation for File Search (Retrieval) #160

Open marioseixas opened 5 months ago

marioseixas commented 5 months ago

i managed to do that in openai assistant playground by:

jackitaliano commented 5 months ago

Nice to haves

...

jackitaliano commented 5 months ago

File citations appear in the "thread.message.delta" events of the event stream (when streaming is enabled).

Example:

{
  "id": "msg_rn9sYJnDGgP1CIliFFOjsnPm",
  "object": "thread.message.delta",
  "delta": {
    "content": [
      {
        "index": 0,
        "type": "text",
        "text": {
          "value": "【8:0†source】",
          "annotations": [
            {
              "index": 0,
              "type": "file_citation",
              "text": "【8:0†source】",
              "start_index": 536,
              "end_index": 548,
              "file_citation": {
                "file_id": "file-y6QXl1TdKON4MIoJKLlZ4cKf",
                "quote": "<quote from file here>"
              }
            }
          ]
        }
      }
    ]
  }
}

Currently, bidara-deep-chat is not using streaming, so the response will not have this shape. However, streaming is likely to be implemented soon (see #73), so it might be better to plan the implementation around streaming rather than having to update it again afterwards.

Assuming streaming is implemented, these objects can be accessed in the response interceptor in AssistantDeepChat.svelte:

<script>
...
async function responseInterceptor(response) {
  if (response.object === "thread.message.delta") {
    // delta.content is an array of content parts; each part is an object.
    response.delta.content = response.delta.content.map((msg) => {
      // Only text parts carry annotations.
      if (msg.type !== "text") {
        return msg;
      }

      // Annotations live under msg.text, next to the text value.
      (msg.text.annotations ?? []).forEach((annotation) => {
        if (annotation.type === "file_citation") {
          const quote = `Quote:\n"${annotation.file_citation.quote}"`;
          // Strings are immutable, so the result of replace() must be assigned back.
          // A replacer function avoids "$" patterns in the quote being interpreted.
          msg.text.value = msg.text.value.replace(annotation.text, () => quote);
        }
      });

      return msg;
    });
  }

  return response;
}
...
</script>

The same handling would also be needed when loading existing threads, in threadUtils.js:

async function convertThreadMessagesToMessages(threadId, threadMessages) { ... }

In both cases, it is likely best to add something like a shared handleCitations(...) function.
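A minimal sketch of what such a shared helper might look like, assuming it receives one text content part shaped like the delta example above. The name handleCitations comes from the suggestion; its signature, the inline "Quote:" format, and the sample sentence and quote are assumptions for illustration only, not existing bidara-deep-chat code.

```javascript
// Hypothetical helper: replaces each file-citation marker in a text part
// with the quote it refers to. Returns a new object; the input is not mutated.
function handleCitations(textPart) {
  const annotations = textPart.text.annotations ?? [];
  let value = textPart.text.value;

  for (const annotation of annotations) {
    if (annotation.type !== "file_citation") continue;
    const quote = annotation.file_citation?.quote;
    // Assumption: if no quote is available, keep the original marker.
    if (!quote) continue;
    // A replacer function keeps "$" sequences in the quote literal.
    value = value.replace(annotation.text, () => ` Quote: "${quote}"`);
  }

  return { ...textPart, text: { ...textPart.text, value } };
}

// Illustrative input only: the sentence and quote are made up.
const part = {
  index: 0,
  type: "text",
  text: {
    value: "Geckos cling to smooth walls.【8:0†source】",
    annotations: [
      {
        index: 0,
        type: "file_citation",
        text: "【8:0†source】",
        file_citation: {
          file_id: "file-y6QXl1TdKON4MIoJKLlZ4cKf",
          quote: "setae adhere to surfaces",
        },
      },
    ],
  },
};

console.log(handleCitations(part).text.value);
```

Because the helper takes a single text part and returns a new one, the same function could in principle be called from both the response interceptor and the thread-loading code.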

I'm not entirely sure what the best replacement format for these citations is. Options include the inline quote shown above, keeping the citation markers as they are with the quotes listed at the bottom, proper citations (MLA, APA, etc., as you mentioned) inline or at the bottom, or something else entirely if you have other ideas.
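To make the "quotes at the bottom" option concrete, here is a rough sketch that swaps each marker for a numbered reference and appends the quotes as footnotes. The function name citeWithFootnotes and the "[n]" format are assumptions chosen for illustration; the input is assumed to be one text content part shaped like the delta example above.

```javascript
// Hypothetical helper: numbered markers inline, quotes listed at the bottom.
function citeWithFootnotes(textPart) {
  let value = textPart.text.value;
  const quotes = [];

  for (const annotation of textPart.text.annotations ?? []) {
    if (annotation.type !== "file_citation") continue;
    quotes.push(annotation.file_citation.quote);
    const marker = `[${quotes.length}]`;
    // replace() with a string pattern only replaces the first remaining
    // occurrence, so repeated identical markers are numbered in order.
    value = value.replace(annotation.text, () => marker);
  }

  if (quotes.length === 0) return value;

  const footer = quotes.map((quote, i) => `[${i + 1}] "${quote}"`).join("\n");
  return `${value}\n\n${footer}`;
}
```

This variant keeps the message body short and moves the potentially long quotes out of the sentence flow, at the cost of the reader having to look down for the source text.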

marioseixas commented 5 months ago

Enclose the source in BibTeX citation syntax, ‘\cite{filename.pdf}’ or ‘\cite{filename}’, at the end of every sentence that has a ‘source’ mark. At the end of the response, add a BibTeX entry with the document's data fields and the quote included as a BibTeX comment.
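One possible reading of this proposal, sketched under several assumptions: the function name citeAsBibtex is hypothetical, the fileNames lookup from file_id to filename is assumed to be resolved elsewhere, the entry type @misc is a placeholder, and the quote is stored in a non-standard comment field (which standard BibTeX styles ignore) rather than as a free-standing comment. Quotes are assumed to contain no unbalanced braces.

```javascript
// Hypothetical helper, not part of bidara-deep-chat.
// fileNames maps file_id -> filename; how it gets populated is out of scope here.
function citeAsBibtex(textPart, fileNames) {
  let value = textPart.text.value;
  const entries = new Map(); // cite key -> quotes collected for that file

  for (const annotation of textPart.text.annotations ?? []) {
    if (annotation.type !== "file_citation") continue;
    const { file_id, quote } = annotation.file_citation;
    // Fall back to the file id when the filename is unknown, and replace
    // characters that are not safe in a BibTeX key.
    const key = (fileNames[file_id] ?? file_id).replace(/[^A-Za-z0-9_.:-]/g, "_");
    if (!entries.has(key)) entries.set(key, []);
    if (quote) entries.get(key).push(quote);
    value = value.replace(annotation.text, () => `\\cite{${key}}`);
  }

  // One entry per cited file; all quotes from that file share its comment field.
  const bibliography = [...entries]
    .map(([key, quotes]) => {
      const fields = [`  title = {${key}}`];
      if (quotes.length > 0) fields.push(`  comment = {${quotes.join(" | ")}}`);
      return `@misc{${key},\n${fields.join(",\n")}\n}`;
    })
    .join("\n\n");

  return { text: value, bibliography };
}
```

The helper returns the rewritten text and the bibliography separately so the caller can decide where the entries are rendered, for example once at the end of the full message rather than after every streamed delta.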