Include remote urls in chat message enrichment context

justyns commented 6 months ago

As of #9, we include context for direct links to pages inside of SB. It'd also be cool to automatically fetch remote URLs and include their content in the context as well.

e.g. "Please look at https://ai.google.dev/tutorials/rest_quickstart and tell me how to use the Gemini API"

One problem is that we'd need to convert the remote page's HTML to markdown first. On Discord, a few libraries were mentioned for this.

I don't really want to include any libraries in this plugin if we don't have to, so I'd prefer using a self-hostable API if possible. Another option would be to shell out to a command (curl + pandoc), but I'd want to leave that up to the user, maybe by providing a space script function they can use.
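
A rough sketch of what "leaving it up to the user" could look like: the plug only wraps whatever converter the user supplies. The names `makeUrlEnricher` and `convertToMarkdown` are made up for illustration, and the result shape is an assumption; the converter could call a self-hosted API, shell out to curl + pandoc, or do anything else that returns markdown for a URL.

```javascript
// Hypothetical helper: wraps a user-supplied converter (url -> markdown string).
// Neither the name nor the result shape is part of the plug's actual API.
function makeUrlEnricher(convertToMarkdown) {
  return async function enrich(url) {
    try {
      const markdown = await convertToMarkdown(url);
      return {
        success: true,
        // Text that would be appended to the chat message as extra context
        enrichData: `\n\nContent of the URL [${url}]:\n~~~\n${markdown}\n~~~\n`,
      };
    } catch (error) {
      // Report the failure instead of throwing, so the chat can continue
      return { success: false, message: `Error processing URL: ${error.message}` };
    }
  };
}
```

The plug itself stays free of conversion libraries this way, since the only thing it depends on is a function the user provides.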

zefhemel commented 6 months ago

You could just tell the LLM: "hey, I'm going to throw a website's HTML at you, try to make sense of it" 😀 It may be quite large, though.
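
If size is the concern, the HTML could be shrunk before it is handed to the LLM. The sketch below is only an illustration: `shrinkHtml` and its `maxChars` parameter are hypothetical names, and the regex-based tag stripping is deliberately crude (it is not a real HTML parser and will mishandle unusual markup).

```javascript
// Hypothetical helper: crude size reduction for raw HTML before sending it to an LLM.
function shrinkHtml(html, maxChars = 8000) {
  const text = html
    // Drop <script> and <style> blocks including their contents
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Replace every remaining tag with a space
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace
    .replace(/\s+/g, " ")
    .trim();
  // Hard cap on length; characters beyond maxChars are simply cut off
  return text.slice(0, maxChars);
}
```

The hard cap is a blunt instrument: it bounds the prompt size, but anything past the limit is lost, so a long page would only be partially visible to the model.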

justyns commented 6 months ago

With https://github.com/justyns/silverbullet-ai/pull/26 , this is now possible:

const readabilityImport = import("https://esm.sh/@mozilla/readability");
let urlCache = {};

silverbullet.registerFunction('fetchAndProcessUrl', async (url) => {
  console.log("Received ", url);
  // Check if the URL is in the cache
  if (urlCache[url]) {
    console.log("Returning cached data for ", url);
    return urlCache[url];
  }

  const { Readability } = await readabilityImport;

  try {
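    const response = await syscall("sandboxFetch.fetch", url);
    // Check the status before doing any work on the body
    if (!response.ok) {
      return {
        success: false,
        message: `Failed to fetch URL: ${response.statusText}`,
      };
    }
    // atob() alone yields one character per byte; decode the bytes as UTF-8
    // (assumes the page is UTF-8 encoded) so non-ASCII text is not garbled
    const bytes = Uint8Array.from(atob(response.base64Body), (c) => c.charCodeAt(0));
    const body = new TextDecoder().decode(bytes);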

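    const doc = new DOMParser().parseFromString(body, 'text/html');
    const reader = new Readability(doc);
    const article = reader.parse();
    // Readability returns null when it cannot find article content
    if (!article) {
      return {
        success: false,
        message: `Could not extract readable content from ${url}`,
      };
    }
    // Collapse blank lines so the added context stays compact
    const textContent = article.textContent.replace(/\n\s*\n?/g, '\n').trim();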

    const enrichData = `\n\nContent of the URL [${url}]:\n~~~\n${textContent}\n~~~\n`;

    // Store the result in the cache before returning
    const result = {
      success: true,
      title: article.title,
      content: article.content,
      textContent: textContent,
      excerpt: article.excerpt,
      enrichData: enrichData,
    };
    urlCache[url] = result;

    return result;
  } catch (error) {
    return {
      success: false,
      message: `Error processing URL: ${error.message}`,
    };
  }
});

silverbullet.registerFunction('enrichWithURL', async (message) => {
  const urlRegex = /(https?:\/\/[^\s]+)/g;
  let enrichedData = message;
  const urls = message.match(urlRegex);
  if (urls) {
    for (const url of urls) {
      const enrichedContent = await syscall('system.invokeSpaceFunction', 'fetchAndProcessUrl', url);
      // Skip failed fetches so "undefined" is never appended to the message
      if (enrichedContent && enrichedContent.success) {
        enrichedData += enrichedContent.enrichData;
      }
    }
  }
  return enrichedData;
});

silverbullet.registerEventListener({name: "ai:enrichMessage"}, async (event) => {
  return 'enrichWithURL';
});
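
One caveat with the regex in enrichWithURL: `[^\s]+` also captures punctuation that directly follows a link (for example the comma in "see https://example.com/a, then ..."), and a URL mentioned twice is fetched twice (the cache softens that). A possible refinement, as a sketch only; `extractUrls` is a hypothetical helper and not part of the plug:

```javascript
// Hypothetical helper: pull URLs out of a chat message, trimming trailing
// punctuation and removing duplicates while keeping first-seen order.
function extractUrls(message) {
  const matches = message.match(/https?:\/\/[^\s<>"']+/g) || [];
  const cleaned = matches.map((url) => url.replace(/[.,;:!?)\]]+$/, ""));
  return [...new Set(cleaned)];
}
```

Stripping a trailing `)` is a trade-off: it helps with links wrapped in parentheses but would also cut a URL that legitimately ends in one.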

I'm borrowing an idea from https://community.silverbullet.md/t/migrating-from-obsidian-tasks and importing the Readability library directly from a URL (esm.sh) at runtime instead of bundling it. I did experiment with adding Readability to this plug, but it brings the plug size from ~30 KB to over 1 MB.

Somewhat long GIF of the request from my original comment:

[GIF: chat-url-context-custom-func]

zefhemel commented 6 months ago

This is very cool 👍🏻

justyns / silverbullet-ai

Include remote urls in chat message enrichment context #14