Open justyns opened 6 months ago
You could just tell the LLM: "Hey, I'm going to throw a website's HTML at you, try to make sense of it" 😀 It may be quite large, though.
With https://github.com/justyns/silverbullet-ai/pull/26, this is now possible:
const readabilityImport = import("https://esm.sh/@mozilla/readability");
let urlCache = {};

silverbullet.registerFunction('fetchAndProcessUrl', async (url) => {
  console.log("Received ", url);
  // Check if the URL is in the cache
  if (urlCache[url]) {
    console.log("Returning cached data for ", url);
    return urlCache[url];
  }
  try {
    // Await the import inside the try block so a failed import is reported too
    const { Readability } = await readabilityImport;
    const response = await syscall("sandboxFetch.fetch", url);
    if (!response.ok) {
      return {
        success: false,
        message: `Failed to fetch URL: ${response.statusText}`,
      };
    }
    // atob() yields one character per byte, so decode the bytes as UTF-8
    // to keep non-ASCII text intact
    const bytes = Uint8Array.from(atob(response.base64Body), (c) => c.charCodeAt(0));
    const body = new TextDecoder().decode(bytes);
    const doc = new DOMParser().parseFromString(body, 'text/html');
    const reader = new Readability(doc);
    const article = reader.parse();
    // parse() returns null when no article content can be extracted
    if (!article) {
      return {
        success: false,
        message: `No readable content found at ${url}`,
      };
    }
    // Collapse blank lines and leading whitespace after each newline
    const textContent = article.textContent.replace(/\n\s*\n?/g, '\n').trim();
    console.log(article);
    const enrichData = `\n\nContent of the URL [${url}]:\n~~~\n${textContent}\n~~~\n`;
    // Store the result in the cache before returning
    const result = {
      success: true,
      title: article.title,
      content: article.content,
      textContent: textContent,
      excerpt: article.excerpt,
      enrichData: enrichData,
    };
    urlCache[url] = result;
    return result;
  } catch (error) {
    return {
      success: false,
      message: `Error processing URL: ${error.message}`,
    };
  }
});
silverbullet.registerFunction('enrichWithURL', async (message) => {
  const urlRegex = /(https?:\/\/[^\s]+)/g;
  let enrichedData = message;
  const urls = message.match(urlRegex);
  if (urls) {
    for (const url of urls) {
      const enrichedContent = await syscall('system.invokeSpaceFunction', 'fetchAndProcessUrl', url);
      // Skip failed fetches; otherwise "undefined" would be appended to the message
      if (enrichedContent && enrichedContent.success) {
        enrichedData += enrichedContent.enrichData;
      }
    }
  }
  return enrichedData;
});
silverbullet.registerEventListener({name: "ai:enrichMessage"}, async (event) => {
  return 'enrichWithURL';
});
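One thing to watch with the `https?:\/\/[^\s]+` pattern used in `enrichWithURL`: it keeps any punctuation that directly follows a link, and it fetches the same URL again if it appears twice in a message. Below is a minimal sketch of a helper that trims trailing punctuation and removes duplicates. `extractUrls` is a hypothetical name, not something the plug provides, and the set of characters to strip is an assumption (a URL that really ends in `)` would be cut short).

```javascript
// Hypothetical helper, not part of silverbullet-ai: extract unique URLs from a
// chat message and drop trailing punctuation that the simple regex would keep.
function extractUrls(message) {
  const urlRegex = /https?:\/\/[^\s]+/g;
  const matches = message.match(urlRegex) ?? [];
  // Strip characters that usually end a sentence or a parenthesis, not a URL.
  // Assumption: links that legitimately end in one of these are rare enough.
  const cleaned = matches.map((url) => url.replace(/[.,;:!?)\]'"]+$/, ""));
  // A Set removes duplicates while keeping first-seen order.
  return [...new Set(cleaned)];
}

const example =
  "Please look at https://ai.google.dev/tutorials/rest_quickstart, " +
  "then compare it with (https://example.com/docs).";
console.log(extractUrls(example));
```

The loop in `enrichWithURL` could then iterate over the result of `extractUrls(message)` instead of `message.match(urlRegex)`.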
I'm borrowing an idea from https://community.silverbullet.md/t/migrating-from-obsidian-tasks and importing the Readability library directly from a URL (esm.sh in the script above). I did experiment with adding Readability to this plug, but it brings the plug size from ~30 KB to over 1 MB.
[GIF: a somewhat long recording of the request from my original comment]
This is very cool 👍🏻
We're including context for direct links to pages inside of SB now in #9 but it'd also be cool to automatically fetch remote urls and include those in context as well.
e.g. "Please look at https://ai.google.dev/tutorials/rest_quickstart and tell me how to use the Gemini API"
One problem is that we'd need to convert the content of the remote URL to markdown first. In Discord, a few libraries for this were mentioned.
I don't really want to include any libraries in this plugin if we don't have to, so I'd prefer using a self-hostable API if possible. Another option could be to shell out to a command (curl + pandoc), but I'd want to leave that up to the user, maybe by providing a space script function to use?
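For the shell-out idea, here is a rough sketch of what a user-supplied function might look like. Everything in it is an assumption rather than existing plug code: `urlToMarkdown` is a hypothetical name, and `run` is an injected command runner assumed to resolve to `{ stdout, stderr, code }`. It relies on pandoc being able to read an http(s) URL directly, so there is no separate curl step.

```javascript
// Hypothetical sketch: convert a remote page to markdown by shelling out to pandoc.
// `run(cmd, args)` is an injected runner, assumed to resolve to { stdout, stderr, code }.
async function urlToMarkdown(url, run) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (error) {
    return { success: false, message: `Invalid URL: ${url}` };
  }
  // Only allow http(s) so arbitrary strings never reach the command line.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { success: false, message: `Unsupported protocol: ${parsed.protocol}` };
  }
  // Arguments are passed as an array, so no shell quoting is involved.
  const result = await run("pandoc", ["-f", "html", "-t", "gfm", parsed.href]);
  if (result.code !== 0) {
    return { success: false, message: `pandoc failed: ${result.stderr}` };
  }
  return { success: true, markdown: result.stdout.trim() };
}
```

Inside a space script, `run` could perhaps be `(cmd, args) => syscall("shell.run", cmd, args)`, assuming that syscall is available on the server and returns that shape. Whether to enable it would stay up to the user.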