kiwix / kiwix-js

Fully portable & lightweight ZIM reader in Javascript
https://www.kiwix.org/
GNU General Public License v3.0
302 stars 126 forks

Consider adding support for calling a local (or remote) LLM API to manipulate article contents #1239

Open Jaifroid opened 6 months ago

Jaifroid commented 6 months ago

This is highly speculative in terms of usefulness, and the UI would need to be considered carefully. The use case would be summarizing articles retrieved from the ZIM. Over time, it might be possible to allow a local LLM to use the app as a research tool, thereby providing a natural-language interface to informational ZIM contents.

It would be relatively easy to make the calls in JS. We would just need to use the Hugging Face Agents library. Something like (for the Anthropic API):

npm install @huggingface/agents

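// AnthropicAgent is the class name assumed by this sketch; it is unverified (see N.B. below)
import { AnthropicAgent } from "@huggingface/agents";
const ANTHROPIC_API_KEY = "YOUR_API_KEY";
const agent = new AnthropicAgent(ANTHROPIC_API_KEY);
// Read the text from the document loaded in the iframe, not from the iframe element itself
const articleText = articleIframe.contentDocument.body.textContent;
const prompt = "Summarize the following article in no more than 500 words:\n\n" + articleText;
const generatedCode = await agent.generateCode(prompt);
const evaluationResult = await agent.evaluateCode(generatedCode);
const messages = evaluationResult.messages;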

(N.B. Untested)
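
For comparison, the same summarization request could be made without an agent library by posting to Anthropic's Messages API with plain fetch. This is a minimal sketch under assumptions, not tested against the live service: `buildSummaryRequest`, `summarizeArticle` and the injectable `fetchImpl` parameter are hypothetical names, and the model name is left to the caller. Note that a key shipped in client-side code is visible to the user.

```javascript
// Sketch: summarize article text via Anthropic's Messages API using fetch.
// buildSummaryRequest, summarizeArticle and fetchImpl are illustrative names,
// not part of kiwix-js or of any library.
const ANTHROPIC_URL = "https://api.anthropic.com/v1/messages";

function buildSummaryRequest(articleText, apiKey, model) {
    return {
        method: "POST",
        headers: {
            "content-type": "application/json",
            "x-api-key": apiKey,
            "anthropic-version": "2023-06-01",
            // Opt-in header Anthropic asks for on requests sent directly from a browser
            "anthropic-dangerous-direct-browser-access": "true"
        },
        body: JSON.stringify({
            model: model, // caller supplies a model name; none is assumed here
            max_tokens: 1024,
            messages: [{
                role: "user",
                content: "Summarize the following article in no more than 500 words:\n\n" + articleText
            }]
        })
    };
}

async function summarizeArticle(articleText, apiKey, model, fetchImpl = globalThis.fetch) {
    const response = await fetchImpl(ANTHROPIC_URL, buildSummaryRequest(articleText, apiKey, model));
    if (!response.ok) {
        throw new Error("LLM request failed with status " + response.status);
    }
    const data = await response.json();
    // The reply is a list of content blocks; join the text ones
    return data.content
        .filter(block => block.type === "text")
        .map(block => block.text)
        .join("");
}
```

Passing `fetchImpl` explicitly keeps the function usable in environments where the request has to go through a proxy or a stub.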

To use this offline, the user would need to run a local LLM with KoboldCpp or possibly Mozilla's llamafile, set up an API key, and provide that key to the Kiwix app. Ergo, this is only a solution for enthusiasts and tinkerers.
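
As far as I know, both llamafile and KoboldCpp can expose an OpenAI-style chat completions endpoint on localhost, so the app side could be little more than a configurable base URL. A hedged sketch: the default URL assumes llamafile's usual port 8080 (KoboldCpp normally listens on 5001), `askLocalLLM` and its option names are hypothetical, and the `model` value is only a placeholder.

```javascript
// Sketch: send a prompt to a locally running, OpenAI-compatible server.
// askLocalLLM and its options are illustrative names, not an existing kiwix-js API.
async function askLocalLLM(prompt, options = {}) {
    const baseUrl = options.baseUrl || "http://localhost:8080/v1"; // assumed llamafile default
    const fetchImpl = options.fetchImpl || globalThis.fetch;
    const headers = { "Content-Type": "application/json" };
    // Only send a key if the user configured one on the local server
    if (options.apiKey) {
        headers["Authorization"] = "Bearer " + options.apiKey;
    }
    const response = await fetchImpl(baseUrl + "/chat/completions", {
        method: "POST",
        headers: headers,
        body: JSON.stringify({
            model: options.model || "local-model", // placeholder name
            messages: [{ role: "user", content: prompt }]
        })
    });
    if (!response.ok) {
        throw new Error("Local LLM request failed with status " + response.status);
    }
    const data = await response.json();
    return data.choices[0].message.content;
}
```

The same function would also cover hosted services that follow the OpenAI request shape, by changing `baseUrl` and supplying a key.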

To provide the LLM in-app would require running a WASM inferencer such as https://github.com/mlc-ai/web-llm. But supporting a model with a large enough context to ingest a Wikipedia article pulled from the ZIM would likely require a PC with a graphics card (and would only work in Chrome) or an Apple M1 Pro.
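
Whether an article fits is mostly a matter of its token count against the model's context window. Here is a rough sketch of that check, assuming about four characters per token for English text (a rule of thumb, not a real tokenizer) and a hypothetical 8192-token window; `estimateTokens` and `fitToContext` are illustrative names.

```javascript
// Sketch: estimate whether article text fits a model's context window and,
// if not, truncate it at a paragraph boundary.
const CHARS_PER_TOKEN = 4; // assumption: rough average for English text

function estimateTokens(text) {
    return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function fitToContext(articleText, contextTokens = 8192, reservedTokens = 1024) {
    // reservedTokens leaves room for the instructions and the model's reply
    const budgetTokens = contextTokens - reservedTokens;
    if (budgetTokens <= 0) {
        throw new Error("reservedTokens must be smaller than contextTokens");
    }
    const maxChars = budgetTokens * CHARS_PER_TOKEN;
    if (articleText.length <= maxChars) {
        return { text: articleText, truncated: false };
    }
    // Prefer to cut at the last paragraph break before the limit
    const cut = articleText.lastIndexOf("\n\n", maxChars);
    const end = cut > 0 ? cut : maxChars;
    return { text: articleText.slice(0, end), truncated: true };
}
```

A long article could instead be summarized section by section, but simple truncation is enough to show where the context limit bites.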

Would this be useful, or would it just be bloat?

Jaifroid commented 5 months ago

Someone also posted this: https://gist.github.com/hyrumsdolan/2aa3338f3005e9b468ff350c8f5929d9 (this one is specifically for using Claude AI in JS).

D3V-D commented 5 months ago

I don't know if you've seen the recent release of the new Llama 3 models, but they're much better than the previous models and beat models many times their size; the 8B one in particular is very efficient.

If we found a way to spawn the model remotely and connect to it (maybe via some external application that lets you do that; I know there are some interesting projects), then users could have it summarize web pages, for example. As you said, this would require a pretty powerful machine, so it would be more for users who have that luxury.

We could also perhaps allow API keys for other models that offer an API; ChatGPT comes to mind.

As for the UI, it shouldn't be too difficult to make something out of the way that doesn't interfere unless interacted with, like a floating chat button or a button in the navbar.

Jaifroid commented 5 months ago

Yes, but at this stage I think any work on this wouldn't be for merging. It would merely be a proof of concept, because there has to be agreement within the org as to which direction they want to go in with AI integration (if at all). Personally, I think that for JS apps, the best integration would be with https://webllm.mlc.ai/, which is a WASM inference engine that works very well with a Llama 3 8B Instruct Q4 model that, as you say, is really impressive for its size (about 4GB). In the context of full English Wikipedia, 4GB isn't too much to pay for natural-language search capability.

I envisage one use being that the AI could be instructed to come up with search terms for a vague user query that would link to articles in the ZIM. Here is an experiment I did a few days ago (reverse search engine - screenshot below). The idea is that the terms in square brackets would be links to the relevant Wikipedia article for more details. The problem is ensuring that the search terms it comes up with are actually in the ZIM!

[image: screenshot of the reverse search engine experiment]

D3V-D commented 5 months ago

Yeah, makes sense.

Also, that's a great demo; I think search could be a useful feature, but hallucination would be an issue - we would probably need to do something like run its response through some other code that then filters out nonexistent pages.
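
That filtering step could look roughly like this: pull the square-bracketed terms out of the model's reply and keep only those that a title lookup confirms. `titleExists` is a hypothetical async callback standing in for whatever title search the app can run against the ZIM; it is not an existing kiwix-js function, and the other names are illustrative too.

```javascript
// Sketch: keep only the bracketed terms from an LLM reply that exist as titles.
function extractBracketedTerms(reply) {
    const terms = new Set(); // a Set drops repeated terms
    for (const match of reply.matchAll(/\[([^\[\]]+)\]/g)) {
        const term = match[1].trim();
        if (term) {
            terms.add(term);
        }
    }
    return Array.from(terms);
}

async function filterExistingTerms(reply, titleExists) {
    const terms = extractBracketedTerms(reply);
    // titleExists(term) is assumed to resolve to true or false
    const checks = await Promise.all(terms.map(term => titleExists(term)));
    return terms.filter((term, index) => checks[index]);
}
```

Terms that fail the lookup could simply be left as plain text rather than rendered as links.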