mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Getting it to browse webpage(s) and summarize them #602

Closed TomLucidor closed 1 month ago

TomLucidor commented 1 month ago

Sorry for asking this, but it seems that this browser chatbot cannot read from webpages (individual pages or lists of them), which makes it less useful than expected. If this software can support webcams, could it also support caching a small number of webpages (e.g. from a blog)? https://github.com/mlc-ai/web-llm/issues/291

flatsiedatsie commented 1 month ago

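For security reasons (the same-origin policy), JavaScript running in a web browser isn't allowed to read pages from other sites unless those sites explicitly allow it via CORS headers.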

Generally, what people do is implement a server-side script that fetches the other web page on behalf of any JavaScript that requests it. For example, JavaScript could ask a PHP script for a URL, and PHP will then download that webpage and return it to JavaScript for further client-side processing.

Webcams are different, as browsers offer JavaScript a built-in getUserMedia API to start the camera and get frames from it.

WebLLM offers you a lot of features, and you could easily implement web-page loading in your own project using the pipeline mentioned above.

Here's an example PHP script you could use:

<?php

# example of how you would call this script:
# webpage_downloader.php?url=https%3A%2F%2Fwww.example.com

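$url_to_get = filter_var($_GET["url"] ?? "", FILTER_SANITIZE_URL);

# Only fetch http(s) URLs; otherwise file_get_contents() could be pointed at local files.
if (!preg_match('#^https?://#i', $url_to_get)) {
    http_response_code(400);
    exit('{"error":"expected an http(s) url"}');
}

$webpage = file_get_contents( $url_to_get );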

# or more complex:
#$opts = array(
#  'http'=>array(
#    'method'=>"GET",
#    'header'=>"Accept-language: en\r\n"
#  )
#);
#$context = stream_context_create($opts);
#$webpage = file_get_contents('http://www.example.com/', false, $context);

# You can then encode the result for transportation back to javascript however you prefer:

#echo '{"webpage":"' . addslashes($webpage) . '"}';
#echo '{"webpage":"' . htmlspecialchars($webpage) . '"}';
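# json_encode() escapes both values, so a quote in the URL cannot break the JSON:
header('Content-Type: application/json');
echo json_encode(array("url" => $url_to_get, "content" => $webpage));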

?>

TomLucidor commented 1 month ago

@flatsiedatsie But what about web crawling or web caching? Are there ways to get those working with WebLLM?

flatsiedatsie commented 1 month ago

See my previous answer.

TomLucidor commented 1 month ago

@flatsiedatsie To clarify, do you have any recommendations that use Python or other tools rather than PHP, so that it can be hooked back into WebLLM as a static file?

flatsiedatsie commented 1 month ago

Ah, Python. No, not really. But if your setup is browser-based, a Python version should be pretty similar to the PHP example. You could ask a WebLLM AI to turn the PHP code into Python code ;-)
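
For what it's worth, a Python version might look roughly like the sketch below. It is only a minimal illustration using the standard library, not something from the WebLLM project: the names is_allowed_url, build_payload, fetch_page, DownloaderHandler and serve are made up for this example, and port 8000 is an arbitrary choice. It returns the same {"url": ..., "content": ...} JSON shape as the PHP script, and assumes the page running WebLLM is served from the same origin (otherwise CORS headers would also be needed).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def is_allowed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs, so file:// and similar are rejected."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_payload(url: str, content: str) -> str:
    """Build the same JSON shape as the PHP example."""
    return json.dumps({"url": url, "content": content})


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download the page and decode it to text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class DownloaderHandler(BaseHTTPRequestHandler):
    """Handles requests such as /?url=https%3A%2F%2Fwww.example.com"""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        url = query.get("url", [""])[0]
        if not is_allowed_url(url):
            self.send_error(400, "Expected ?url= with an http(s) URL")
            return
        try:
            body = build_payload(url, fetch_page(url)).encode("utf-8")
        except (OSError, LookupError):
            # Network errors, HTTP errors, timeouts, or an unknown charset.
            self.send_error(502, "Could not fetch page")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the downloader on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), DownloaderHandler).serve_forever()
```

Calling serve() starts it, after which browser-side JavaScript can request /?url=... and pass the returned "content" field on to the model, the same way as with the PHP script.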

TomLucidor commented 1 month ago

emotional-damage