mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Getting it to browse webpage(s) and summarize them #602

Open TomLucidor opened 1 week ago

TomLucidor commented 1 week ago

Sorry for asking this, but it seems that this in-browser chatbot cannot read from webpages (individually or as a list), which makes it less useful than expected. If this software could support webcams, could it also support caching a small number of webpages (e.g. from a blog)? https://github.com/mlc-ai/web-llm/issues/291

flatsiedatsie commented 6 days ago

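For security reasons, JavaScript in a web page isn't allowed to read the contents of pages on other origins. The browser's same-origin policy blocks this unless the other site explicitly opts in through CORS headers.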

The usual workaround is a server-side script that fetches the other web page on behalf of any JavaScript that requests it. For example, JavaScript could pass a URL to a PHP script, and PHP would then download that webpage and return it to JavaScript for further client-side processing.

Webcams are different, as browsers offer JavaScript a built-in getUserMedia API to start the camera and get frames from it.

WebLLM offers you a lot of features, and you could easily implement web-page loading in your own project using the pipeline mentioned above.

Here's an example PHP script you could use:

<?php

# example of how you would call this script:
# webpage_downloader.php?url=https%3A%2F%2Fwww.example.com

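# Validate the URL and allow only http(s). FILTER_SANITIZE_URL only strips
# characters, so without this check a request could make file_get_contents()
# read local files on the server (e.g. file:// or a plain path).
# A public deployment should also restrict which hosts may be fetched.
$url_to_get = filter_var($_GET["url"] ?? "", FILTER_VALIDATE_URL);
$scheme = $url_to_get ? strtolower((string) parse_url($url_to_get, PHP_URL_SCHEME)) : "";

if (!in_array($scheme, array("http", "https"), true)) {
  http_response_code(400);
  exit('{"error":"invalid url"}');
}

$webpage = file_get_contents( $url_to_get );

# or more complex:
#$opts = array(
#  'http'=>array(
#    'method'=>"GET",
#    'header'=>"Accept-language: en\r\n"
#  )
#);
#$context = stream_context_create($opts);
#$webpage = file_get_contents($url_to_get, false, $context);

# file_get_contents() returns false when the download fails
if ($webpage === false) {
  http_response_code(502);
  exit('{"error":"download failed"}');
}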

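# Encode the result as JSON for transport back to JavaScript.
# json_encode() escapes both values correctly; addslashes() and
# htmlspecialchars() do not produce valid JSON strings.
# JSON_INVALID_UTF8_SUBSTITUTE (PHP 7.2+) keeps pages that are not valid
# UTF-8 from making json_encode() fail.

header('Content-Type: application/json');
echo json_encode(array("url" => $url_to_get, "content" => $webpage), JSON_INVALID_UTF8_SUBSTITUTE);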

?>
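
On the JavaScript side, calling a script like this and preparing the result for the model could look roughly like the sketch below. It is only an illustration under some assumptions: the script is served as webpage_downloader.php (the name used in the comment above) next to your page, it returns JSON with a "content" field, and the helper names (buildProxyUrl, fetchPageViaProxy, htmlToText, buildSummaryMessages) are made up for this example and are not part of WebLLM.

```javascript
// Hypothetical helper: build the request URL for the PHP proxy script.
// The default path is an assumption; adjust it to wherever the script lives.
function buildProxyUrl(pageUrl, proxyPath = "webpage_downloader.php") {
  return `${proxyPath}?url=${encodeURIComponent(pageUrl)}`;
}

// Fetch a page through the proxy and return its raw HTML.
// fetchImpl is injectable so the function can be tried without a network.
async function fetchPageViaProxy(pageUrl, fetchImpl = fetch) {
  const response = await fetchImpl(buildProxyUrl(pageUrl));
  if (!response.ok) {
    throw new Error(`Proxy request failed with status ${response.status}`);
  }
  const data = await response.json(); // expects { url, content } from the script
  return data.content;
}

// Very rough HTML-to-text reduction: drop script/style blocks, strip tags,
// collapse whitespace. A real project would probably use DOMParser instead.
function htmlToText(html) {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Build OpenAI-style chat messages asking for a summary.
// maxChars is an arbitrary cap so the page fits in the model's context window.
function buildSummaryMessages(pageText, maxChars = 4000) {
  return [
    { role: "system", content: "You summarize web pages concisely." },
    { role: "user", content: `Summarize this page:\n\n${pageText.slice(0, maxChars)}` },
  ];
}
```

The resulting messages array can then be handed to WebLLM's OpenAI-style chat API, e.g. engine.chat.completions.create({ messages }), assuming you have already created an engine as shown in the WebLLM README.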