felixhaeberle opened 1 year ago
I would recommend using Chrome's new headless mode, which has a better fingerprint with respect to bot detection.
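A minimal sketch of what that could look like with Puppeteer. It assumes Puppeteer v22 or newer, where `headless: true` already selects the new headless mode; v19 to v21 exposed the same mode as `headless: "new"`. The `launchBrowser` helper name is hypothetical.

```typescript
import puppeteer, { type Browser } from "puppeteer";

// Assumption: Puppeteer v22+, where `headless: true` launches Chrome's new
// headless mode (the full browser, just without a window) rather than the
// old headless shell. On v19-v21 the equivalent setting was `headless: "new"`.
export const launchOptions = {
  headless: true,
} as const;

// Hypothetical helper: start one browser instance for link previews.
export async function launchBrowser(): Promise<Browser> {
  return puppeteer.launch(launchOptions);
}
```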
@NiklasBuchfink this is fine, thank you!! It will probably help us a lot. 🎉
But it's not needed here, because the metadata we want to scrape is only a fetch away: the HTML document returned by the fetch can easily be parsed with cheerio. Images are no problem with this approach either. I wrote a component that does this before and will provide it here once my assistant is back online.
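A minimal server-side sketch of the fetch-and-parse approach, assuming Node 18+ (global `fetch`) and cheerio 1.x. The names `LinkMetadata`, `parseMetadata`, and `getLinkMetadata` are hypothetical, and the selectors only cover the common `<title>`, description, icon, and `og:image` tags.

```typescript
import * as cheerio from "cheerio";

export interface LinkMetadata {
  title: string;
  description: string;
  favicon: string;
  ogImage: string;
}

// Parse already-fetched HTML. `pageUrl` resolves relative hrefs such as
// "/favicon.ico" into absolute URLs the editor can load directly.
export function parseMetadata(html: string, pageUrl: string): LinkMetadata {
  const $ = cheerio.load(html);
  const absolute = (href: string | undefined): string => {
    if (!href) return "";
    try {
      return new URL(href, pageUrl).href;
    } catch {
      return ""; // malformed href: treat as missing
    }
  };
  return {
    title: $("title").first().text().trim(),
    description: $('meta[name="description"]').attr("content") ?? "",
    favicon: absolute($('link[rel="icon"], link[rel="shortcut icon"]').attr("href")),
    ogImage: absolute($('meta[property="og:image"]').attr("content")),
  };
}

// Fetch on the server (no browser CORS restrictions there), then parse.
// `res.url` is the final URL after redirects, used as the base for hrefs.
export async function getLinkMetadata(url: string): Promise<LinkMetadata> {
  const res = await fetch(url, { redirect: "follow" });
  if (!res.ok) throw new Error(`Fetching ${url} failed with status ${res.status}`);
  return parseMetadata(await res.text(), res.url || url);
}
```

Keeping parsing separate from fetching makes the selector logic testable against plain HTML strings without any network access.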
True, Puppeteer screenshots are only needed for JS applications that don't ship the complete HTML and CSS in the initial response.
For example:

```tsx
import React, { useState } from "react";
import wretch from "wretch";
import getDominantColor from "get-image-colors";
import { Buffer } from "buffer";

interface Metadata {
  title: string;
  description: string;
  favicon: string;
  ogImage: string;
  mainColor: string;
}

const initialMetadata: Metadata = {
  title: "",
  description: "",
  favicon: "",
  ogImage: "",
  mainColor: "",
};

// Resolve a possibly relative URL (e.g. "/favicon.ico") against the page URL.
const toAbsoluteUrl = (href: string | null | undefined, base: string): string => {
  if (!href) return "";
  try {
    return new URL(href, base).href;
  } catch {
    return "";
  }
};

// Convert a Blob into a data: URL so the image can be stored and shown inline.
const blobToDataURL = (blob: Blob): Promise<string> =>
  new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result as string);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(blob);
  });

const getUrlMetadata = async (url: string): Promise<Metadata> => {
  // Fetching a third-party page from the browser only works if that site sends
  // CORS headers; otherwise this request has to be made by a server.
  const response = await wretch(url).get().text();
  const html = new DOMParser().parseFromString(response, "text/html");

  const title = html.querySelector("title")?.textContent?.trim() || "";
  const description =
    html.querySelector('meta[name="description"]')?.getAttribute("content") || "";
  const favicon = toAbsoluteUrl(
    html
      .querySelector('link[rel="shortcut icon"], link[rel="icon"]')
      ?.getAttribute("href"),
    url,
  );

  // og:image is a <meta> tag, so its URL lives in `content` (not src/srcset).
  // Fall back to the first responsive <img>, using the first srcset candidate.
  const fallbackImg = html.querySelector("img[srcset][sizes]");
  const imageUrl = toAbsoluteUrl(
    html.querySelector('meta[property="og:image"]')?.getAttribute("content") ||
      fallbackImg?.getAttribute("srcset")?.split(",")[0].trim().split(/\s+/)[0] ||
      fallbackImg?.getAttribute("src"),
    url,
  );

  // Prefer the site's declared theme color; otherwise derive one from the image.
  // `let`, not `const`: it may be reassigned below.
  let mainColor =
    html.querySelector('meta[name="theme-color"]')?.getAttribute("content") || "";
  let ogImageDataURL = "";

  if (imageUrl) {
    // Download the image once and reuse it for both the preview and the color.
    const imageBlob = await wretch(imageUrl).get().blob();
    ogImageDataURL = await blobToDataURL(imageBlob);

    if (!mainColor) {
      try {
        // get-image-colors documents getColors(buffer, mimeType) and is a Node
        // library; in the browser this assumes a bundler that provides `buffer`.
        const buffer = Buffer.from(await imageBlob.arrayBuffer());
        const colors = await getDominantColor(buffer, imageBlob.type);
        mainColor = colors[0]?.hex() ?? "";
      } catch {
        // Unsupported image type: keep the preview, just without a main color.
      }
    }
  }

  return { title, description, favicon, ogImage: ogImageDataURL, mainColor };
};

const LinkPreview: React.FC = () => {
  const [metadata, setMetadata] = useState<Metadata>(initialMetadata);

  // When a URL is pasted into the box, fetch its metadata and show a preview.
  const handlePaste = async (event: React.ClipboardEvent<HTMLDivElement>) => {
    const pasteData = event.clipboardData?.getData("text")?.trim();
    if (pasteData && /^https?:\/\//.test(pasteData)) {
      try {
        setMetadata(await getUrlMetadata(pasteData));
      } catch (error) {
        // Network or CORS failure: keep the current preview instead of
        // leaving an unhandled promise rejection.
        console.error("Could not load link preview:", error);
      }
    }
  };

  return (
    <div className="bg-white p-4 rounded-md" onPaste={handlePaste}>
      <div className="w-full h-full"></div>
      {metadata.title && (
        <div className="mt-4">
          <h2 className="text-lg font-bold">{metadata.title}</h2>
          {metadata.description && (
            <p className="text-gray-500">{metadata.description}</p>
          )}
          {metadata.favicon && (
            <img
              className="w-8 h-8 mt-2 mr-2 inline-block"
              src={metadata.favicon}
              alt=""
            />
          )}
          {metadata.ogImage && (
            <img className="max-w-full mt-2" src={metadata.ogImage} alt="" />
          )}
        </div>
      )}
    </div>
  );
};

export default LinkPreview;
```
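The fetch-and-parse approach only sees what the server sends. For the JS applications mentioned in the thread, one option is to check the static HTML first and fall back to a Puppeteer screenshot only when it looks like an empty app shell. `looksClientRendered` and its thresholds (no `og:` tags and fewer than 50 characters of visible text) are a hypothetical heuristic, not something the thread specifies.

```typescript
// Hypothetical heuristic: does the server-sent HTML look like an empty
// client-side app shell whose content only appears after JavaScript runs?
export function looksClientRendered(html: string): boolean {
  // Pages that declare Open Graph tags can be previewed without a screenshot.
  const hasOgTags = /<meta[^>]+property=["']og:/i.test(html);

  // Remove scripts, styles and <noscript> fallbacks, then strip remaining
  // tags and collapse whitespace to estimate the visible text.
  const visibleText = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<noscript[\s\S]*?<\/noscript>/gi, "")
    .replace(/<[^>]+>/g, "")
    .replace(/\s+/g, " ")
    .trim();

  // Arbitrary threshold: an app shell usually has almost no text of its own.
  return !hasOgTags && visibleText.length < 50;
}
```

Regexes are only good enough for a rough yes/no signal here; anything that extracts actual values should go through a real HTML parser such as cheerio.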
The aim of this issue is to allow pasting links into the block editor and to include a screenshot of the linked page, taken with Puppeteer.
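A minimal sketch of the screenshot step, assuming Puppeteer v22+ running on a server rather than in the editor's browser bundle. `captureLinkScreenshot` is a hypothetical name, and the 1200×630 viewport (the common Open Graph image size) and 30-second timeout are illustrative choices, not requirements from the issue.

```typescript
import puppeteer from "puppeteer";

// Hypothetical helper: render `url` in headless Chrome and return a PNG of the
// first viewport, to be used as the link-preview image.
export async function captureLinkScreenshot(url: string): Promise<Uint8Array> {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1200, height: 630 });
    // Wait until network activity settles so JS-rendered content has loaded.
    await page.goto(url, { waitUntil: "networkidle2", timeout: 30_000 });
    return await page.screenshot({ type: "png" });
  } finally {
    // Always close the browser, even if navigation or the screenshot fails.
    await browser.close();
  }
}
```

Launching a browser per call keeps the sketch simple; a real service would more likely reuse one browser and open a new page per request.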