I've worked up an enhancement to speed up resolving the final URL. The final URL is already hard-coded into the news.google.com link; it appears to be some sort of Base64 encoding.
function decodeBase64FromUrl(url) {
  // Capture the Base64 payload that follows "articles/" in the link.
  const base64Pattern = /articles\/([A-Za-z0-9+_\-\/=]+)/;
  const match = url.match(base64Pattern);
  if (match && match[1]) {
    // Convert the URL-safe alphabet (- and _) to standard Base64 (+ and /).
    const base64EncodedUrl = match[1].replace(/-/g, "+").replace(/_/g, "/");
    try {
      // Note: Node's "ascii" decoding clears the high bit of every byte, so a
      // byte such as 0xD2 right after the URL decodes as "R" (0x52).
      const decodedUrl = Buffer.from(base64EncodedUrl, "base64").toString(
        "ascii",
      );
      // Split on runs of non-printable characters and drop empty pieces.
      const parts = decodedUrl.split(/[^\x20-\x7E]+/).filter(Boolean);
      const urlPattern = /(https?:\/\/[^\s]+)/;
      // Keep only the first piece that contains an http(s) URL.
      for (const part of parts) {
        const urlMatch = part.match(urlPattern);
        if (urlMatch && urlMatch[1]) {
          // Return the object containing both urlID and finalURL
          return {
            urlID: match[1],
            // Strip the trailing "R" left over from decoding.
            finalURL: urlMatch[1].replace(/R$/, ""),
          };
        }
      }
      console.error("No valid URL found in the decoded string:", decodedUrl);
    } catch (error) {
      console.error(
        "Error decoding Base64 string:",
        base64EncodedUrl,
        "Original URL:",
        url,
        "Error:",
        error.message,
      );
    }
  } else {
    console.error("No Base64 segment found in the URL. Original URL:", url);
  }
  // Return null or an empty object if no data could be extracted
  return null; // or return {};
}
The part of the URL that has the final URL encoded is here:
/rss/articles/CBMihgFodHRwczovL3d3dy50cmFkaW5ndmlldy5jb20vbmV3cy9iaXRjb2luX2NvbTo0YTZiYjZlM2YwOTRiOjAtdGV4YXMtZWxlY3RyaWMtdXRpbGl0eS1jb3VydHMtdW5uYW1lZC1jcnlwdG8tbWluZXJzLXdpdGgtZml2ZS15ZWFyLWRlYWxzL9IBAA?oc=5
However, the decoded result contains some non-ASCII characters, so it isn't a plain Base64-encoded URL. I made a few modifications to handle that: only accept a match that starts with http/https, strip the trailing "R" that always appears along with any non-ASCII characters at the end, and keep only the first occurrence, since I noticed the URL would sometimes repeat, possibly because of the non-ASCII characters.
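One possible simplification, offered as a sketch rather than a tested replacement: in Node, decoding a Buffer as "ascii" clears the high bit of each byte, so the trailing "R" may simply be a 0xD2 byte with its top bit masked off. Decoding as "latin1" leaves such bytes outside the printable range, so the URL can be matched directly and nothing needs to be stripped afterwards. The helper name `extractEmbeddedUrl` is hypothetical, and the assumption that the URL is the first printable http(s) run in the payload is based only on the example link.

```javascript
// Hypothetical helper (not part of the original function): returns the first
// http(s) URL embedded in the Base64 payload of a Google News link, or null.
function extractEmbeddedUrl(newsUrl) {
  // The payload uses the URL-safe Base64 alphabet and ends before "?".
  const match = newsUrl.match(/articles\/([A-Za-z0-9_-]+)/);
  if (!match) return null;

  const bytes = Buffer.from(
    match[1].replace(/-/g, "+").replace(/_/g, "/"),
    "base64",
  );

  // "latin1" maps every byte to exactly one character and keeps the high bit,
  // so bytes >= 0x80 never turn into printable ASCII.
  const text = bytes.toString("latin1");

  // Take the first run of printable, non-space ASCII that starts with http(s)://
  const urlMatch = text.match(/https?:\/\/[\x21-\x7E]+/);
  return urlMatch ? urlMatch[0] : null;
}

// Path copied from the example link above.
const link =
  "/rss/articles/CBMihgFodHRwczovL3d3dy50cmFkaW5ndmlldy5jb20vbmV3cy9iaXRjb2luX2NvbTo0YTZiYjZlM2YwOTRiOjAtdGV4YXMtZWxlY3RyaWMtdXRpbGl0eS1jb3VydHMtdW5uYW1lZC1jcnlwdG8tbWluZXJzLXdpdGgtZml2ZS15ZWFyLWRlYWxzL9IBAA?oc=5";
console.log(extractEmbeddedUrl(link));
```

Because the match stops at the first non-printable byte, this variant also keeps only the first occurrence of the URL.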
Base64 Decode Sample:
"�https://www.tradingview.com/news/bitcoin_com:4a6bb6e3f094b:0-texas-electric-utility-courts-unnamed-crypto-miners-with-five-year-deals/�
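The stray characters in the sample can be traced back to individual bytes. The sketch below uses the payload from the example link, prints the bytes before and after the embedded URL as hex, and compares Node's "ascii" and "latin1" decodings of the trailing bytes. The variable names are mine, and reading the leading bytes as a length-prefixed header is a guess based on this one link.

```javascript
// Payload copied from the example link (the segment after /articles/).
const payload =
  "CBMihgFodHRwczovL3d3dy50cmFkaW5ndmlldy5jb20vbmV3cy9iaXRjb2luX2NvbTo0YTZiYjZlM2YwOTRiOjAtdGV4YXMtZWxlY3RyaWMtdXRpbGl0eS1jb3VydHMtdW5uYW1lZC1jcnlwdG8tbWluZXJzLXdpdGgtZml2ZS15ZWFyLWRlYWxzL9IBAA";

// Map the URL-safe alphabet back to standard Base64, then decode to raw bytes.
const bytes = Buffer.from(
  payload.replace(/-/g, "+").replace(/_/g, "/"),
  "base64",
);

// Locate the embedded URL and look at the bytes around it.
const start = bytes.indexOf("https://");
const head = bytes.subarray(0, start).toString("hex");
const tail = bytes.subarray(-3);
console.log(head); // 0813228601
console.log(tail.toString("hex")); // d20100

// 0x22 is the double quote at the start of the sample. The two bytes after it
// (0x86 0x01) read as a base-128 length, low 7 bits first (assumed format).
const declaredLength = (bytes[start - 2] & 0x7f) | (bytes[start - 1] << 7);
const urlLength = bytes.length - start - tail.length;
console.log(declaredLength, urlLength); // 134 134

// "ascii" clears the high bit of each byte: 0xd2 & 0x7f === 0x52, i.e. "R".
// "latin1" keeps the byte as the single character U+00D2.
const tailAscii = tail.toString("ascii");
const tailLatin1 = tail.toString("latin1");
console.log(JSON.stringify(tailAscii));
console.log(JSON.stringify(tailLatin1));
```

For this link, the declared length matches the length of the URL, which would explain why splitting on non-printable characters isolates it cleanly.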
Example
This will drastically speed up the prettyURL lookup, and it no longer requires Puppeteer.
This full output completed in 2.4 seconds.