Open stklik opened 2 years ago
Update: I added a few lines to the script so that 'robots.txt' disallows a few of these pages:
if (url.pathname === "/robots.txt") {
return new Response(`
User-agent: *
Disallow: /api/
Disallow: /blog/
Disallow: /community/
Disallow: /customers/
Disallow: /guides/
Disallow: /help/
Disallow: /pages/
Disallow: /releases/
Disallow: /startups/
Disallow: /templates/
Disallow: /webinars/
Disallow: /wikis/
Disallow: /wiki/
Sitemap: https://${MY_DOMAIN}/sitemap.xml
`);
}
The URLs are still active though :-/ so it's not a fix. Still hoping somebody can point me to a real solution.
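For anyone who wants to keep the disallow list in one place, here is a minimal sketch that builds the same robots.txt body from an array. `buildRobotsTxt` and its parameters are hypothetical names, not part of the Fruition script, and like the snippet above this only asks crawlers to stay away; it does not block the URLs.

```javascript
// Hypothetical helper (not part of the Fruition script): build a robots.txt
// body from a list of top-level path prefixes.
function buildRobotsTxt(domain, prefixes) {
  const lines = ["User-agent: *"];
  for (const prefix of prefixes) {
    // One Disallow rule per prefix, e.g. "blog" -> "Disallow: /blog/"
    lines.push(`Disallow: /${prefix}/`);
  }
  lines.push(`Sitemap: https://${domain}/sitemap.xml`);
  // Join with newlines so the body does not start with a blank line.
  return lines.join("\n") + "\n";
}
```

Inside the worker this could be returned with something like `new Response(buildRobotsTxt(MY_DOMAIN, ["api", "blog"]))`.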
I noticed that #162 has a similar workaround (except that it blocks them). The workaround is in this comment: https://github.com/stephenou/fruitionsite/issues/162#issuecomment-1011276075
TL;DR https://fruitionsite.com/blog serves https://notion.so/blog
When calling a URL that does not exist in the page (e.g. /blog or /customers) the worker directly fetches and serves notion.so's website.
A (bad) quick fix for basic URL paths is to add them to SLUG_TO_PAGE and make sure that /blog is intercepted and forwarded to something I prefer: 'blog': 'my_base_hash'
However, it evidently does not work as a URL prefix matcher for subpage URLs. This means that even if you add 'customers': 'a_page_hash' to SLUG_TO_PAGE, the script will intercept https://fruitionsite.com/customers (as expected), but still forward https://fruitionsite.com/customers/boxed to notion.so. Thus, an exhaustive blocking of notion.so's URLs currently does not work (well... you could technically monitor constantly for new URL routes added by notion.so...).
Does anybody have a smarter way of blocking forwards to the notion.so pages? Two solutions come to mind:
suggestions / opinions / solutions welcome...
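To make the prefix-matching idea concrete, here is a minimal sketch that checks the first path segment against a block list, so that /customers and /customers/boxed are treated the same way. `BLOCKED_PREFIXES` and `isBlockedPath` are hypothetical names, not part of the Fruition script, and the list simply mirrors the robots.txt entries above; it still has to be maintained by hand.

```javascript
// Hypothetical block list (an assumption, copied from the robots.txt entries).
const BLOCKED_PREFIXES = [
  "api", "blog", "community", "customers", "guides", "help", "pages",
  "releases", "startups", "templates", "webinars", "wikis", "wiki",
];

// Hypothetical helper: true if the first path segment is on the block list.
// "/customers" and "/customers/boxed" both match; "/customersfoo" does not.
function isBlockedPath(pathname, blockedPrefixes = BLOCKED_PREFIXES) {
  // "/customers/boxed".split("/") gives ["", "customers", "boxed"]
  const firstSegment = pathname.split("/")[1] || "";
  return blockedPrefixes.includes(firstSegment.toLowerCase());
}

// Possible use inside the worker, before the request is forwarded:
//   if (isBlockedPath(url.pathname)) {
//     return new Response("Not found", { status: 404 });
//   }
```

Matching on the whole first segment, rather than with `startsWith`, avoids accidentally blocking an unrelated path that merely begins with the same letters.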