Open marcklingen opened 3 years ago
Hey @marcklingen! Yep, I agree we should fix this. The solution I came up with is to 301-redirect unlisted pages back to the homepage.
See https://github.com/stephenou/fruitionsite/commit/7f273edcd4f7ea4707663b3209b9d1eb05aac0bb for what changes you need to make in your script.
Do you mind helping me test it out before I announce it widely? Thanks!
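For readers following along, the redirect idea can be sketched roughly as below. This is a minimal illustration, not the code from the linked commit: `SLUG_TO_PAGE`, the page IDs in it, and `redirectTargetFor` are hypothetical names and values, and the check assumes that a Notion page URL ends in a 32-character hexadecimal page ID.

```javascript
// Hypothetical configuration: the slugs you publish, mapped to their Notion page IDs.
const SLUG_TO_PAGE = {
  "": "771ef38657244c27b9389734a9cbff44",
  about: "9a1b5b6c7d8e4f0a9b1c2d3e4f5a6b7c",
};
const LISTED_PAGE_IDS = new Set(Object.values(SLUG_TO_PAGE));

// Returns "/" when the path ends in a page ID that is not listed,
// or null when the request should be handled as usual.
function redirectTargetFor(pathname) {
  // Assumption: page URLs end in a 32-character hexadecimal ID.
  const match = pathname.match(/[0-9a-f]{32}$/);
  if (match && !LISTED_PAGE_IDS.has(match[0])) {
    return "/";
  }
  return null;
}
```

Inside a Worker's fetch handler, a non-null result could be turned into a permanent redirect with `Response.redirect(new URL(target, request.url).toString(), 301)`. Paths that carry no page ID at all fall through with `null`, so a check like this cannot catch them.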
Thanks @stephenou, the redirect works for me. I'd suggest also adding an X-Robots-Tag: noindex header to the response.
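As a rough sketch of that suggestion, the header could be set on any response for a page that is not one of your own. `applyRobotsHeader` and the `isListed` flag are hypothetical; how a request is classified as listed or unlisted is left to the surrounding script.

```javascript
// Adds "X-Robots-Tag: noindex" to the given headers when the page is not listed.
// `headers` is any object with a set(name, value) method, such as a Headers instance.
// `isListed` is a hypothetical flag: true when the requested page is one you publish.
function applyRobotsHeader(headers, isListed) {
  if (!isListed) {
    headers.set("X-Robots-Tag", "noindex");
  }
  return headers;
}
```

In a Worker, the proxied response would first need to be copied so its headers are mutable, for example `const res = new Response(upstream.body, upstream); applyRobotsHeader(res.headers, isListed);`, where `upstream` stands for the response fetched from Notion.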
Just found an exception to the rule in the screenshot. While this solution solves most of the problem, it does not address pages which do not have the characteristic page ID, e.g. /tools-and-craft/01-andy-hertzfeld and /pricing.
@marcklingen How did you manage to index the domain on Google? Mine does not appear ...
@ThallyssonKlein If your domain is not automatically indexed, you can add the /sitemap.xml in the Search Console.
@marcklingen Where does this sitemap.xml file come from?
@ThallyssonKlein The sitemap is generated by the worker, you can find the line here: https://github.com/stephenou/fruitionsite/blob/7f273edcd4f7ea4707663b3209b9d1eb05aac0bb/worker.js#L91
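To illustrate what "generated by the worker" can look like, here is a minimal sketch that builds the sitemap from a list of slugs. It is not a copy of the linked line; `generateSitemap` and its `domain` and `slugs` parameters are names chosen for this example.

```javascript
// Builds a minimal sitemap.xml body with one <url> entry per published slug.
// `domain` and `slugs` are illustrative parameters; an empty slug is the homepage.
function generateSitemap(domain, slugs) {
  const urls = slugs
    .map((slug) => `<url><loc>https://${domain}/${slug}</loc></url>`)
    .join("");
  return `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}</urlset>`;
}
```

The worker would return this string for requests to /sitemap.xml with an XML content type, so only the slugs you list end up in the sitemap you submit to the Search Console.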
Had the same problem. I changed the code as per 7f273ed, and the pages indexed by Google now all redirect back to the homepage.
@lasharor That does not work for pages with a nice slug such as /pricing
@marcklingen How long can indexing take?
@ThallyssonKlein Usually it takes a couple of days but less than a week.
@marcklingen I have been notified that the pages are not being read correctly
> Just found an exception to the rule in the screenshot. While this solution solves most of the problem, it does not address pages which do not have the characteristic page ID, e.g. /tools-and-craft/01-andy-hertzfeld and /pricing.
Hey Marc, yeah unfortunately that's true. One solution is to define a denylist of URLs that Notion uses for marketing, but it's hard to keep it updated when Notion adds new pages. Another solution is to define an allowlist of URLs that your site can visit, but it's also hard to keep it updated when you add new pages.
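The allowlist variant might look something like the sketch below. Everything in it is an assumption made for illustration: `ALLOWED_PATHS` holds made-up paths, `PASSTHROUGH_PREFIXES` is a guess at the non-page requests (API calls, images) that still have to reach Notion for the site to render, and `decide` is a hypothetical helper.

```javascript
// Hypothetical allowlist of page paths that belong to your own site.
const ALLOWED_PATHS = new Set(["/", "/about", "/blog"]);

// Assumption: non-page requests under these prefixes must still be proxied.
const PASSTHROUGH_PREFIXES = ["/api/", "/images/"];

// Returns "proxy" when the request should be forwarded to Notion,
// or "not-found" when it should be answered with a 404.
function decide(pathname) {
  if (PASSTHROUGH_PREFIXES.some((prefix) => pathname.startsWith(prefix))) {
    return "proxy";
  }
  if (ALLOWED_PATHS.has(pathname)) {
    return "proxy";
  }
  return "not-found";
}
```

With a list like this, /pricing and /tools-and-craft/01-andy-hertzfeld would both get a 404, but the maintenance cost mentioned above remains: every new page has to be added to `ALLOWED_PATHS` by hand.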
Hi @stephenou,
thanks again for your ongoing development and support of this project! While #19 and #18 aim to fix this issue, I would just like to reiterate why it would be super important to exclude all pages that are not whitelisted, either with a 404 or a noindex header.
Yesterday, I saw that Google indexed 374 pages on my domain labeled as "Indexed, not submitted in sitemap". Here's a screenshot with examples:
While I am not sure about the legal implications, it would certainly be nice to have only my own pages in the index.
I am happy to contribute to the solution, let me know how I can help best.
Hi @marcklingen, I know it's been many years, but maybe you know the solution? I now have the same indexing problem, even though sitemap.xml has been added to Google Search Console.
I see that your website (https://marcklingen.com/) works without problems. How did you do that? Can you share and help?
@stephenou
Hi @vlafriday, I switched to react-notion-x for this page while still using fruition for others. Based on the screenshot (which I do not fully understand) your problem looks different to the one I had as you have a 404 and not too many pages in the index.
Do you have good indexing in google console? Does it work? Are you using Vercel to deploy code?
Yes, works well for me and I deploy on Vercel. You can also go for managed solutions like super.so, have not tried them though.