stephenou / fruitionsite

Build your website with Notion for free
https://fruitionsite.com
MIT License
1.59k stars, 218 forks

Google indexes Template, Helpcenter, and Notion Landing Pages on my Domain #63

Open marcklingen opened 3 years ago

marcklingen commented 3 years ago

Hi @stephenou,

thanks again for your ongoing development and support of this project! While #19 and #18 aim to fix this issue, I would just like to reiterate why it is so important to exclude all pages that are not whitelisted, either with a 404 or with a noindex header.

Yesterday, I saw that Google had indexed 374 pages on my domain, labeled as "Indexed, not submitted in sitemap". Here's a screenshot with examples:

image

While I am not sure about the legal implications, it would certainly be nice to have only my own pages in the index.

I am happy to contribute to the solution, let me know how I can help best.

stephenou commented 3 years ago

Hey @marcklingen! Yep, I agree we should fix this. The solution I came up with is to 301-redirect unlisted pages back to the homepage.

See https://github.com/stephenou/fruitionsite/commit/7f273edcd4f7ea4707663b3209b9d1eb05aac0bb for what changes you need to make in your script.

Do you mind helping me test it out before I announce it widely? Thanks!
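
For readers who want to see the idea without opening the commit, here is a minimal sketch of redirecting unlisted pages to the homepage. It is not the code from the linked commit. The names MY_DOMAIN, SLUG_TO_PAGE, and redirectTarget, the example domain, and the page IDs are all hypothetical, and the sketch assumes that a Notion page path ends in a 32-character hex page ID.

```javascript
// Hypothetical configuration; the domain, slugs, and page IDs are made up.
const MY_DOMAIN = 'example.com';
const SLUG_TO_PAGE = {
  '': '771ef38657244c27b9389734a9cbff44',
  about: '9a1e4a7c0d5b4c0e8f3a2b1c4d5e6f70',
};

// Assumption: a Notion page path ends in a 32-character hex page ID.
const PAGE_ID_AT_END = /([0-9a-f]{32})$/;

// Returns the URL to redirect to, or null when the request
// should be handled normally.
function redirectTarget(pathname) {
  const match = pathname.match(PAGE_ID_AT_END);
  if (match === null) return null; // no page ID in the path, e.g. /about
  const listedIds = Object.values(SLUG_TO_PAGE);
  return listedIds.includes(match[1]) ? null : `https://${MY_DOMAIN}/`;
}

// In a Worker's fetch handler this could be used roughly as:
//   const target = redirectTarget(new URL(request.url).pathname);
//   if (target !== null) return Response.redirect(target, 301);
```

Because the check keys on the page ID, any path without one is passed through unchanged.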

marcklingen commented 3 years ago

Thanks @stephenou, the redirect works for me. I'd also suggest adding an X-Robots-Tag: noindex header to the response.
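
A minimal sketch of what that suggestion could look like, assuming the redirect is built by hand so that extra headers can be attached. The function name noIndexRedirectInit and the example URL are hypothetical, and whether a crawler honors the header on a redirect response is not confirmed in this thread.

```javascript
// Builds the init object for a 301 response that also asks
// crawlers not to index the requested URL.
function noIndexRedirectInit(homepageUrl) {
  return {
    status: 301,
    headers: {
      Location: homepageUrl,
      'X-Robots-Tag': 'noindex',
    },
  };
}

// In a Worker this could be returned as:
//   return new Response(null, noIndexRedirectInit('https://example.com/'));
```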

marcklingen commented 3 years ago

Just found an exception to the rule in the screenshot. While this solution solves most of the problem, it does not address pages which do not have the characteristic page id, e.g.: /tools-and-craft/01-andy-hertzfeld, /pricing

ThallyssonKlein commented 3 years ago

@marcklingen How did you manage to index the domain on Google? Mine does not appear ...

marcklingen commented 3 years ago

@ThallyssonKlein If your domain is not indexed automatically, you can submit /sitemap.xml in the Search Console.

ThallyssonKlein commented 3 years ago

@marcklingen Where does this sitemap.xml file come from?

marcklingen commented 3 years ago

@ThallyssonKlein The sitemap is generated by the worker. You can find the line here: https://github.com/stephenou/fruitionsite/blob/7f273edcd4f7ea4707663b3209b9d1eb05aac0bb/worker.js#L91
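
As a rough illustration of how a worker can generate a sitemap from its configured slugs, here is a minimal sketch. It is not a copy of worker.js. The function signature, the example domain, and the slugs are assumptions made for the example.

```javascript
// Builds a sitemap XML string with one <url> entry per slug.
// An empty slug stands for the homepage.
function generateSitemap(domain, slugs) {
  const urls = slugs
    .map((slug) => `<url><loc>https://${domain}/${slug}</loc></url>`)
    .join('');
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
    urls +
    '</urlset>'
  );
}

// Hypothetical usage: serve this string when the path is /sitemap.xml.
const sitemap = generateSitemap('example.com', ['', 'about', 'blog']);
```

Only the configured slugs appear in the sitemap, which is why other pages show up in Search Console as indexed but not submitted.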

lasharor commented 3 years ago

Had the same problem. Changed the code as per 7f273ed. The pages indexed by Google now all redirect to the homepage.

marcklingen commented 3 years ago

@lasharor That does not work for pages with a nice slug such as /pricing

ThallyssonKlein commented 3 years ago

image

@marcklingen How long can indexing take?

marcklingen commented 3 years ago

@ThallyssonKlein Usually it takes a couple of days but less than a week.

ThallyssonKlein commented 3 years ago

@marcklingen I have been notified that the pages are not being read correctly

image

stephenou commented 3 years ago

Hey Marc, yeah, unfortunately that's true: the redirect does not catch pages without the characteristic page ID, such as /tools-and-craft/01-andy-hertzfeld or /pricing. One solution is to define a denylist of the URLs that Notion uses for marketing, but it's hard to keep updated when Notion adds new pages. Another solution is to define an allowlist of the URLs that your site is allowed to serve, but that's also hard to keep updated when you add new pages.
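
To make the trade-off concrete, here is a minimal sketch of both checks. All names are hypothetical. The denylist entries are just the two examples mentioned in this thread, not a complete list of Notion's marketing pages, and the slugs and page IDs are made up.

```javascript
// Hypothetical site configuration.
const SLUG_TO_PAGE = {
  '': '771ef38657244c27b9389734a9cbff44',
  about: '9a1e4a7c0d5b4c0e8f3a2b1c4d5e6f70',
};

// Denylist: must be extended by hand whenever Notion adds pages.
const NOTION_MARKETING_PATHS = ['/pricing', '/tools-and-craft'];

// True if the path is a denied path or lies underneath one.
function isDenied(pathname) {
  return NOTION_MARKETING_PATHS.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + '/')
  );
}

// Allowlist derived from the slug map, so it only changes
// when the site's own configuration changes.
function isAllowed(pathname) {
  const slug = pathname.slice(1);
  return (
    Object.prototype.hasOwnProperty.call(SLUG_TO_PAGE, slug) ||
    Object.values(SLUG_TO_PAGE).includes(slug)
  );
}
```

A real worker would presumably also need to let through the scripts, images, and API calls that the pages depend on, which this sketch does not model.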

vlafriday commented 1 year ago

Hi @marcklingen, I know it's been many years, but maybe you know the solution? I now have the same indexing problem, even though sitemap.xml has been added to Google Search Console.

I see that your website (https://marcklingen.com/) works without problems. How did you do that? Can you share and help?

Screenshot_35 Screenshot_36

@stephenou

marcklingen commented 1 year ago

Hi @vlafriday, I switched to react-notion-x for this page while still using Fruition for others. Based on the screenshot (which I do not fully understand), your problem looks different from the one I had: you are getting a 404, rather than too many pages in the index.

vlafriday commented 1 year ago

Is your site indexed well in Google Search Console? Does it work? Are you using Vercel to deploy the code?

marcklingen commented 1 year ago

Yes, it works well for me, and I deploy on Vercel. You can also go for managed solutions like super.so, though I have not tried them.