nystudio107 / craft-seomatic

SEOmatic facilitates modern SEO best practices & implementation for Craft CMS 3. It is a turnkey SEO system that is comprehensive, powerful, and flexible.
https://nystudio107.com/plugins/seomatic

Disable Seomatic by type #1400

Closed by samuelreichor 5 months ago

samuelreichor commented 5 months ago

Hi guys, we just had the problem that a newsletter containing confidential information was indexed and was therefore public. Of course, this is an error that can easily be fixed by removing all campaign channels from the search index and setting them to noindex, but I think it would be cool if there was a way to avoid this error from the start and exclude the whole type directly in the SEOmatic config via code.

Love, Samuel

khalwat commented 5 months ago

What whole type are you proposing to exclude by default in SEOmatic?

Are you saying you want any Campaign page to not be indexable by default, or...?

samuelreichor commented 5 months ago

Yes, that would be very nice. Also, all pages of a campaign type should be excluded from sitemaps by default, if that is possible.

bencroker commented 5 months ago

I’m not sure that Campaign Types should be disabled by default. It might make sense for your use-case, but does it make sense as the default for everyone?

samuelreichor commented 5 months ago

In my opinion, newsletters should not be part of a website and should therefore not be indexed or included in the sitemap by default. I would already be happy if I could prevent this via the config. Then it can go into the project template and be forgotten.

bencroker commented 5 months ago

I 100% agree with the value of having this exist as a config setting!

khalwat commented 5 months ago

Well, it's not going to be a config setting; that doesn't really make sense in the context of the way SEOmatic works.

However, I can easily make it default to robots="none" for Campaign sections... but there are some caveats here:

So at least IMO, I think the issue here is simply that you had a newsletter that had public URLs, which means Google and other things can and will find them. If you don't want that to happen, then turn off the public URLs for the Campaign section.

If you have an edge case where you want the newsletter URLs to be public, but don't want them indexed, then simply set the Robots setting as appropriate in SEOmatic -> Content SEO for the Campaign section(s) in question.

TL;DR I don't love the idea of defaulting Campaign URLs to not be indexable, and facilities already exist for doing what you want, so I'm not sure adding a config setting makes sense.

Thoughts?

bencroker commented 5 months ago

My thinking was that a new config setting could be introduced that disables indexing and inclusion in the sitemap for new sections (and campaign types) by default. This could potentially be split into element types, giving @samuelreichor the ability to disable it for new campaign elements only.

khalwat commented 5 months ago

Well, but that's not a new config setting @bencroker -- that already exists. Here is the global tag config for the robots tag:

https://github.com/nystudio107/craft-seomatic/blob/develop-v4/src/seomatic-config/globalmeta/TagContainer.php#L60

This particular tag, or any tag, could be overridden by the Element-specific tags as well, but in this case, it's set to an SEOmatic variable seomatic.meta.robots, and the Campaign config defines that here:

https://github.com/nystudio107/craft-seomatic/blob/develop-v4/src/seomatic-config/campaignmeta/GlobalVars.php#L30

...and you also can override any of these defaults in your project now:

https://nystudio107.com/blog/tips-for-using-seomatic-effectively#customized-setup

...by providing your own default config, which can be whatever you want. So @samuelreichor can do what he wants to do already without anything being changed in the SEOmatic codebase.

And if you really wanted to, you could also leverage this for disabling the indexing of all new sections. I really wouldn't recommend it, though.

There's pretty much zero chance that I'm going to bake that into SEOmatic, though, because based on the support I do, I can guarantee you that far more people will complain that they had to take action for something to be indexed than the reverse. It's just expected behavior.

khalwat commented 5 months ago

I would listen to an argument that for Campaign in particular, the default robots setting should be none instead of all, which again would be a simple config file change.

...and again, if SEOmatic was never installed in the scenario described in this issue, the same thing would have ended up happening: Google would have indexed the publicly available URLs for the Campaign newsletter. See: https://nystudio107.com/blog/seo-myths-top-5-sitemap-myths-demystified

So it's really about SEOmatic adding a layer of protection that wouldn't otherwise be present, which is fair to consider as an enhancement.

However, my reservations are:

But I could be convinced if you believe it's the case, based on your knowledge of the people who use Campaign, that it is more common that they would make a newsletter available via a public URL, but would not want it indexed vs. wanting it indexed.

It seems to me that it'd be far more common that people do want their high-value newsletter content indexed if they've made the URLs public than not. And it'd be internally consistent with how SEOmatic handles all other section types.

bencroker commented 5 months ago

Great, it looks like you’re all set then, @samuelreichor?

I never suggested disabling indexing or inclusion in the sitemap by default, nor do I think it should be. I was just trying to help figure out how @samuelreichor could do this for this specific use-case. Thanks for clarifying how!

khalwat commented 5 months ago

Oh okay, sorry, I misunderstood then @bencroker.

But yes, the facility is there for him to override it in exactly the way he wants via the seomatic-config. It merges any custom config it finds with the default config, so you can override just what you want to override.
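
For reference, a minimal sketch of what such an override might look like, assuming the seomatic-config directory has been copied into the project's config/ directory as described in the "customized setup" blog post linked above. The file path mirrors the campaignmeta/GlobalVars.php file linked earlier; the '*' environment key and the 'robots' key are assumptions about that file's layout, so compare against the default file shipped with your installed version before relying on it:

```php
<?php
// config/seomatic-config/campaignmeta/GlobalVars.php
// Sketch only: the '*' key and the 'robots' key are assumed to match the
// default file shipped with SEOmatic; verify against your installed version.

return [
    '*' => [
        // Default Campaign pages to robots "none" instead of "all".
        'robots' => 'none',
    ],
];
```

Since the custom config is merged with the default config, only the key being overridden should need to appear here. This changes the default only; the Robots setting can still be adjusted for individual Campaign sections in SEOmatic -> Content SEO.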

samuelreichor commented 5 months ago

@bencroker & @khalwat Thank you very much for your help!! I appreciate it a lot