joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0

[5.3][SEF] Elimination of duplicate pages, SEO improvements: redirects for IDs and www, canonical links #44310

Open universewrld opened 1 month ago

universewrld commented 1 month ago

Is your feature request related to a problem? Please describe.

Even though Joomla 5.2 has some new settings to eliminate duplicate pages, there are still duplicate pages in the CMS (see https://github.com/joomla/joomla-cms/issues/44263). I suggest reducing the number of duplicate pages in @Joomla.

Describe the solution you'd like

New options for the System - SEF plugin:

  1. Redirect pages with ID (articles, categories) to the version of pages without ID
  2. Redirect from the version of a page with www to the version of a page without www (or vice versa)
  3. Canonical links for pagination pages (for pages like ?start=10)
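Options 1 and 2 could be sketched roughly as follows. This is a minimal Python illustration, not Joomla's implementation: the function names, the `prefer_www` switch, and the assumption that an ID appears as a leading `<digits>-` in a path segment (for example `/blog/12-my-article`) are all hypothetical. A real router would have to check the database, because an alias such as `2024-review` legitimately starts with digits.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed ID format: a path segment that begins with digits and a hyphen.
_ID_PREFIX = re.compile(r"^\d+-(?=.)")


def strip_id_prefix(path: str) -> str:
    """Drop a leading "<digits>-" from every path segment."""
    return "/".join(_ID_PREFIX.sub("", segment) for segment in path.split("/"))


def redirect_target(url: str, prefer_www: bool = False) -> Optional[str]:
    """Return the URL to redirect to, or None if `url` is already in the preferred form.

    Simplification: the host is compared case-sensitively and userinfo
    (user:pass@host) is not handled.
    """
    parts = urlsplit(url)
    host = parts.netloc
    if prefer_www and not host.startswith("www."):
        host = "www." + host
    elif not prefer_www and host.startswith("www."):
        host = host[len("www."):]
    target = urlunsplit(
        (parts.scheme, host, strip_id_prefix(parts.path), parts.query, parts.fragment)
    )
    return target if target != url else None
```

With `prefer_www=False`, `https://www.example.com/blog/12-my-article` maps to `https://example.com/blog/my-article`; with `prefer_www=True` the redirect goes the other way, which covers the "or vice versa" case.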

What Joomla has already done to reduce duplicate pages:

Additional context

This will help eliminate almost all duplicate pages for search engines. The fewer duplicate pages search robots from @Google, @google-gemini, @microsoft, @openai, etc. have to process, the less energy data centers consume and the smaller the effect on the environment and climate change.

What is canonicalization - https://developers.google.com/search/docs/crawling-indexing/canonicalization
Redirects and Google Search - https://developers.google.com/search/docs/crawling-indexing/301-redirects
How to specify a canonical URL with rel="canonical" and other methods - https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls

fgsw commented 1 month ago

Duplicate of #44263

universewrld commented 1 month ago

Duplicate of #44263

this is not a duplicate, this is a feature request. the previous issue was a bug report. do you see the difference?

you can close the previous issue https://github.com/joomla/joomla-cms/issues/44263, but not this one.

richard67 commented 1 month ago
  1. Redirect from the version of a page with www to the version of a page without www

What if I want it vice versa, redirect non www to www?

universewrld commented 1 month ago
  1. Redirect from the version of a page with www to the version of a page without www

What if I want it vice versa, redirect non www to www?

yes, that's what I meant. there should be an option for WWW:

Mich-es commented 3 weeks ago

Hii - Huge Problem with J5.2

Google has a problem with J5.2. Joomla now appends a rel="canonical" to every crap page and Google no longer knows what the original is. This is a serious bug and should be fixed as soon as possible. On a site with around 12,000 URLs, I have 4,800 duplicate content pages with rel="canonical" in 3 days. This must be solved with J5.2.1 and not with J5.3!

Greetings Mitches

simbus82 commented 3 weeks ago

Canonical links for pagination pages (for pages like ?start=10)

Brrr, this is a really wrong approach. Screenshot_20241030-221113.jpg

You should canonicalise paginated pages only to a "view all" page, not to a page that shows only a limited number of child pages.

universewrld commented 2 weeks ago

Canonical links for pagination pages (for pages like ?start=10)

Brrr, this is a really wrong approach. Screenshot_20241030-221113.jpg

You should canonicalise paginated pages only to a "view all" page, not to a page that show only a limited number of child pages.

the image shows that all pages with pagination such as /blog/page2, /blog/page3, etc. should specify /blog as the canonical first page.

In @Joomla this would mean that pages like /blog?start=10 and /blog?start=25 point to /blog as the canonical blog page.

simbus82 commented 2 weeks ago

It's the same, forgive my bluntness, but these are basic SEO concepts.

The "/blog" page on a Joomla site is typically already limited in the number of items (e.g., it shows intros to the latest 10 blog posts), so it's NOT suitable to become the canonical for a "/blog?start=10".

If you set a canonical tag that always points to /blog, you're telling search engines that all paginated pages (/blog?start=10, /blog?start=20, etc.) are identical to /blog. This is absolutely incorrect and causes a host of issues, including:

Loss of Content Indexing

Search engines will ignore pages after the first one (/blog, which shows ONLY 10 links to the underlying posts), because the canonical tag says that all paginated content (whether you like it or not, ?start=10 is pagination) is a duplicate of the initial page. In practice, search engines would only see the first 10 articles in the blog section, overlooking content on subsequent pages.

Redistribution of Link Juice to a Limited URL, Resulting in Loss of Link Juice to Posts Beyond the Tenth

When different pages all point to a single URL via an incorrect canonical (e.g., all paginated blog pages point to /blog), search engines concentrate the link juice on the declared canonical URL (/blog), ignoring the other pages. As a result, the paginated pages (like /blog?start=10, /blog?start=20, etc.) lose their individual authority and fail to pass link juice to the posts or content within or beneath those pages.

Sitemap?

And even if you submit all blog post links in a sitemap to Google or other search engines, this doesn’t automatically guarantee that those links will be indexed correctly or receive the right amount of link juice (authority) if the canonical tag isn’t set up properly.

The sitemap is merely a list that helps search engines discover your URLs, but it doesn’t determine which version of a URL is considered the primary one, nor how internal links pass authority (link juice) within the site. If you have an incorrect canonical setup (for example, all paginated pages pointing to /blog as the canonical), you’re telling search engines that only the canonical page ("/blog") is the main version of all content.
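To make the two positions concrete, here is a minimal Python sketch. It is not Joomla code: the function names and `example.com` are hypothetical, and it assumes `start` is the only query parameter that changes the page content. `canonical_first_page` models the proposal (every paginated page declares `/blog` as canonical), while `canonical_self` models the alternative in which each paginated page stays its own canonical and only unrelated parameters are dropped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_first_page(url: str) -> str:
    """Proposed behaviour: drop `start`, so every paginated page points at page one."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "start"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def canonical_self(url: str) -> str:
    """Alternative: keep a non-zero `start`, drop every other parameter."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k == "start" and v != "0"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


pages = [
    "https://example.com/blog",
    "https://example.com/blog?start=10",
    "https://example.com/blog?start=20",
]

# Under the proposal all three pages declare the same canonical URL.
collapsed = {canonical_first_page(p) for p in pages}
# Under the alternative each page remains a distinct canonical URL.
distinct = {canonical_self(p) for p in pages}
```

Here `collapsed` contains a single URL and `distinct` contains three, which is the disagreement in this thread: whether pages 2 and 3 should be declared duplicates of page 1.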

universewrld commented 2 weeks ago

What is canonicalization

Canonicalization is the process of selecting the representative –canonical– URL of a piece of content. Consequently, a canonical URL is the URL of a page that Google chose as the most representative from a set of duplicate pages. Often called deduplication, this process helps Google show only one version of the otherwise duplicate content in its search results.

There are many reasons why a site may have duplicate content:

SOURCE: https://developers.google.com/search/docs/crawling-indexing/canonicalization

@simbus82 you are completely wrong! Pages with pagination are duplicate content and @Google says so! This is a duplicate of content that is related to the site's function and is indicated in the official Google Help!

Image

simbus82 commented 2 weeks ago

You really have no understanding of what pagination is and the context in which it should be managed or not with canonical.

https://developers.google.com/search/blog/2013/04/5-common-mistakes-with-relcanonical Image Image

https://developers.google.com/search/docs/specialty/ecommerce/pagination-and-incremental-page-loading Image

https://yoast.com/rel-canonical/ Image

https://cognitiveseo.com/blog/19204/canonical-urls-seo/ Image

https://searchengineland.com/pagination-strategies-in-the-real-world-81204 Image

But think what you want; I gain nothing from wasting time convincing you of something so basic for a junior SEO. I hope the team is more competent than you and does not approve this nonsense.

I'm always happy to help, drawing on 20 years of web development and digital marketing, and over 10 years as an SEO specialist on 6-figure projects.

You need to start studying, you're really a newbie in SEO. I recommend The Art of SEO book.

Hackwar commented 3 days ago

The redirect from ID to no-ID is waiting to be tested with #44455 and #44477.

universewrld commented 3 days ago

You really have no understanding of what pagination is and the context in which it should be managed or not with canonical.

you are confusing a single-article view that spans multiple pages with the category blog view.

page 1, page 2, page 3, etc. of the category blog view should not be canonical pages.

@simbus82 I've been doing SEO for almost 20 years and I know a lot more about it than you do. I'm not here as a developer, I'm here as a website owner. I do not sell extensions, I promote websites.

universewrld commented 3 days ago

The redirect from ID to no-ID is waiting to be tested with #44455 and #44477.

thanks! i added your links to my first post.

simbus82 commented 3 days ago

You really have no understanding of what pagination is and the context in which it should be managed or not with canonical.

you confused view single article which has multiple pages and view category blog.

page 1, page 2, page 3 etc view category blog should not be canonical pages.

@simbus82 I've been doing SEO for almost 20 years and I know a lot more about it than you do.. I'm not here as a developer, I'm here as a website owner. I do not sell extensions, I promote websites.

"A lot more about it than you do", ok. Contact me, we can talk about it.

universewrld commented 3 days ago

we can talk about it.

Image

If you think that these links to page 1, page 2, etc. should be marked as canonical, then there is something wrong with you and your SEO skills.

universewrld commented 3 days ago

@simbus82 here's an example based on Joomla! Blogs:

canonical links:
https://community.joomla.org/blogs.html - home page blog view
https://community.joomla.org/blogs/community/were-back.html - article pages of this blog

NON-CANONICAL LINKS:
https://community.joomla.org/blogs.html?start=5 - page 2 view blog category
https://community.joomla.org/blogs.html?start=10 - page 3 view blog category
https://community.joomla.org/blogs.html?start=15 - page 4 view blog category
etc

the non-canonical links in this blog category should declare the main page of this blog as canonical: https://community.joomla.org/blogs.html

@simbus82 should I give you a series of lessons in SEO or are you able to figure it all out yourself and read the information about SEO links more carefully?

brianteeman commented 3 days ago

@universewrld I am in no way an expert on any of this. But all the content posted by @simbus82 (https://github.com/joomla/joomla-cms/issues/44310#issuecomment-2457921422), from sources I absolutely trust, says that you are wrong.

simbus82 commented 3 days ago

Last

@simbus82 here's an example based on Joomla! Blogs:

canonical links:
https://community.joomla.org/blogs.html - home page blog view
https://community.joomla.org/blogs/community/were-back.html - article pages of this blog

NON-CANONICAL LINKS:
https://community.joomla.org/blogs.html?start=5 - page 2 view blog category
https://community.joomla.org/blogs.html?start=10 - page 3 view blog category
https://community.joomla.org/blogs.html?start=15 - page 4 view blog category
etc

non-canonical links in this blog category should mark as canonical - the main page of this blog: https://community.joomla.org/blogs.html

@simbus82 should I give you a series of lessons in SEO or are you able to figure it all out yourself and read the information about SEO links more carefully?

Think what you want; unfortunately, you simply don’t understand what a canonical is and what it’s for. If, for you, the content and (especially) the internal links of this page https://community.joomla.org/blogs.html?start=10 are IDENTICAL to those of this one https://community.joomla.org/blogs.html, and therefore you want to tell the search engine to NOT CONSIDER what’s on page https://community.joomla.org/blogs.html?start=10 but ONLY consider what’s on a page like this https://community.joomla.org/blogs.html, which will thus be the CANONICAL (Definition: A canonical URL is the URL of the page deemed most representative among a set of pages detectable as duplicates within a site), go ahead and good luck!

Unbelievable, you want to lecture me, hiding behind an anonymous nickname, spouting unmatched nonsense even after I provided (not that it was even necessary) the most authoritative global external sources on the matter.

If you want to talk to me, you can contact me on Linkedin, to talk as equals, between professionals, without hiding behind any anonymous alias. I won't answer you here anymore, I've already wasted too much time arguing about obvious things recognized worldwide.

Just for fun, let's see what ChatGPT thinks of this thread.

Image

Mich-es commented 3 days ago

Don't be put off - test it! I used ‘Aimy Canonical’ after the paginated disaster out of necessity and set the paginated pages as canonical.

The explanations by @simbus82 are correct and are also recommended by Google.

Image

universewrld commented 3 days ago

Just for fun, let's see what ChatGPT thinks of this thread.

all your knowledge amounts to referring to the ChatGPT chat bot; that is your real level, plus the articles from 2013 that you cited, while citing no articles from @Google for 2024.

you have no knowledge, the chat bot replaced your brain and you just confirmed it yourself.

I have been doing SEO for almost 20 years and brought websites to the Top 1 in Google for high-frequency queries. You have no level of analytics, you should not deceive people that you understand at least something in SEO. SEO is not a theory, but a practice, practice on live websites, and not outdated documentation from 2013, which was wrong.

you don't even understand how search engines and search bots like Google bot or ChatGPT work, you are trying to prove to me that you know these things better than me, but you are a DEVELOPER, and I'm an SEO specialist, I promote websites, I write articles, not you.

Any SEO expert will tell you that Google does not need pages like "Blog Category - Page 1", "Blog Category - Page 2", "Blog Category - Page 3" in their search.

You really don't understand how Google collects information from all these websites and how their search engine works. This thread is literally full of criminally erroneous opinions from you about how SEO works. On any SEO forum, they will very quickly explain to you that you are wrong in everything you know about SEO.

universewrld commented 3 days ago

@universewrld I am in no way an expert on any of this. But all the content posted by @simbus82 #44310 (comment) from sources I absolutely trust say that you are wrong

you are confusing <noindex> with rel="canonical" links.

Block Search indexing with noindex - https://developers.google.com/search/docs/crawling-indexing/block-indexing

if pages like Blog - Page 1, Blog - Page 2 and Blog - Page 3 had the <noindex> tag, then the content on those pages would not be indexed. But those pages do not have the <noindex> tag, so all content there will be indexed.

Instead of Blog - Page 1, Blog - Page 2 and Blog - Page 3 in Google search there will be the main page of the category Blog view.

universewrld commented 3 days ago

if you want to prohibit indexing of website pages, you can do it via noindex, nofollow and robots.txt file, but not via rel="canonical".

rel="canonical" only points to the main, canonical page, rel="canonical" does not prohibit indexing of other pages of your website.
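The two signals can be compared as markup. This is a minimal Python sketch with a hypothetical `head_tags` helper that is not part of Joomla: rel="canonical" is a link element naming the preferred URL, while noindex is normally sent as a robots meta tag, which is the form described in the Google block-indexing documentation linked above. The sketch only builds the tags; it makes no claim about how a search engine treats them.

```python
from html import escape
from typing import List, Optional


def head_tags(canonical: Optional[str] = None, noindex: bool = False) -> List[str]:
    """Build the <head> tags for a page (hypothetical helper for illustration)."""
    tags = []
    if canonical:
        # Names the preferred URL for this content; the page itself is still crawlable.
        tags.append('<link rel="canonical" href="{}">'.format(escape(canonical, quote=True)))
    if noindex:
        # Asks search engines not to index this page at all.
        tags.append('<meta name="robots" content="noindex">')
    return tags


# A paginated blog page that declares the first page as canonical:
page_two = head_tags(canonical="https://example.com/blog")
# A page that asks not to be indexed:
hidden = head_tags(noindex=True)
```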

some people in this thread have confused all SEO terms and definitions and are trying to pass themselves off as SEO experts.

fgsw commented 3 days ago

can this be locked for now?

Hackwar commented 3 days ago

Indeed. if you want to argue about SEO, feel free to meet in a forum or chat of your choice, but the issue tracker of Joomla is not the place to do this. I'm locking this topic for now.