grandnode / grandnode2

Open-Source eCommerce Platform on .NET Core, MongoDB, AWS DocumentDB, Azure CosmosDB, LiteDB & Vue.js
https://grandnode.com/
GNU General Public License v3.0

Bots index every URL, even if it's a duplicate. #441

Closed Macka323 closed 7 months ago

Macka323 commented 8 months ago

By duplicate I mean that a product category page can be viewed with different sort orders, but the contents of the page are the same.
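
For example (illustrative URLs, not from the original report), all of the following would render the same category page:

https://www.example.com/computers
https://www.example.com/computers?orderby=price
https://www.example.com/computers?viewmode=list&pagesize=20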

One way to fix it would be to use robots.txt to disallow bots from crawling links with those parameters. For example:

Disallow: /*?viewmode=
Disallow: /*?orderby=
Disallow: /*?pagesize=

Another option is to use canonical tags on the pages. We can specify the canonical URL without query parameters to indicate which version should be indexed. For example:

<link rel="canonical" href="https://www.example.com/page">

nguyendev commented 8 months ago

Yes. I like this solution: <link rel="canonical" href="https://www.example.com/page">

KrzysztofPajak commented 8 months ago

@nguyendev Using a canonical tag is okay, but if you have product specifications that are used for filtering, then URLs such as 'https://demo.grandnode.com/computers?cpu-type=intel-core-i7' will not be indexed.

nguyendev commented 8 months ago

You're right, I forgot about that 😿. So why don't we set up a dynamic canonical? For example, something like a rule table: which URLs get a canonical tag and which are disallowed. Likewise with robots.txt.
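
A minimal sketch of that rule-table idea, assuming ASP.NET Core; the names CanonicalRules, PresentationParams and GetCanonicalUrl are illustrative, not GrandNode's actual API. Presentation-only parameters collapse to the bare category URL, while specification filters such as cpu-type keep their query string so those pages can still be indexed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.AspNetCore.Http;

public static class CanonicalRules
{
    // Presentation-only parameters (from this thread): pages that differ only
    // by these should point back to the parameter-less canonical URL.
    private static readonly HashSet<string> PresentationParams =
        new(StringComparer.OrdinalIgnoreCase) { "viewmode", "orderby", "pagesize" };

    // Returns the canonical URL for a request, or null when the current URL
    // should be indexed as-is (e.g. a specification-filter URL).
    public static string? GetCanonicalUrl(HttpRequest request)
    {
        if (request.Query.Count == 0)
            return null; // no query string - already canonical

        // If every parameter is presentation-only, canonicalize to the bare path.
        if (request.Query.Keys.All(k => PresentationParams.Contains(k)))
            return $"{request.Scheme}://{request.Host}{request.Path}";

        // Mixed case: keep specification filters (e.g. ?cpu-type=intel-core-i7)
        // and drop only the presentation parameters.
        var kept = request.Query
            .Where(kv => !PresentationParams.Contains(kv.Key))
            .Select(kv => $"{kv.Key}={Uri.EscapeDataString(kv.Value.ToString())}");
        return $"{request.Scheme}://{request.Host}{request.Path}?{string.Join("&", kept)}";
    }
}
```

A Razor layout could then emit <link rel="canonical" href="..."> only when the method returns a value, and the same PresentationParams set could drive the generated robots.txt rules so the two mechanisms stay in sync.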

KrzysztofPajak commented 8 months ago

It will be easier to add the changes to robots.txt: Disallow: /*?viewmode=...
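
For reference, a possible set of additions (a sketch: the '&' variants are an assumption, included so the rules also match when a presentation parameter appears after a specification filter such as cpu-type):

Disallow: /*?viewmode=
Disallow: /*&viewmode=
Disallow: /*?orderby=
Disallow: /*&orderby=
Disallow: /*?pagesize=
Disallow: /*&pagesize=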