opengovsg / askgovsg

Answers from the Singapore Government
https://ask.gov.sg

Optimise robots.txt file #402

Open LinHuiqing opened 3 years ago

LinHuiqing commented 3 years ago

Define a robots.txt file to help search crawlers better understand our site and prevent them from crawling unnecessary links.

More information: https://moz.com/learn/seo/robotstxt

https://developers.google.com/search/docs/advanced/robots/create-robots-txt
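As a reference point, a minimal robots.txt along the lines the linked guides describe could look like the sketch below. The rules and sitemap URL are illustrative only, not AskGov's actual configuration:

```txt
# Illustrative example only — not AskGov's actual robots.txt.
# Allow all crawlers to fetch all pages.
User-agent: *
Allow: /

# Hypothetical sitemap location; helps crawlers discover pages.
Sitemap: https://ask.gov.sg/sitemap.xml
```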

LinHuiqing commented 2 years ago

I think our site does not need a robots.txt for now, for the following reasons:

  1. Most pages which should be crawled are already being crawled and indexed. This includes:
    • main page '/'
    • main agency pages '/agency/:shortname'
    • question pages '/questions/:id'
    • agency pages with tags '/agency/:shortname?tags=:tags'
  2. Some valid pages were excluded due to loading issues, which caused Google to treat them as duplicates (I'm guessing because of blank screenshots).
  3. Pages which we do not want MOPs (members of the public) to see are currently not indexed by Google, as no pages link to them. We should not list these paths in robots.txt to explicitly disallow crawling, because robots.txt is publicly viewable and would reveal exactly the paths we want to keep out of sight.
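The trade-off in point 3 can be demonstrated with Python's standard-library robots.txt parser: any path you disallow is readable by anyone who fetches the file. This is a minimal sketch with a hypothetical `Disallow: /admin/` rule (not an actual AskGov path):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the /admin/ path is illustrative,
# not a real AskGov route. Note the rule itself is plainly visible.
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Well-behaved crawlers may fetch ordinary pages...
print(parser.can_fetch("*", "https://ask.gov.sg/questions/1"))   # True
# ...but are told to skip the disallowed path, which the file has
# now advertised to every human reader as well.
print(parser.can_fetch("*", "https://ask.gov.sg/admin/login"))   # False
```

Keeping sensitive pages unlinked (and, if needed, behind authentication or a `noindex` header) avoids broadcasting their locations this way.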