I don't think our site needs any modifications right now, for the following reasons:
Most pages that should be crawled are being crawled and indexed. This includes:
- main page '/'
- main agency pages '/agency/:shortname'
- question pages '/questions/:id'
- agency pages with tags '/agency/:shortname?tags=:tags'
Some valid pages were excluded due to loading issues, which caused Google to treat them as duplicates (I'm guessing because the rendered screenshots were blank).
Pages that we do not want MOPs to see are currently not indexed by Google because no pages link to them. We should not list these pages in robots.txt to explicitly disallow crawling, because robots.txt is publicly viewable and listing them would reveal their paths.
Define a robots.txt file to help search crawlers better understand our site and prevent them from crawling unnecessary links.
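As a rough sketch, such a robots.txt might look like the following. The disallowed path and the sitemap URL are hypothetical placeholders, not our actual routes; note that sensitive pages are deliberately not listed here, per the point above:

```
# Applies to all crawlers
User-agent: *

# Hypothetical: keep crawlers out of paths that add no search value
# (do NOT list sensitive/private paths here — this file is public)
Disallow: /api/

# Everything else remains crawlable by default
Allow: /

# Hypothetical sitemap URL — replace with the site's real domain
Sitemap: https://example.com/sitemap.xml
```

Crawlers fetch this file from the site root (/robots.txt), and directives are advisory: well-behaved crawlers honor them, but they are not an access-control mechanism.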
More information: https://moz.com/learn/seo/robotstxt
https://developers.google.com/search/docs/advanced/robots/create-robots-txt