GoogleCodeExporter closed 9 years ago
See also
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
Original comment by kkrugler...@transpac.com
on 17 Mar 2013 at 6:42
Hi Ken,
Is there anything specific we would like to know about the use of regex URL
specifications in robots.txt from a webmaster's point of view?
I am in touch with the IT team at the Scottish Parliament, and it would be as
good an opportunity as any to get more info from them should we need it.
Original comment by lewis.mc...@gmail.com
on 18 Mar 2013 at 9:43
Hi Lewis - nothing comes to mind directly, though it might be interesting to
know why they want to disallow all *.htm pages.
Normally that's what you'd want to crawl, and you'd use a regex to exclude
other file types.
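
For reference, here is a rough sketch of how such a wildcard rule could be matched, assuming the Google-style semantics described at the link above (`*` matches any run of characters, a trailing `$` anchors the end of the path, and rules otherwise match as prefixes). The rule `/*.htm$` and the function names are made up for illustration; they are not taken from the site's actual robots.txt or from any existing parser.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: Google-style wildcards, where '*' matches any run of
    characters and a trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path, disallow_patterns):
    """Return True if any Disallow pattern matches the path.

    re.match anchors at the start of the string, which gives the
    prefix-match behaviour that robots.txt rules normally have.
    """
    return any(
        robots_pattern_to_regex(p).match(path) is not None
        for p in disallow_patterns
    )


# Hypothetical rule: block every path that ends in ".htm".
rules = ["/*.htm$"]
print(is_disallowed("/business/index.htm", rules))
print(is_disallowed("/business/index.html", rules))
print(is_disallowed("/images/logo.png", rules))
```

With the `$` anchor only paths ending in `.htm` are blocked. Dropping it (`/*.htm`) turns the rule into a prefix match, so `.html` pages would be blocked as well.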
Original comment by kkrugler...@transpac.com
on 18 Mar 2013 at 10:50
Rolled in patch from alparslanavci (r113 and r114)
Original comment by kkrugler...@transpac.com
on 13 Mar 2014 at 11:52
Original issue reported on code.google.com by
kkrugler...@transpac.com
on 17 Mar 2013 at 6:20