saadchy / sitemap-generators

Automatically exported from code.google.com/p/sitemap-generators

Migrated feature: "Option for a recursion limit on walking directories" submitted by nobody on 2005-06-18 #14

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Original feature listed here:
http://sourceforge.net/tracker/index.php?func=detail&aid=1223364&group_id=137793&atid=739386

Google's recommendation is to submit only HTML pages (as
opposed to .gif, .jpg, etc.). So either give an example in
the config of a simple way to reject ALL else using, say,
"regexp" (for those not familiar with regexps), or add a
new switch that makes it easier to pass JUST .htm/.html.
Right now we had to list and filter out all other possible
extensions using the wildcard filters, because simply
passing ALL .htm URLs was not acceptable: the logs
contained some .htm requests with parameters, which we
did NOT want to include.
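
To illustrate the kind of "HTML only" rule asked for above, here is a minimal Python sketch of a regular expression that accepts only URLs ending in .htm or .html and rejects anything carrying a query string or fragment. The names `HTML_ONLY` and `keep_url` and the sample URLs are hypothetical; this is not the generator's own code, and whether the same pattern can be dropped into the config's "regexp" filter unchanged is an assumption.

```python
import re

# Hypothetical pattern: everything up to the end of the URL must be free of
# '?' and '#', and the URL must end in ".htm" or ".html" (case-insensitive).
HTML_ONLY = re.compile(r"[^?#]*\.html?", re.IGNORECASE)


def keep_url(url):
    """Return True if the URL looks like a plain .htm/.html page."""
    return HTML_ONLY.fullmatch(url) is not None


# Illustrative input only; these URLs are made up.
sample_urls = [
    "http://example.com/index.html",
    "http://example.com/about.HTM",
    "http://example.com/page.htm?id=3",
    "http://example.com/logo.gif",
]
kept = [u for u in sample_urls if keep_url(u)]
```

A single "pass" rule of this shape would replace the long list of per-extension "drop" filters, since it also excludes the .htm requests that carry parameters.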

Also, it would be nice to have an option for the number of
levels walked in directories. For instance, we wanted to
have our root and only a PORTION of the subdirectories
contained in it walked. Since walking the root apparently
walks ALL subdirectories automatically, the only way we
could think of to do this was to filter the rest out by
name. It would be nice to be able to specify whether
walking the root includes ALL subdirectories or only
specified ones.
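
A recursion limit of the kind requested above could look roughly like the following Python sketch, which prunes `os.walk` once a given depth is reached. The function name `walk_limited` and the `max_depth` parameter are hypothetical, not options the generator actually has.

```python
import os


def walk_limited(root, max_depth):
    """Yield file paths under root, descending at most max_depth levels.

    max_depth=0 yields only the files directly inside root.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Depth of the current directory relative to root (root itself is 0).
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying dirnames in place stops os.walk from going deeper.
            dirnames[:] = []
        for name in filenames:
            yield os.path.join(dirpath, name)
```

With `max_depth=0` only the root itself is listed, which covers the "root but not its subdirectories" case without filtering every subdirectory out by name.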

Original issue reported on code.google.com by api.ma...@gmail.com on 13 Aug 2007 at 7:45

GoogleCodeExporter commented 8 years ago
I second the idea of adding a recurse=yes/no option to the <directory> tag. I need
to be able to specify which paths to recurse into. Directories off of the
web-root that are not anonymously accessible have no business showing up in
sitemap.xml; bots try to access those URLs and get "access denied" type
messages.

So today, once again, I was looking for some way to specify the web-root with 
recurse=no, and then specify each directory off of the web-root that is public, 
and have the generator recurse those directories.
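
As a rough sketch of that behaviour, the Python below takes a list of (directory, recurse) pairs and only descends into a directory when its flag is true. The `collect_files` function and the pair format are hypothetical stand-ins for the proposed recurse=yes/no attribute, not something the generator currently supports.

```python
import os


def collect_files(entries):
    """Yield file paths for (directory, recurse) pairs, without duplicates.

    recurse=False lists only the files directly inside the directory;
    recurse=True walks the whole subtree.
    """
    seen = set()
    for directory, recurse in entries:
        for dirpath, dirnames, filenames in os.walk(directory):
            if not recurse:
                # Clear the list in place so os.walk does not descend.
                dirnames[:] = []
            for name in filenames:
                path = os.path.join(dirpath, name)
                if path not in seen:
                    seen.add(path)
                    yield path
```

Calling it with the web-root marked `False` and each public subdirectory marked `True` would keep non-public directories out of the result entirely.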

Original comment by wurlit...@gmail.com on 29 Oct 2010 at 6:19