rockdaboot closed 10 years ago
Mget now supports sitemap index files and sitemap files in 'sitemap' XML format (gzip-compressed and uncompressed) and in plain text format. Scanning for sitemap files in RSS and Atom feed formats, and for references within HTML, will be supported soon.
Added parsing of RSS 2.0 and Atom 1.0 feeds.
Download the sitemap URLs listed in robots.txt (gzip-compressed and uncompressed). Parse these files with Mget's XML parser to fetch all URLs. Respect additional information/schemas within 'urlset', e.g. http://www.google.com/schemas/sitemap-image/1.1.
See http://www.sitemaps.org/protocol.html for more information.
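For illustration only, here is a rough Python sketch of the flow described above: collect `Sitemap:` entries from robots.txt, then read a sitemap that may be gzip-compressed, XML, or plain text. This is not Mget's code (Mget is written in C and uses its own XML parser); the function names `sitemaps_from_robots` and `parse_sitemap` and the returned dictionary layout are assumptions made up for this example.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespaces from the sitemaps.org protocol and the Google image extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "field: value" at the first colon.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def parse_sitemap(data: bytes) -> dict[str, list[str]]:
    """Extract URLs from a sitemap index, a urlset, or a plain text sitemap."""
    # gzip streams start with the magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)

    result = {"sitemaps": [], "pages": [], "images": []}
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not XML: treat it as the plain text format, one URL per line.
        text = data.decode("utf-8")
        result["pages"] = [ln.strip() for ln in text.splitlines() if ln.strip()]
        return result

    def texts(path: str) -> list[str]:
        return [(e.text or "").strip() for e in root.iterfind(path, NS)]

    # A sitemap index points at further sitemap files.
    result["sitemaps"] = texts("sm:sitemap/sm:loc")
    # A urlset lists pages, optionally with image extension entries.
    result["pages"] = texts("sm:url/sm:loc")
    result["images"] = texts("sm:url/image:image/image:loc")
    return result
```

A caller would fetch robots.txt, pass its text to `sitemaps_from_robots`, download each returned URL, and feed the raw bytes to `parse_sitemap`, following any entries under `"sitemaps"` recursively.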