tasfe / crawler-commons

Automatically exported from code.google.com/p/crawler-commons

[Sitemaps] Upgrade Valid / Legal / Strict SitemapUrls #60

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
A SitemapUrl can be invalid when the sitemap that references it lives in a 
directory on a completely different path from the referenced SitemapUrl.

This is described here:
http://www.sitemaps.org/protocol.html#location
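
For illustration, here is a minimal sketch of that location rule as I read it: a URL is accepted only if it has the same scheme and host as the sitemap and its path starts with the sitemap's directory. `SitemapLocationCheck` and `isUnderSitemapDirectory` are hypothetical names, not existing crawler-commons API, and the sketch ignores cross-submission via robots.txt and default-port equivalence.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of crawler-commons.
final class SitemapLocationCheck {

    private SitemapLocationCheck() {}

    /**
     * Returns true if candidateUrl has the same scheme and authority as
     * sitemapUrl and its path lies at or below the sitemap's directory.
     * Assumption: authorities are compared as text, so "example.com" and
     * "example.com:80" are treated as different.
     */
    static boolean isUnderSitemapDirectory(String sitemapUrl, String candidateUrl) {
        final URI sitemap;
        final URI candidate;
        try {
            // normalize() resolves "." and ".." path segments
            sitemap = new URI(sitemapUrl).normalize();
            candidate = new URI(candidateUrl).normalize();
        } catch (URISyntaxException e) {
            return false; // an unparsable URL cannot be validated
        }
        if (sitemap.getScheme() == null || sitemap.getRawAuthority() == null
                || candidate.getScheme() == null || candidate.getRawAuthority() == null) {
            return false; // relative or opaque URIs are rejected
        }
        boolean sameOrigin = sitemap.getScheme().equalsIgnoreCase(candidate.getScheme())
                && sitemap.getRawAuthority().equalsIgnoreCase(candidate.getRawAuthority());
        if (!sameOrigin) {
            return false;
        }
        String sitemapPath = sitemap.getRawPath().isEmpty() ? "/" : sitemap.getRawPath();
        String candidatePath = candidate.getRawPath().isEmpty() ? "/" : candidate.getRawPath();
        // Directory of the sitemap: everything up to and including the last '/'
        String sitemapDir = sitemapPath.substring(0, sitemapPath.lastIndexOf('/') + 1);
        return candidatePath.startsWith(sitemapDir);
    }
}
```

With this check, a sitemap at http://example.com/catalog/sitemap.xml accepts http://example.com/catalog/show?item=23 but rejects http://example.com/images/hi.gif.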

To make the validity aspect clearer, we need the following changes:
1. Add more explanation, both in the javadocs and in log messages.
2. Rename "Legal" (I think there is only one occurrence) to "valid" in the parser.
3. Add a new method to the Sitemap class that returns all *valid* SitemapUrls.
4. Log a message whenever a URL is dropped for being invalid; a URL shouldn't 
be dropped quietly.
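
A rough sketch of what items 3 and 4 could look like together, so the intent is concrete. `ValidUrlFilter`, `Entry`, and `validOnly` are hypothetical stand-ins invented for this example; the real Sitemap and SitemapUrl classes may be shaped differently, and the sketch uses `java.util.logging` only to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-ins for illustration; not the crawler-commons classes.
final class ValidUrlFilter {

    private static final Logger LOG = Logger.getLogger(ValidUrlFilter.class.getName());

    private ValidUrlFilter() {}

    /** Minimal stand-in for a parsed sitemap entry carrying a validity flag. */
    static final class Entry {
        final String url;
        final boolean valid;

        Entry(String url, boolean valid) {
            this.url = url;
            this.valid = valid;
        }
    }

    /**
     * Returns only the valid entries, in their original order. Every
     * dropped entry is logged, so nothing is discarded quietly.
     */
    static List<Entry> validOnly(String sitemapUrl, List<Entry> entries) {
        List<Entry> kept = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.valid) {
                kept.add(entry);
            } else {
                LOG.warning("Dropping invalid URL " + entry.url
                        + " referenced by sitemap " + sitemapUrl);
            }
        }
        return kept;
    }
}
```

Keeping the unfiltered list available next to a valid-only accessor would let callers decide for themselves whether to be strict.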

Original issue reported on code.google.com by avrah...@gmail.com on 9 Nov 2014 at 2:10

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 25 Dec 2014 at 11:49

GoogleCodeExporter commented 9 years ago
Attaching a patch

Original comment by avrah...@gmail.com on 13 Jan 2015 at 11:21

Attachments: