tasfe / crawler-commons

Automatically exported from code.google.com/p/crawler-commons

[Sitemaps] Add a convenience method to the Parser that takes only a URL argument #39

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Currently the Parser has two public methods to activate it, and both take the
Media Type (Content type) as an argument. I suggest adding a new parsing
method that uses Tika to detect the MediaType.

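The parsing method would be as follows:
public AbstractSiteMap parseSiteMap(URL onlineSitemapUrl);

The content of this method will be something like:
// Read the whole sitemap into memory
byte[] bytes = IOUtils.toByteArray(onlineSitemapUrl);
// Let Tika detect the media type from the content
String contentType = new Tika().detect(bytes);

// Delegate to the existing method that takes a content type
return parseSiteMap(contentType, bytes, onlineSitemapUrl);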

The new method I suggest above will be very convenient for the casual user who
only wants to parse a simple sitemap without getting into the nitty-gritty. I
believe many people will appreciate it.
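
To make the shape of the proposal concrete, here is a minimal, self-contained sketch of the overload pattern described above: fetch the bytes, detect the content type, then delegate to the method that takes the type explicitly. It is an illustration under assumptions, not the patch itself. The class `SitemapParserSketch` and the result type `ParsedSitemap` are hypothetical stand-ins for the real Parser and `AbstractSiteMap`, and the JDK's `URLConnection.guessContentTypeFromStream` is swapped in for Tika so that the example runs without extra libraries (its detection is far less thorough than Tika's).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical stand-in for the Parser; only the overload pattern matters here. */
class SitemapParserSketch {

    /** Hypothetical result type standing in for AbstractSiteMap. */
    static final class ParsedSitemap {
        final String contentType;
        final int byteCount;
        final URL source;

        ParsedSitemap(String contentType, int byteCount, URL source) {
            this.contentType = contentType;
            this.byteCount = byteCount;
            this.source = source;
        }
    }

    /** Existing-style entry point: the caller supplies the content type. */
    ParsedSitemap parseSiteMap(String contentType, byte[] content, URL url) {
        // Real parsing is omitted; the sketch only records what it was given.
        return new ParsedSitemap(contentType, content.length, url);
    }

    /** Convenience overload: fetch the bytes, detect the type, delegate. */
    ParsedSitemap parseSiteMap(URL url) throws IOException {
        byte[] content;
        try (InputStream in = url.openStream()) {
            content = in.readAllBytes(); // requires Java 9 or later
        }
        return parseSiteMap(detectContentType(content), content, url);
    }

    /**
     * Assumption: JDK detection is used here instead of Tika. It inspects the
     * first few bytes and returns null when it does not recognize them, in
     * which case a generic binary type is reported.
     */
    static String detectContentType(byte[] content) throws IOException {
        String guessed = URLConnection.guessContentTypeFromStream(
                new ByteArrayInputStream(content));
        return guessed != null ? guessed : "application/octet-stream";
    }
}
```

With this in place a caller would only need `new SitemapParserSketch().parseSiteMap(url)`. Keeping the three-argument method as the single place where parsing happens means the convenience overload adds no second code path to maintain.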

Original issue reported on code.google.com by avrah...@gmail.com on 26 Apr 2014 at 7:47

GoogleCodeExporter commented 9 years ago
Here is the patch.

Following Ken's suggestion, this patch is built on top of the earlier patch
for issue 34.

This means it contains the patch for issue 34 and should be reviewed after
issues 34 and 37.

Original comment by avrah...@gmail.com on 21 May 2014 at 4:20

GoogleCodeExporter commented 9 years ago

Original comment by kkrugler...@transpac.com on 24 Jun 2014 at 2:50

GoogleCodeExporter commented 9 years ago
I tried applying this patch to the current trunk, but it didn't apply cleanly.
It looks like Lewis's merge of your slf4j changes invalidated this patch. Sorry,
but it would be great if you could regenerate the patch from trunk.

Original comment by kkrugler...@transpac.com on 24 Jun 2014 at 3:00

GoogleCodeExporter commented 9 years ago
Sure, I will regenerate a clean patch.

Original comment by avrah...@gmail.com on 24 Jun 2014 at 4:15

GoogleCodeExporter commented 9 years ago
Added a clean patch with the fix for this issue.

I have also resolved a small TODO that was waiting on a library upgrade, which
was already done in a previous commit.

Original comment by avrah...@gmail.com on 30 Jun 2014 at 6:00

GoogleCodeExporter commented 9 years ago
Committed @ revision 131 in trunk
Thank you Avi for this patch :)

Original comment by lewis.mc...@gmail.com on 7 Jul 2014 at 2:28