smasher125354 / crawler4j

Automatically exported from code.google.com/p/crawler4j

HtmlParseData should hold a unique list of URLs #291

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Currently HtmlParseData holds a non-unique list of the links in the page: if a 
URL appears in the page several times, it also appears in the list of links 
several times.

I can't think of a scenario where somebody would parse an HTML page and want the 
same link more than once.

HtmlParseData should hold a Set instead of a List, so that each link appears only once.
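A minimal sketch of the proposed change, written in Java since crawler4j is a Java library. The class `UniqueLinksSketch` and its methods `addLink` and `getOutgoingUrls` are hypothetical illustrations, not crawler4j's actual HtmlParseData API, and links are modeled as plain Strings for simplicity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a parse-result holder; NOT crawler4j's real HtmlParseData.
public class UniqueLinksSketch {

    // LinkedHashSet silently drops duplicates while keeping first-seen order.
    private final Set<String> outgoingUrls = new LinkedHashSet<>();

    // Adds a link; returns false if an equal link was already stored.
    public boolean addLink(String url) {
        return outgoingUrls.add(url);
    }

    // Read-only view so callers cannot reintroduce duplicates or mutate state.
    public Set<String> getOutgoingUrls() {
        return Collections.unmodifiableSet(outgoingUrls);
    }

    public static void main(String[] args) {
        // Links as they might be extracted from a page, with one repeated.
        List<String> extracted = Arrays.asList(
                "http://example.com/a",
                "http://example.com/b",
                "http://example.com/a");

        UniqueLinksSketch data = new UniqueLinksSketch();
        for (String url : extracted) {
            data.addLink(url);
        }

        System.out.println(extracted.size());       // prints 3
        System.out.println(data.getOutgoingUrls()); // prints [http://example.com/a, http://example.com/b]
    }
}
```

A `LinkedHashSet` is used rather than a plain `HashSet` so that links keep the order in which they appear in the page. Note that uniqueness relies on the element type's `equals`/`hashCode`: URLs that differ only textually (for example, with and without a trailing slash) remain distinct unless they are normalized before being added.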

Original issue reported on code.google.com by avrah...@gmail.com on 21 Aug 2014 at 11:07

GoogleCodeExporter commented 9 years ago
All examples should be updated accordingly.

Original comment by avrah...@gmail.com on 21 Aug 2014 at 11:08

GoogleCodeExporter commented 9 years ago
Fixed in Revision: f5ec5157fcf4 

Original comment by avrah...@gmail.com on 21 Aug 2014 at 11:21