mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

missing links in the url list during the crawling process #82

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When crawling pages such as this one:
http://steamcommunity.com/games/CSS/members
the links that appear in the browser at the bottom right corner of the page, 
which let you advance to the next pages, are not part of the links in 
the HTML source.

As you can see at the above link, those pages supply many more links, which 
would make the crawl much more thorough, so it would be a shame to miss them.

Can you suggest a way to adjust your source to add such functionality?

Original issue reported on code.google.com by yifatcher@gmail.com on 21 Sep 2011 at 11:15

GoogleCodeExporter commented 9 years ago
Looking at the HTML of the page, it looks like the links are in fact part of 
the page, so I don't understand your problem:

Page:1  <a href="?p=2">2</a>  <a href="?p=3">3</a> ... <a href="?p=6054">6054</a>   <a href="?p=2">>></a>

-Yasser

Original comment by ganjisaffar@gmail.com on 22 Sep 2011 at 4:20
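The pagination links quoted above are query-only relative references (`?p=2`), so a crawler only reaches the next pages if it resolves each `href` against the URL of the page it was found on. The following is a minimal Python sketch of that resolution step, not crawler4j's own code. The names `HrefCollector` and `extract_links` are hypothetical, and the HTML snippet is a shortened copy of the one quoted in the comment.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin keeps the base path and swaps in the new query
                # when the reference is query-only, e.g. "?p=2".
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs of all anchors found in the given HTML."""
    collector = HrefCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# The page from the report and a shortened copy of its pagination markup.
page_url = "http://steamcommunity.com/games/CSS/members"
snippet = '<a href="?p=2">2</a> <a href="?p=3">3</a> <a href="?p=6054">6054</a>'

links = extract_links(page_url, snippet)
for link in links:
    print(link)
```

If the extracted list lacks the `members?p=N` URLs, the relative-reference resolution step is one place to check, because a resolver that drops the last path segment for query-only references would produce different URLs.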

GoogleCodeExporter commented 9 years ago

Original comment by ganjisaffar@gmail.com on 25 Dec 2011 at 9:12