opensangja / abot

Automatically exported from code.google.com/p/abot
Apache License 2.0

Implement robots no follow #75

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Implement robots no follow on the PoliteWebCrawler to respect the two tags 
below.

Do not crawl any links on a page if the page contains the following...
<meta name="robots" content="nofollow" />

Do not crawl the individual links marked up as follows...
<a href="signin.php" rel="nofollow">sign in</a>
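To make the two rules concrete, here is a rough sketch in Python rather than in Abot's C#. None of these names come from Abot: `NoFollowAwareLinkParser`, `extract_links`, and both `respect_*` parameters are hypothetical, and treating the meta `content` as a comma-separated list and `rel` as a space-separated list is an assumption about how the tags should be read.

```python
from html.parser import HTMLParser


class NoFollowAwareLinkParser(HTMLParser):
    """Hypothetical parser: collects hrefs and notes a page-level nofollow."""

    def __init__(self, respect_anchor_rel_nofollow=True):
        super().__init__()
        self.respect_anchor_rel_nofollow = respect_anchor_rel_nofollow
        self.page_is_nofollow = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            # Assumption: content is a comma-separated list of directives.
            directives = [d.strip().lower() for d in attrs.get("content", "").split(",")]
            if "nofollow" in directives:
                self.page_is_nofollow = True
        elif tag == "a" and attrs.get("href"):
            # Assumption: rel is a space-separated list of tokens.
            rel_tokens = attrs.get("rel", "").lower().split()
            if self.respect_anchor_rel_nofollow and "nofollow" in rel_tokens:
                return  # rule 2: skip this individual link
            self.links.append(attrs["href"])


def extract_links(html, respect_meta_robots_nofollow=True,
                  respect_anchor_rel_nofollow=True):
    """Return the hrefs a crawler would be allowed to follow on this page."""
    parser = NoFollowAwareLinkParser(respect_anchor_rel_nofollow)
    parser.feed(html)
    parser.close()
    # Rule 1: a page-level robots nofollow suppresses every link on the page.
    if respect_meta_robots_nofollow and parser.page_is_nofollow:
        return []
    return parser.links
```

The page-level check is applied after the whole document has been parsed, so the meta tag is honored wherever it appears. Both checks are switchable, so a caller can turn either rule off independently.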

Original issue reported on code.google.com by sjdir...@gmail.com on 1 Mar 2013 at 1:02

GoogleCodeExporter commented 9 years ago

Original comment by sjdir...@gmail.com on 1 Mar 2013 at 1:06

GoogleCodeExporter commented 9 years ago

Original comment by sjdir...@gmail.com on 29 Jun 2013 at 7:32

GoogleCodeExporter commented 9 years ago
Working on this.

Original comment by ilushk...@gmail.com on 20 Jul 2013 at 1:09

GoogleCodeExporter commented 9 years ago
Can you take a look at my latest commit for this in 1.2.2? I implemented the 
first part of this in the hap and cs link parsers, but we need a way to pass in 
whether it should check for this or not. How do you recommend doing it?

Original comment by ilushk...@gmail.com on 20 Jul 2013 at 2:05

GoogleCodeExporter commented 9 years ago
I added the following fields to CSQueryHyperlinkParser and HapHyperlinkParser. 
They are loaded from new config values in the app/web.config files...

bool _isRespectMetaRobotsNoFollowEnabled;
bool _isRespectAnchorRelNoFollowEnabled;

These values are filled from the constructor and the crawler 

I believe this is what you needed. Right?

Original comment by sjdir...@gmail.com on 21 Jul 2013 at 11:44

GoogleCodeExporter commented 9 years ago
Yup this is exactly it.  Will wrap this up now.

Original comment by ilushk...@gmail.com on 22 Jul 2013 at 1:35

GoogleCodeExporter commented 9 years ago
Abot now has config values IsRespectRobotsNoFollow and 
IsRespectAnchorRelNoFollow which drive this functionality. This was checked 
into 1.2.3 on GitHub.

Original comment by sjdir...@gmail.com on 3 Sep 2013 at 1:44