Crawler does not fetch pages with relative URLs

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

Apply the basic search and index script to a website with relative HTML
links, such as www.simplyadaptive.com.

What is the expected output? What do you see instead?

After reading the first page, the crawler fails to fetch the linked pages
on the same site.

Initializing HTTPTransport ...
Failed to fetch document from url: 'http://why_us.html'.
Failed to fetch url: 'http://why_us.html': 
Initializing HTTPTransport ...
Failed to fetch document from url: 'http://benefits.html'.
Failed to fetch url: 'http://benefits.html': 
Initializing HTTPTransport ...

What version of the product are you using? On what operating system?

Latest from SVN.

Please provide any additional information below.

Shouldn't the crawler know to resolve relative links into absolute URLs on
the same website?
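
For illustration, here is a minimal sketch of the kind of resolution being
asked for. It is not Yooreeka code: the class and method names are
hypothetical, and it assumes the crawler knows the URL of the page on which
each link was found. It relies only on the standard java.net.URI class.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Yooreeka.
public class RelativeLinkResolver {

    // Resolves an href found on a page against that page's URL.
    // Absolute hrefs are returned unchanged.
    static String resolve(String pageUrl, String href) {
        URI base = URI.create(pageUrl);
        // A base such as "http://host" has an empty path; give it "/" first
        // so the relative name is not glued directly onto the host name.
        if (base.getPath() == null || base.getPath().isEmpty()) {
            base = base.resolve("/");
        }
        return base.resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        // Assumed page URL, based on the site named in this report.
        String page = "http://www.simplyadaptive.com/";
        System.out.println(resolve(page, "why_us.html"));
        System.out.println(resolve(page, "benefits.html"));
    }
}
```

With a step like this, a link such as why_us.html would be fetched from
http://www.simplyadaptive.com/why_us.html rather than from
'http://why_us.html' as in the log above.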

Original issue reported on code.google.com by e.Adi.Andrei on 6 Mar 2010 at 12:17

GoogleCodeExporter commented 9 years ago

Well, it would be nice to have an almighty crawler. However, the focus of this
project is on intelligent algorithms. Frankly, there are a number of crawlers
that are far better than what Yooreeka has to offer. If you intend to crawl for
production purposes, then you will need much more than what you will find here.

The purpose of the crawler in this project is to let you experiment, within the
context of the project itself, with inserting intelligent algorithms at various
stages of the crawling process. Once you find a design that satisfies your
objectives, you can use the same algorithms in production-quality code.

The above commentary notwithstanding, I will look into the issue that you
reported and, if I find some time, I will try to address it.

Best regards.

Original comment by babis.ma...@gmail.com on 30 Apr 2010 at 6:18

GoogleCodeExporter commented 9 years ago

Some improvements have been made -- check out the latest distro or the trunk.
However, we will not pursue building a full-blown crawler here. Please see
other open source projects for that purpose.

Original comment by ba...@marmanis.com on 11 Jan 2014 at 9:03

Changed state: WontFix

shangma / yooreeka

Crawler does not fetch pages with relative URLs #5