mohankreddy/crawler4j
Automatically exported from code.google.com/p/crawler4j
issues
runtime exception
#99
GoogleCodeExporter
closed
9 years ago
1
URLs with HTTPS Not Fetched
#98
GoogleCodeExporter
closed
9 years ago
5
Not all domain URLs are crawled
#97
GoogleCodeExporter
closed
9 years ago
1
crawler for gbk html page
#96
GoogleCodeExporter
closed
9 years ago
1
Resuming Enabled Large Seed List Takes Forever
#95
GoogleCodeExporter
opened
9 years ago
3
shouldVisit list of domain to crawl
#94
GoogleCodeExporter
closed
9 years ago
4
[deleted issue]
#93
GoogleCodeExporter
closed
9 years ago
0
how to access crawled data
#92
GoogleCodeExporter
closed
9 years ago
2
Resumable Crawler ignores the maxPagesToFetch
#91
GoogleCodeExporter
closed
9 years ago
1
Fatal transport error
#90
GoogleCodeExporter
closed
9 years ago
2
Null pointer when trying to start the crawler controller several times (One after another)
#89
GoogleCodeExporter
closed
9 years ago
2
How to crawl a password-protected website. Can you provide some samples for the same where authentication is involved?
#88
GoogleCodeExporter
closed
9 years ago
22
java.lang.IllegalThreadStateException occurred while reconstructing CrawlController
#87
GoogleCodeExporter
closed
9 years ago
2
Crawler does not find window.location URL
#86
GoogleCodeExporter
opened
9 years ago
5
Dealing with URLs that have Session ID in query string
#85
GoogleCodeExporter
closed
9 years ago
1
Impossible to use crawler with a constructor that has multiple arguments / CrawlController only supports the default constructor
#84
GoogleCodeExporter
closed
9 years ago
2
Multiple start and close the search session
#83
GoogleCodeExporter
closed
9 years ago
2
missing links in the url list during the crawling process
#82
GoogleCodeExporter
closed
9 years ago
2
Exception --> Can't open a cursor Database was closed
#81
GoogleCodeExporter
closed
9 years ago
4
Crawler crawls javascript and css files with ? at the end of the url
#80
GoogleCodeExporter
closed
9 years ago
3
WARNING: Could not find crawler4j.properties file in class path.
#79
GoogleCodeExporter
closed
9 years ago
7
Discard robots.txt files that are not plain text
#78
GoogleCodeExporter
closed
9 years ago
1
Reading different base URLs to use as seeds, to crawl in a loop
#77
GoogleCodeExporter
closed
9 years ago
1
Passing arguments to webcrawler
#76
GoogleCodeExporter
closed
9 years ago
4
Exchangeable robots.txt stores
#75
GoogleCodeExporter
opened
9 years ago
2
Accept-Language header
#74
GoogleCodeExporter
opened
9 years ago
3
Unnecessary fetching of robots.txt files
#73
GoogleCodeExporter
closed
9 years ago
2
What is the use of Berkeley DB here?
#72
GoogleCodeExporter
closed
9 years ago
3
java.lang.NullPointerException
#71
GoogleCodeExporter
closed
9 years ago
5
[deleted issue]
#70
GoogleCodeExporter
closed
9 years ago
0
Requests Per Second Per Host
#69
GoogleCodeExporter
opened
9 years ago
7
Two CrawlController instances
#68
GoogleCodeExporter
closed
9 years ago
2
Fatal error in JVM
#67
GoogleCodeExporter
closed
9 years ago
1
NoHttpResponseException
#66
GoogleCodeExporter
closed
9 years ago
1
EnvironmentFailureException
#65
GoogleCodeExporter
closed
9 years ago
4
can I delete the jdb files in folder frontier while the crawler is running?
#64
GoogleCodeExporter
closed
9 years ago
2
[deleted issue]
#63
GoogleCodeExporter
closed
9 years ago
0
Need to grab a link from onclick() JS code and crawl it. Made a URL out of the JS code and called page.setURLs() in visit(Page page), but it is not working.
#62
GoogleCodeExporter
closed
9 years ago
3
Make CrawlerJ distributed
#61
GoogleCodeExporter
opened
9 years ago
5
Where is the manual? Please write some simple steps to follow
#60
GoogleCodeExporter
closed
9 years ago
6
Crawler ignores robots meta-tag from the page
#59
GoogleCodeExporter
opened
9 years ago
4
Crawler ignores Crawl-delay from the host's robots.txt
#58
GoogleCodeExporter
opened
9 years ago
7
page.isBinary() returns false for .pdf and .doc files (probably any other too)
#57
GoogleCodeExporter
closed
9 years ago
3
Problem restarting the crawler with different seeds
#56
GoogleCodeExporter
closed
9 years ago
2
Getting information from Root Folder
#55
GoogleCodeExporter
closed
9 years ago
2
Make the crawler4j repeatably usable without restarting program (remove static)
#54
GoogleCodeExporter
closed
9 years ago
8
Errors with database logic and multiple threads.
#53
GoogleCodeExporter
closed
9 years ago
1
1 thread is working, the rest are just waiting at getNextURLs
#52
GoogleCodeExporter
closed
9 years ago
2
Multiple domains crawl without politeness interval
#51
GoogleCodeExporter
opened
9 years ago
6
crawler will not follow relative URLs in redirects
#50
GoogleCodeExporter
closed
9 years ago
7