sujit-kr/crawler4j
Automatically exported from code.google.com/p/crawler4j
0 stars, 0 forks
Issues (newest first)
#82 missing links in the url list during the crawling process (GoogleCodeExporter, closed 9 years ago, 0 comments)
#81 Exception --> Can't open a cursor Database was closed (GoogleCodeExporter, closed 9 years ago, 4 comments)
#80 Crawler crawls javascript and css files with ? at the end of the url (GoogleCodeExporter, closed 9 years ago, 3 comments)
#79 WARNING: Could not find crawler4j.properties file in class path. (GoogleCodeExporter, closed 9 years ago, 7 comments)
#78 Discard robots.txt files that are not plain text (GoogleCodeExporter, closed 9 years ago, 1 comment)
#77 Reading different base URLs to use as seeds, to crawl in a loop (GoogleCodeExporter, closed 9 years ago, 2 comments)
#76 Passing arguments to webcrawler (GoogleCodeExporter, closed 9 years ago, 4 comments)
#75 Exchangeable robots.txt stores (GoogleCodeExporter, opened 9 years ago, 2 comments)
#74 Accept-Language header (GoogleCodeExporter, opened 9 years ago, 3 comments)
#73 Unnecessary fetching of robots.txt files (GoogleCodeExporter, closed 9 years ago, 2 comments)
#72 What is the use of Berkeley DB here? (GoogleCodeExporter, closed 9 years ago, 3 comments)
#71 java.lang.NullPointerException (GoogleCodeExporter, closed 9 years ago, 5 comments)
#70 [deleted issue] (GoogleCodeExporter, closed 9 years ago, 0 comments)
#69 Requests Per Second Per Host (GoogleCodeExporter, opened 9 years ago, 7 comments)
#68 Two CrawlController instances (GoogleCodeExporter, closed 9 years ago, 2 comments)
#67 Fatal error in JVM (GoogleCodeExporter, closed 9 years ago, 1 comment)
#66 NoHttpResponseException (GoogleCodeExporter, closed 9 years ago, 1 comment)
#65 EnvironmentFailureException (GoogleCodeExporter, closed 9 years ago, 4 comments)
#64 Can I delete the jdb files in the frontier folder while the crawler is running? (GoogleCodeExporter, closed 9 years ago, 2 comments)
#63 [deleted issue] (GoogleCodeExporter, closed 9 years ago, 0 comments)
#62 Need to grab a link from onclick() JS code and crawl it. Built a URL from the JS code and called page.setURLs() in visit(Page page), but it is not working. (GoogleCodeExporter, closed 9 years ago, 3 comments)
#61 Make CrawlerJ distributed (GoogleCodeExporter, opened 9 years ago, 5 comments)
#60 Where is the manual? Please write some simple steps to follow (GoogleCodeExporter, closed 9 years ago, 6 comments)
#59 Crawler ignores robots meta-tag from the page (GoogleCodeExporter, opened 9 years ago, 4 comments)
#58 Crawler ignores Crawl-delay from the host's robots.txt (GoogleCodeExporter, opened 9 years ago, 7 comments)
#57 page.isBinary() returns false for .pdf and .doc files (probably any other too) (GoogleCodeExporter, closed 9 years ago, 4 comments)
#56 Problem restarting the crawler with different seeds (GoogleCodeExporter, closed 9 years ago, 2 comments)
#55 Getting information from Root Folder (GoogleCodeExporter, closed 9 years ago, 2 comments)
#54 Make crawler4j repeatedly usable without restarting the program (remove static) (GoogleCodeExporter, closed 9 years ago, 8 comments)
#53 Errors with database logic and multiple threads. (GoogleCodeExporter, closed 9 years ago, 1 comment)
#52 1 thread is working, the rest are just waiting at getNextURLs (GoogleCodeExporter, closed 9 years ago, 2 comments)
#51 Multiple domains crawl without politeness interval (GoogleCodeExporter, opened 9 years ago, 6 comments)
#50 crawler will not follow relative URLs in redirects (GoogleCodeExporter, closed 9 years ago, 7 comments)
#49 Multithreading and protection against duplicates (GoogleCodeExporter, closed 9 years ago, 1 comment)
#48 Make cookie policy configurable (GoogleCodeExporter, opened 9 years ago, 5 comments)
#47 Are there any plans to move to Maven? (GoogleCodeExporter, closed 9 years ago, 10 comments)
#46 All the seeds are crawled before the in depth crawl starts (GoogleCodeExporter, closed 9 years ago, 1 comment)
#45 All the seeds are crawled before the in depth crawl starts (GoogleCodeExporter, closed 9 years ago, 1 comment)
#44 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#43 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#42 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#41 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#40 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#39 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#38 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#37 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#36 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#35 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#34 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)
#33 All the seeds get crawled or visited before any further depth is crawled (GoogleCodeExporter, closed 9 years ago, 1 comment)