xrma/crawler4j
Automatically exported from code.google.com/p/crawler4j
issues
url contains '\'
#139
GoogleCodeExporter
closed
9 years ago
3
URLCanonicalizer parameters normalization is buggy
#138
GoogleCodeExporter
opened
9 years ago
1
Resources referenced in css files
#137
GoogleCodeExporter
closed
9 years ago
1
JVM crash when running crawler on Centos 6.2
#136
GoogleCodeExporter
closed
9 years ago
14
How can I download the JavaScript files?
#135
GoogleCodeExporter
closed
9 years ago
1
Crawler never stops and repeats URL
#134
GoogleCodeExporter
closed
9 years ago
2
How to get the content type and prevent crawling for example feeds?
#133
GoogleCodeExporter
closed
9 years ago
2
PageFetcher.discardContentIfNotConsumed throws a lot of errors every time.
#132
GoogleCodeExporter
closed
9 years ago
2
Internal error in WebURL
#131
GoogleCodeExporter
closed
9 years ago
8
All URLs are lowercase when using WebCrawler.java
#130
GoogleCodeExporter
closed
9 years ago
3
need help
#129
GoogleCodeExporter
closed
9 years ago
2
crawler4j ignores robots.txt
#128
GoogleCodeExporter
closed
9 years ago
3
port of robots.txt
#127
GoogleCodeExporter
closed
9 years ago
5
crawling of sites within mailto:
#126
GoogleCodeExporter
closed
9 years ago
1
Suggested change - remove logger.setLevel in PageFetcher
#125
GoogleCodeExporter
closed
9 years ago
2
Suggesting to add toString to CrawlConfig ...
#124
GoogleCodeExporter
closed
9 years ago
1
crawler storage data size is increasing
#123
GoogleCodeExporter
closed
9 years ago
6
Crawl Never Starts Final Cleanup
#122
GoogleCodeExporter
closed
9 years ago
3
fetcher.PageFetcher: Failed: HTTP/1.1 400 Bad Request
#121
GoogleCodeExporter
closed
9 years ago
2
handlePageStatusCode triggers only once for the same Broken Link ?
#120
GoogleCodeExporter
closed
9 years ago
2
How to make this a focused crawler?
#119
GoogleCodeExporter
closed
9 years ago
3
Too many open files
#118
GoogleCodeExporter
closed
9 years ago
9
How to use .p12 file for https
#117
GoogleCodeExporter
closed
9 years ago
1
Cannot handle page with 207 status code
#116
GoogleCodeExporter
closed
9 years ago
3
Error during parsing when a link within a crawled page has a trailing %
#115
GoogleCodeExporter
closed
9 years ago
1
Error while analyzing links from a page with query string and no file extension
#114
GoogleCodeExporter
closed
9 years ago
1
Add parenturl in webURL
#113
GoogleCodeExporter
closed
9 years ago
8
visit method for each domain crawl
#112
GoogleCodeExporter
closed
9 years ago
3
[deleted issue]
#111
GoogleCodeExporter
closed
9 years ago
0
File URLs Fetching
#110
GoogleCodeExporter
opened
9 years ago
2
Configuration to set what type of links to crawl - SCRIPT,LINK,IMG etc.,
#109
GoogleCodeExporter
opened
9 years ago
3
Fatal Transport Error in New Version
#108
GoogleCodeExporter
closed
9 years ago
13
Recrawl Not Fetched Links
#107
GoogleCodeExporter
closed
9 years ago
4
Enhancement: Add page response/status code in the URL List - To check broken links & parent page
#106
GoogleCodeExporter
closed
9 years ago
12
Stat for Rel="nofollow" attribute in anchor (<a) tag.
#105
GoogleCodeExporter
closed
9 years ago
8
[deleted issue]
#104
GoogleCodeExporter
closed
9 years ago
0
Improper operation when there is more than one crawl thread (thread-safety problem)
#103
GoogleCodeExporter
closed
9 years ago
1
Crawl Controller Instantiation
#102
GoogleCodeExporter
closed
9 years ago
2
Crawler does not follow Url like http://example.com/../../some.html
#101
GoogleCodeExporter
closed
9 years ago
8
Shutdown crawler takes long time
#100
GoogleCodeExporter
closed
9 years ago
2
runtime exception
#99
GoogleCodeExporter
closed
9 years ago
1
URLs with HTTPS Not Fetched
#98
GoogleCodeExporter
closed
9 years ago
5
Not all domain URLs are crawled
#97
GoogleCodeExporter
closed
9 years ago
1
crawler for gbk html page
#96
GoogleCodeExporter
closed
9 years ago
1
Resuming Enabled Large Seed List Takes Forever
#95
GoogleCodeExporter
opened
9 years ago
3
shouldVisit list of domain to crawl
#94
GoogleCodeExporter
closed
9 years ago
4
[deleted issue]
#93
GoogleCodeExporter
closed
9 years ago
0
how to access crawled data
#92
GoogleCodeExporter
closed
9 years ago
2
Resumable Crawler ignores the maxPagesToFetch
#91
GoogleCodeExporter
closed
9 years ago
1
Fatal transport error
#90
GoogleCodeExporter
closed
9 years ago
2