xrma
/
crawler4j
Automatically exported from code.google.com/p/crawler4j
issues
Add an option to tweak the URL before processing the page
#239
GoogleCodeExporter
closed
9 years ago
2
fetchHeader does an HTTP GET
#238
GoogleCodeExporter
closed
9 years ago
3
CrawlController.start() should take a Crawler object, not a class
#237
GoogleCodeExporter
closed
9 years ago
2
Please default includeHttpsPages to true
#236
GoogleCodeExporter
closed
9 years ago
3
Support for Java 6
#235
GoogleCodeExporter
closed
9 years ago
2
Skip writing files to disk
#234
GoogleCodeExporter
closed
9 years ago
4
Lack of documentation for FILTERS
#233
GoogleCodeExporter
closed
9 years ago
2
Issue on number of crawled pages
#232
GoogleCodeExporter
closed
9 years ago
2
Memory leakage in crawler4j caused by database environment
#231
GoogleCodeExporter
closed
9 years ago
2
Crawling the result page of a form POST submission
#230
GoogleCodeExporter
opened
9 years ago
0
Support for generation of OSGi Bundle
#229
GoogleCodeExporter
opened
9 years ago
0
Illegal character in query
#228
GoogleCodeExporter
opened
9 years ago
1
The latest version of crawler4j requires JDK 1.7
#227
GoogleCodeExporter
closed
9 years ago
1
Seed URL and Final URL differ (No Redirects)
#226
GoogleCodeExporter
opened
9 years ago
1
Meta refresh does not work correctly?
#225
GoogleCodeExporter
closed
9 years ago
5
Parsing of URLs with # is broken
#224
GoogleCodeExporter
opened
9 years ago
1
WebURL can't parse the domain if the URL is an IP address or includes a port number
#223
GoogleCodeExporter
opened
9 years ago
0
Not able to crawl through public domain websites by setting proxies.
#222
GoogleCodeExporter
opened
9 years ago
0
Pressing Button to Run crawler for a second time doesn't work
#221
GoogleCodeExporter
closed
9 years ago
1
Add a filtering class to handle more easily URL filtering
#220
GoogleCodeExporter
opened
9 years ago
1
Error during crawling - Not immediate (after several days of crawling) : Error while getting next urls [...] Java Error occurred, recovery may not be possible.
#219
GoogleCodeExporter
closed
9 years ago
2
Garbled text from HTMLParse: Chinese characters are garbled
#218
GoogleCodeExporter
opened
9 years ago
5
Not able to get JavaScript-related files in the web URL list
#217
GoogleCodeExporter
opened
9 years ago
1
crawl JSON content instead of HTML
#216
GoogleCodeExporter
closed
9 years ago
12
Page already crawled gets crawled again.
#215
GoogleCodeExporter
closed
9 years ago
16
Always log if exception happens
#214
GoogleCodeExporter
closed
9 years ago
3
migrating log4j to slf4j
#213
GoogleCodeExporter
closed
9 years ago
2
Exception Database was closed
#212
GoogleCodeExporter
opened
9 years ago
3
Unable to parse the entire structure of a website (hitam.org) using BaseCrawler code
#211
GoogleCodeExporter
closed
9 years ago
4
crawler4j doesn't preserve the order of query parameters after processing a link
#210
GoogleCodeExporter
closed
9 years ago
1
Can't crawl some websites?
#209
GoogleCodeExporter
closed
9 years ago
5
Multiple proxies
#208
GoogleCodeExporter
closed
9 years ago
1
Couldn't find tld-names.txt
#207
GoogleCodeExporter
closed
9 years ago
3
StringIndexOutOfBoundsException in WebURL
#206
GoogleCodeExporter
closed
9 years ago
4
Please remove Eclipse-generated files from the repository
#205
GoogleCodeExporter
closed
9 years ago
3
URL with "@" won't be crawled
#204
GoogleCodeExporter
opened
9 years ago
3
Bound mismatch: The generic method start(Class<T>, int) of type CrawlController is not applicable for the arguments (Class<MyCrawler>, int). The inferred type MyCrawler is not a valid substitute for the bounded parameter <T extends WebCrawler>
#203
GoogleCodeExporter
closed
9 years ago
14
Patch for /src/test/java/edu/uci/ics/crawler4j/examples/basic/BasicCrawler.java
#202
GoogleCodeExporter
closed
9 years ago
1
edu.uci.ics.crawler4j.crawler.WebCrawler - make method processPage protected
#201
GoogleCodeExporter
closed
9 years ago
5
Can't download images from amazon.com only; other sites work fine
#200
GoogleCodeExporter
closed
9 years ago
4
Fatal transport error while fetching robots.txt
#199
GoogleCodeExporter
closed
9 years ago
10
Crawl Based on Date range
#198
GoogleCodeExporter
closed
9 years ago
1
Feature request --- Exchanging order of visit(Page) and scheduling links from page
#197
GoogleCodeExporter
closed
9 years ago
1
ipt
#196
GoogleCodeExporter
closed
9 years ago
2
Robots.txt parser is not working with Disallow: *
#195
GoogleCodeExporter
closed
9 years ago
2
Configuring Crawler4j
#194
GoogleCodeExporter
closed
9 years ago
1
Wrong html downloaded
#193
GoogleCodeExporter
closed
9 years ago
1
Block when skipping large file
#192
GoogleCodeExporter
closed
9 years ago
2
Incorrectly revisit pages when resuming because not deleting url in frontier correctly
#191
GoogleCodeExporter
closed
9 years ago
1
Add support for last-modified and etag to Page class
#190
GoogleCodeExporter
opened
9 years ago
1