tasfe/crawler4j
Automatically exported from code.google.com/p/crawler4j
issues
putting Selenium Code into CrawlController --> Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE
#265
GoogleCodeExporter
closed
9 years ago
3
Unable to shutdown crawler after server errors.
#264
GoogleCodeExporter
closed
9 years ago
1
Now Seeding Wordpress Hosted Websites
#263
GoogleCodeExporter
closed
9 years ago
5
Not Visiting Certain Seed Urls
#262
GoogleCodeExporter
closed
9 years ago
3
Crawler4j missing more control over retry count
#261
GoogleCodeExporter
opened
9 years ago
1
Environment daemon threads keep running after CrawlController.shutdown()
#260
GoogleCodeExporter
closed
9 years ago
10
HtmlParseData.getText() doesn't recognize breaks or paragraphs
#259
GoogleCodeExporter
opened
9 years ago
1
crawler4j as servlet
#258
GoogleCodeExporter
closed
9 years ago
1
crawler4j doesn't crawl some sites
#257
GoogleCodeExporter
closed
9 years ago
4
Remove Hard-Coded Sleeps
#256
GoogleCodeExporter
opened
9 years ago
1
Many URLs are discarded / not processed (missing in output)
#255
GoogleCodeExporter
closed
9 years ago
2
Quartz scheduler + crawler4J http connection error
#254
GoogleCodeExporter
opened
9 years ago
0
Fatal Transport Error: Read timeout while fetching from same host multiple times
#253
GoogleCodeExporter
closed
9 years ago
6
Patch for /src/main/java/edu/uci/ics/crawler4j/parser/HtmlContentHandler.java
#252
GoogleCodeExporter
opened
9 years ago
1
Fix a typo
#251
GoogleCodeExporter
closed
9 years ago
2
How to do NTLM Authentication?
#250
GoogleCodeExporter
opened
9 years ago
5
UnsupportedClassVersionError / Unsupported
#249
GoogleCodeExporter
closed
9 years ago
6
Crawling for specific Numbers (EANs, European Article Numbers)
#248
GoogleCodeExporter
opened
9 years ago
2
Errors during crawling (maybe regarding robots.txt)
#247
GoogleCodeExporter
closed
9 years ago
21
Storing Videos a problem
#246
GoogleCodeExporter
closed
9 years ago
1
Provide easy access to (absolute) canonical URL
#245
GoogleCodeExporter
opened
9 years ago
1
Automatically increase politeness delay if received 420 or 429 HTTP code
#244
GoogleCodeExporter
opened
9 years ago
1
Deleting crawl storage folder after crawling?
#243
GoogleCodeExporter
closed
9 years ago
1
Crawler4j does not crawl to a broken link
#242
GoogleCodeExporter
closed
9 years ago
2
Package error when trying to compile basic crawler
#241
GoogleCodeExporter
closed
9 years ago
2
This project is a fraud and should be classified as Malware
#240
GoogleCodeExporter
closed
9 years ago
2
Add an option to tweak the URL before processing the page
#239
GoogleCodeExporter
closed
9 years ago
2
fetchHeader Does an HTTP GET
#238
GoogleCodeExporter
closed
9 years ago
3
CrawlController.start() should take a Crawler object, not a class
#237
GoogleCodeExporter
closed
9 years ago
2
Please default includeHttpsPages to true
#236
GoogleCodeExporter
closed
9 years ago
3
Support for Java 6
#235
GoogleCodeExporter
closed
9 years ago
2
Skip writing files to disk
#234
GoogleCodeExporter
closed
9 years ago
4
Lack of documentation for FILTERS
#233
GoogleCodeExporter
closed
9 years ago
2
Issue on number of crawled pages
#232
GoogleCodeExporter
closed
9 years ago
2
Memory leakage in crawler4j caused by database environment
#231
GoogleCodeExporter
closed
9 years ago
2
Crawling page result of form POST submitting
#230
GoogleCodeExporter
opened
9 years ago
0
Support for generation of OSGi Bundle
#229
GoogleCodeExporter
opened
9 years ago
0
Illegal character in query
#228
GoogleCodeExporter
opened
9 years ago
1
The latest version of crawler4j requires JDK 1.7
#227
GoogleCodeExporter
closed
9 years ago
1
Seed URL and Final URL differ (No Redirects)
#226
GoogleCodeExporter
opened
9 years ago
1
Meta refresh does not work correctly?
#225
GoogleCodeExporter
closed
9 years ago
5
Parsing of urls with # broken
#224
GoogleCodeExporter
opened
9 years ago
1
WebURL can't parse the domain if the URL is an IP address or includes a port number
#223
GoogleCodeExporter
opened
9 years ago
0
Not able to crawl through public domain websites by setting proxies.
#222
GoogleCodeExporter
opened
9 years ago
0
Pressing Button to Run crawler for a second time doesn't work
#221
GoogleCodeExporter
closed
9 years ago
1
Add a filtering class to handle more easily URL filtering
#220
GoogleCodeExporter
opened
9 years ago
1
Error during crawling - Not immediate (after several days of crawling) : Error while getting next urls [...] Java Error occurred, recovery may not be possible.
#219
GoogleCodeExporter
closed
9 years ago
2
Garbled text from HTMLParse: Chinese characters are garbled
#218
GoogleCodeExporter
opened
9 years ago
5
Not able to get javascript related files in web url list
#217
GoogleCodeExporter
opened
9 years ago
1
crawl JSON content instead of HTML
#216
GoogleCodeExporter
closed
9 years ago
12