xrma/crawler4j
Automatically exported from code.google.com/p/crawler4j
issues
Project status?
#189
GoogleCodeExporter
closed
9 years ago
2
Patch for /src/main/java/edu/uci/ics/crawler4j/parser/Parser.java
#188
GoogleCodeExporter
opened
9 years ago
1
Image Downloading
#187
GoogleCodeExporter
closed
9 years ago
1
HTML parser does not delimit words by html element
#186
GoogleCodeExporter
opened
9 years ago
1
How to crawl .js files?
#185
GoogleCodeExporter
opened
9 years ago
1
If an uncaught exception is thrown in the overloaded visit method of WebCrawler, the stack trace is not shown in the log
#184
GoogleCodeExporter
closed
9 years ago
3
Pause all threads until one operation is finished
#183
GoogleCodeExporter
closed
9 years ago
2
While crawling web site, it suddenly stops by giving exception
#182
GoogleCodeExporter
closed
9 years ago
2
Not possible to run a crawl more than once.
#181
GoogleCodeExporter
opened
9 years ago
3
WARN Could not remove: [page...] from list of processed pages.
#180
GoogleCodeExporter
closed
9 years ago
5
Some relative paths are ignored
#179
GoogleCodeExporter
closed
9 years ago
2
WebURL overrides equals method without overriding hashCode as well
#178
GoogleCodeExporter
closed
9 years ago
1
Threads without synchronism
#177
GoogleCodeExporter
opened
9 years ago
1
crawler4j suddenly freezes (i.e. no more info, no more crawling)
#176
GoogleCodeExporter
closed
9 years ago
7
Missing IF-Statement causes crawler to throw a NullPointerException while syncing
#175
GoogleCodeExporter
closed
9 years ago
11
Does crawler4j support crawling HTTPS pages?
#174
GoogleCodeExporter
closed
9 years ago
6
link with space
#173
GoogleCodeExporter
closed
9 years ago
2
Problem with Arabic encoding
#172
GoogleCodeExporter
opened
9 years ago
2
Crawl a site that requires login
#171
GoogleCodeExporter
closed
9 years ago
2
WebCrawler.shouldVisit doesn't get relative URLs
#170
GoogleCodeExporter
closed
9 years ago
3
Examples crash
#169
GoogleCodeExporter
closed
9 years ago
6
Not obeying robots.txt
#168
GoogleCodeExporter
closed
9 years ago
2
Unable to crawl HTTPS URLs using crawler4j
#167
GoogleCodeExporter
closed
9 years ago
2
ERROR [Crawler 3] Proxy authentication error: Invalid name provided (Mechanism level: Could not load configuration file C:\Windows\krb5.ini (The system cannot find the file specified.))
#166
GoogleCodeExporter
opened
9 years ago
0
Enhancement: Add HTML Header to HtmlParseData
#165
GoogleCodeExporter
opened
9 years ago
1
setURL can crash and burn in the case of malformed URLs or weird protocols
#164
GoogleCodeExporter
opened
9 years ago
1
Parent Docid and Parent URL are showing null
#163
GoogleCodeExporter
closed
9 years ago
3
How to get http response
#162
GoogleCodeExporter
closed
9 years ago
6
How to crawl a file in a Unix or Linux environment
#161
GoogleCodeExporter
opened
9 years ago
1
New feature: Add more context in shouldVisit
#160
GoogleCodeExporter
closed
9 years ago
2
Crawler4j does not handle move to URLs which are relative
#159
GoogleCodeExporter
closed
9 years ago
3
Memory storage instead of disk storage?
#158
GoogleCodeExporter
opened
9 years ago
1
Cannot delete frontier temp folder
#157
GoogleCodeExporter
closed
9 years ago
10
Different keys used for PUT and REMOVE operations on DB
#156
GoogleCodeExporter
closed
9 years ago
3
Where is Crawled Data being stored after crawling ends
#155
GoogleCodeExporter
closed
9 years ago
3
sleepycat "75 min" IllegalArgumentException
#154
GoogleCodeExporter
closed
9 years ago
3
How to crawl web pages like *.do?
#153
GoogleCodeExporter
closed
9 years ago
4
The crawler stops running further if the start url returns a 302 redirect.
#152
GoogleCodeExporter
closed
9 years ago
5
Making a focused crawler based on the page content?
#151
GoogleCodeExporter
closed
9 years ago
2
Unexpected behavior of URLCanonicalizer.getCanonicalURL(href, context)
#150
GoogleCodeExporter
opened
9 years ago
4
Proper compression support in the PageFetcher
#149
GoogleCodeExporter
closed
9 years ago
2
Incompatible argument to function Exception
#148
GoogleCodeExporter
closed
9 years ago
3
Class not found exception
#147
GoogleCodeExporter
closed
9 years ago
3
Html content comes incomplete
#146
GoogleCodeExporter
opened
9 years ago
3
charsetName NullPointer exception
#145
GoogleCodeExporter
closed
9 years ago
4
Add a possibility to use a Factory for instantiating new WebCrawlers, instead of hardcoded usage of class.newInstance()
#144
GoogleCodeExporter
opened
9 years ago
4
Impossible to get anchor text in visit(Page page)
#143
GoogleCodeExporter
closed
9 years ago
17
The Crawler thread appends with Crawler.setMaxPages(int)
#142
GoogleCodeExporter
closed
9 years ago
3
Give developers the option of getting the urls on a page themselves
#141
GoogleCodeExporter
closed
9 years ago
2
Different Domains for different threads
#140
GoogleCodeExporter
opened
9 years ago
1