lintool/warcbase
Warcbase is an open-source platform for managing and analyzing web archives
http://warcbase.org/
161 stars, 47 forks
issues
renaming ExtractTopLevelDomain to ExtractDomain
#212
ianmilligan1
closed
8 years ago
0
Add UDF for computing MD5 checksum
#211
lintool
opened
8 years ago
11
Add UDF for extracting stuff from tweets
#210
lintool
closed
8 years ago
1
Dynamic PageRank Crashes
#209
ianmilligan1
closed
8 years ago
7
ExtractTopLevelDomain UDF misnamed
#208
lintool
closed
8 years ago
2
Merge GraphX to master
#207
jrwiebe
closed
8 years ago
5
Build issues on vagrant
#206
ruebot
closed
8 years ago
4
Setup TravisCI
#205
ruebot
closed
8 years ago
0
Add support for analyzing tweets
#204
lintool
closed
8 years ago
0
UDF for extracting image links
#203
lintool
closed
8 years ago
6
Selecting Pages that Contain Certain Keywords
#202
ianmilligan1
closed
8 years ago
4
Represent link structure as graph using GraphX
#201
jrwiebe
closed
8 years ago
4
Deprecate loadWarc and loadArc in favour of loadArchives
#200
jrwiebe
closed
8 years ago
1
Loading ARC files produces record size error
#199
jrwiebe
opened
8 years ago
2
java.io.EOFException when working with WARCs
#198
ianmilligan1
closed
8 years ago
4
Wildcard support in KeepUrls?
#197
ianmilligan1
closed
8 years ago
4
Fine-Tuned Link Extraction within Domains
#196
ianmilligan1
closed
8 years ago
5
Detect WARC or ARC format when loading Records
#195
bitzl
closed
8 years ago
3
Add RemovePrefixWWW method
#194
aliceranzhou
closed
8 years ago
0
Add dateExtract and tabDelimit
#193
aliceranzhou
closed
8 years ago
5
Write removePrefixWWW method
#192
lintool
closed
8 years ago
2
URL for Warcbase Docs
#191
ianmilligan1
closed
8 years ago
5
Translate `DetectLanguage` pig script into Scala; Incorporate into RecordRDD?
#190
ianmilligan1
closed
8 years ago
4
API refactoring: Should WARecord really be an inner class of RecordTransformers?
#189
lintool
closed
8 years ago
1
Make WARecord serializable with KryoSerializer
#188
aliceranzhou
closed
8 years ago
6
Site Link Structure Output, Group by Month?
#187
ianmilligan1
closed
8 years ago
2
Propagate Spark serializers to within data loading API
#186
lintool
closed
8 years ago
7
loadWarc crashing on link extraction, while loadArc works
#185
ianmilligan1
closed
8 years ago
12
Redo Documentation to Account for getContentString, getContentBytes, etc.
#184
ianmilligan1
closed
8 years ago
5
Link structure visualization
#183
lintool
closed
8 years ago
2
Merge in link analysis scripts from branch
#182
lintool
closed
8 years ago
1
Publish Warcbase wiki as gitbook (decided on MKDocs instead)
#181
lintool
closed
8 years ago
13
Warcbase Resources Repository
#180
ianmilligan1
closed
8 years ago
2
Revision API to be more descriptive
#179
aliceranzhou
closed
8 years ago
0
Spark notebook: display images inline using base64 encoding and HTML injection
#178
lintool
opened
8 years ago
0
API revisions: getBodyContent, getRawBodyContent
#177
lintool
closed
8 years ago
13
Do we still need ExtractLinksAndText?
#176
lintool
closed
8 years ago
4
Pig2Gdf.py deprecated? Switch to native GDF exporter
#175
ianmilligan1
closed
8 years ago
4
Documenting all Functions in Wiki
#174
ianmilligan1
closed
8 years ago
9
Kill JwatArcLoaderTest
#173
lintool
closed
8 years ago
2
Do we need RecordUtils?
#172
lintool
closed
8 years ago
3
Get Warcbase running on Matt Weber's cluster at Rutgers
#171
lintool
closed
8 years ago
4
Spark job taking too long
#170
jrwiebe
closed
8 years ago
25
Write a Generic Scala Introduction
#169
ianmilligan1
closed
6 years ago
3
NER Workflow & Documentation
#168
ianmilligan1
closed
6 years ago
8
NER3Classifier object is not serializable (NER3Classifier.scala)
#167
jrwiebe
closed
8 years ago
9
loadWarc generating empty arrays
#166
ianmilligan1
closed
8 years ago
3
Try Google Cloud Dataproc
#165
lintool
opened
8 years ago
0
Try Google Cloud Bigtable
#164
lintool
opened
8 years ago
0
Make keepValidPages a bit smarter
#163
lintool
closed
8 years ago
1