lintool/warcbase
Warcbase is an open-source platform for managing and analyzing web archives
http://warcbase.org/
161 stars, 47 forks
issues
Update documentation on Spark script for extracting site link structure
#162
lintool
closed
8 years ago
8
Ideas from Web Archives 2015
#161
mcburton
closed
8 years ago
6
Examining individual ARC/WARC records Spark
#160
lintool
closed
8 years ago
3
Rip out all the Pig stuff!
#159
lintool
closed
8 years ago
3
Port named entities extractor Pig script over to Spark
#158
lintool
closed
8 years ago
11
Refactor ExtractLinks UDF in matchbox
#157
lintool
closed
8 years ago
7
Add Spark RDD keepValidPages transformation
#156
lintool
closed
8 years ago
0
Simpler method for counting in Spark
#155
lintool
closed
8 years ago
2
Ideas for New Spark Scripts in Documentation
#154
ianmilligan1
closed
8 years ago
5
Upgrade Spark from 1.2
#153
lintool
closed
8 years ago
1
High-level cleanup/reorganization of documentation
#152
lintool
closed
8 years ago
1
Automatic topic issues classifier
#151
lintool
opened
8 years ago
1
Port Pig UDFs over to Spark
#150
lintool
closed
8 years ago
1
Port Pig test cases to Spark
#149
lintool
closed
8 years ago
0
Warcbase Jupyter integration
#148
lintool
closed
8 years ago
3
Separate Spark on Mac OS instructions into separate wiki page
#147
lintool
closed
8 years ago
1
Prototype fluent Spark API for manipulating archive data
#146
lintool
closed
8 years ago
16
Can't clone git repo according to your documentation.
#145
ghost
closed
8 years ago
4
In documentation, correct path for installation directory?
#144
brenreyes
closed
8 years ago
5
In documentation, change "hbase-env.html" to "hbase-env.xml"
#143
brenreyes
closed
9 years ago
1
In documentation, specify the version of Hadoop and Hbase you are using.
#142
brenreyes
closed
8 years ago
1
Alter Scripts to Allow Analysis by Date as well as Month
#141
ianmilligan1
closed
8 years ago
3
Re-upgrade Guava (UKWA's WARC Hadoop indexer dependency)
#140
lintool
opened
9 years ago
2
HadoopIndexer Producing Records w/o CrawlDates
#139
ianmilligan1
closed
6 years ago
4
Image Data Creeping into Plain Text
#138
ianmilligan1
opened
9 years ago
8
Integrate with warc-hadoop-indexer for Shine
#137
ianmilligan1
closed
9 years ago
3
Incorporate Crawl Size Visualization
#136
ianmilligan1
closed
9 years ago
3
broken pom.xml
#135
lintool
closed
9 years ago
0
Clean up duplicate documentation on text analysis on wiki
#134
lintool
closed
9 years ago
2
Find a semi-permanent home for Shine front-end to Canadian Political Parties collection
#133
lintool
closed
9 years ago
2
Tweaking documentation for "Building and Running Warcbase Under OS X"
#132
lintool
closed
9 years ago
1
Language ID
#131
ianmilligan1
closed
9 years ago
4
Incorporating pig2gdf
#130
ianmilligan1
closed
9 years ago
2
Better trapping of WARC load issues (Pig WarcLoader OOM issues)
#129
lintool
closed
9 years ago
2
Clean up org.warcbase.analysis
#128
lintool
closed
9 years ago
1
Remove dependence on JWAT
#127
lintool
closed
9 years ago
1
Change Pig WarcLoader to use WacWarcInputFormat
#126
lintool
closed
9 years ago
1
Better pyspark integration (use Python dict instead of string)
#125
lintool
closed
9 years ago
1
Pig UDF for Stanford NER tagger
#124
jrwiebe
closed
9 years ago
0
Merge in instructions for Warcbase on OS X
#123
lintool
closed
9 years ago
2
Pig UDF for access to Stanford NER
#122
lintool
closed
9 years ago
2
Updated README removing text about "only ARC, not WARC" limits per #64
#121
machawk1
closed
9 years ago
0
Look into PySpark integration with Warcbase
#120
lintool
closed
9 years ago
1
Write converter to GDF format
#119
lintool
closed
9 years ago
1
ExtractLinks UDF doesn't properly handle relative URLs
#118
lintool
closed
9 years ago
3
Write Pig UDF to extract top-level domain of URL
#117
lintool
closed
9 years ago
1
Loading HBase config on startup
#116
lintool
opened
9 years ago
0
Update README
#115
lintool
closed
9 years ago
1
Page load latency evaluation
#114
lintool
opened
9 years ago
0
Add documentation about pywb-warcbase
#113
lintool
opened
9 years ago
0