ukwa/webarchive-discovery
WARC and ARC indexing and discovery tools.
https://github.com/ukwa/webarchive-discovery/wiki
115 stars, 25 forks
Issues (newest first)
#268 Fix build for Maven 3.8.1 (tokee, closed 2 years ago, 0 comments)
#267 Fix failing build with Maven 3.8.1 (tokee, closed 2 years ago, 0 comments)
#266 Add generic hashtag support (tokee, opened 3 years ago, 0 comments)
#265 Flush buffered documents when writing documents to file (tokee, closed 2 years ago, 0 comments)
#264 Remove extra .jar from the produced with-dependencies JAR (tokee, closed 2 years ago, 0 comments)
#263 Enable text output when using Elasticsearch, mirroring Solr behaviour (tokee, closed 2 years ago, 1 comment)
#262 Bump elasticsearch from 7.13.3 to 7.14.0 in /warc-indexer (dependabot[bot], closed 2 years ago, 1 comment)
#261 Improve MAVEN build Performance (ChenZhangg, closed 2 years ago, 1 comment)
#260 Bump jsoup from 1.13.1 to 1.14.2 in /warc-indexer (dependabot[bot], closed 2 years ago, 0 comments)
#259 Bump elasticsearch from 7.12.0 to 7.13.3 in /warc-indexer (dependabot[bot], closed 3 years ago, 3 comments)
#258 Document consumer (tokee, closed 3 years ago, 0 comments)
#257 Field rewrite (tokee, closed 2 years ago, 2 comments)
#256 Add mechanism for custom adjustment of field content (tokee, closed 2 years ago, 0 comments)
#255 Add support for indexing srcset links as image links (tokee, closed 3 years ago, 1 comment)
#254 Upgrade Nanite to 1.4.1 (anjackson, closed 3 years ago, 0 comments)
#253 Fail early on severe Solr problems (tokee, opened 3 years ago, 0 comments)
#252 Clean up temporary files underway (tokee, opened 3 years ago, 5 comments)
#251 hash checking should be lenient (tokee, opened 3 years ago, 0 comments)
#250 Make a proper README (tokee, opened 3 years ago, 1 comment)
#249 posting solr records to elasticsearch (aponb, closed 3 years ago, 3 comments)
#248 Bump commons-io from 2.6 to 2.7 in /warc-indexer (dependabot[bot], closed 3 years ago, 0 comments)
#247 Update Quick Start documentation, inc. ElasticSearch support. (aponb, opened 3 years ago, 2 comments)
#246 srcset should add to links_images (tokee, closed 3 years ago, 0 comments)
#245 Index WARC-Records of type resource (without http header) (steph-nb, opened 3 years ago, 0 comments)
#244 Add POST data records, for PyWB playback (anjackson, opened 3 years ago, 5 comments)
#243 Force older slf4j only where necessary (anjackson, opened 3 years ago, 0 comments)
#242 Fix code for getting geo coordinates from images (tokee, closed 3 years ago, 1 comment)
#241 Don't bother CDX-indexing HTTP 429 responses (anjackson, opened 3 years ago, 0 comments)
#240 Use slf4j globally to fix log problems. Log4j2 in warc-indexer, log4j in hadoop modules (blekinge, closed 3 years ago, 2 comments)
#239 url_type:splashpage is problematic (tokee, closed 3 years ago, 2 comments)
#238 OpenJDK 11 compatibility problems (anjackson, closed 3 years ago, 1 comment)
#237 Warc-indexer. Improved detecting of HTML-files with slightly invalid html-syntax (thomasegense, opened 3 years ago, 0 comments)
#236 Warc-indexer. Better console-log for corrupt warc.gz records (thomasegense, opened 3 years ago, 0 comments)
#235 Revisits are marked as failing hash validation (tokee, closed 3 years ago, 0 comments)
#234 Digest stage should be re-examined (tokee, closed 3 years ago, 1 comment)
#233 sha1 digest in hex fails hash validation (tokee, closed 3 years ago, 0 comments)
#232 Decompress and dechunk fixes (tokee, closed 3 years ago, 1 comment)
#231 Bump junit from 4.10 to 4.13.1 in /warc-indexer (dependabot[bot], closed 3 years ago, 0 comments)
#230 Bump junit from 4.8 to 4.13.1 in /digipres-tika (dependabot[bot], closed 3 years ago, 0 comments)
#229 Bump junit from 4.10 to 4.13.1 in /warc-hadoop-recordreaders (dependabot[bot], closed 3 years ago, 0 comments)
#228 License incompatibility (ato, opened 4 years ago, 0 comments)
#227 Bump log4j-core from 2.13.1 to 2.13.2 in /warc-hadoop-recordreaders (dependabot[bot], closed 4 years ago, 0 comments)
#226 Bump log4j-core from 2.13.1 to 2.13.2 in /digipres-tika (dependabot[bot], closed 4 years ago, 0 comments)
#225 Bump xercesImpl from 2.11.0 to 2.12.0 in /warc-indexer (dependabot[bot], closed 4 years ago, 0 comments)
#224 Add support for Twitter-JSON-WARCs from Social Feed Manager (tokee, opened 4 years ago, 0 comments)
#223 Switch to picocli instead of Typesafe Config (anjackson, closed 2 years ago, 1 comment)
#222 Chunked content results in invalid digest (tokee, closed 4 years ago, 1 comment)
#221 When specifying XML output, commit should be disabled (tokee, closed 3 years ago, 2 comments)
#220 Add heuristic check for chunked content (tokee, closed 4 years ago, 1 comment)
#219 Use picocli instead of Typesafe Config (anjackson, closed 2 years ago, 2 comments)