ukwa/webarchive-discovery
WARC and ARC indexing and discovery tools.
https://github.com/ukwa/webarchive-discovery/wiki
117 stars, 25 forks
issues
| Issue | Title | Author | State | Age | Comments |
|---|---|---|---|---|---|
| #218 | Decode `Transfer-Encoding: chunked` responses | anjackson | closed | 4 years ago | 7 |
| #217 | Add parameter for author to mono or multi valued field | clawia | closed | 4 years ago | 1 |
| #216 | Bump tika.version from 1.19.1 to 1.23 in /digipres-tika | dependabot[bot] | closed | 4 years ago | 0 |
| #215 | Improve CDX indexing, especially of revisit records | anjackson | closed | 4 years ago | 6 |
| #214 | Server-side Last-Modified time should be indexed | tokee | opened | 5 years ago | 1 |
| #213 | The url_type:slashpage should not ignore query parameters | anjackson | opened | 5 years ago | 0 |
| #212 | Mark truncated files | tokee | opened | 5 years ago | 4 |
| #211 | Padding of wayback_date | tokee | closed | 5 years ago | 1 |
| #210 | Brotli compression and some refactoring | tokee | closed | 5 years ago | 1 |
| #209 | Adds exif_extraction = true to avoid errors on old configs | tokee | closed | 5 years ago | 0 |
| #208 | Webrecorder support | tokee | closed | 5 years ago | 5 |
| #207 | Consider possible improvements to general text handling | anjackson | opened | 5 years ago | 1 |
| #206 | Not all images extracted from an HTML page | thomasegense | opened | 5 years ago | 0 |
| #205 | Field to count number of domain links | thomasegense | opened | 5 years ago | 2 |
| #204 | GZipped HTML in warcs is not handled as web pages | tokee | closed | 5 years ago | 1 |
| #203 | Batch size should have a byte limit | tokee | closed | 3 years ago | 0 |
| #202 | Hadoop-indexer guava dependency issues | Fernando-Melo | opened | 5 years ago | 2 |
| #201 | Add support for supplying collection IDs via the annotations file | anjackson | opened | 5 years ago | 0 |
| #200 | Add IDs for collections etc | anjackson | closed | 5 years ago | 4 |
| #199 | Word documents and content_type_norm | tokee | opened | 6 years ago | 1 |
| #198 | Droid WARC URL header sanitize | tokee | closed | 5 years ago | 0 |
| #197 | Normalise WARC headers upon WARC Entry parse | tokee | opened | 6 years ago | 0 |
| #196 | WindowsOS path fix. This closes #194 | thomasegense | closed | 6 years ago | 1 |
| #195 | Add further normalised content types for feeds, stylesheets, javascript | anjackson | opened | 6 years ago | 0 |
| #194 | Fix source_file_path for windows OS | thomasegense | closed | 6 years ago | 0 |
| #193 | Finally update to Java 8 | anjackson | closed | 6 years ago | 2 |
| #192 | Prevent duplicate values for multi-valued fields | anjackson | closed | 6 years ago | 1 |
| #191 | The author should have been multivalued. | anjackson | closed | 6 years ago | 0 |
| #190 | Ensure UTF-8 locale is in use, or at least warn if not. | anjackson | opened | 6 years ago | 1 |
| #189 | Use service loaders and refactor code for clarity | anjackson | closed | 6 years ago | 0 |
| #188 | Pad wayback_date with zeroes | tokee | closed | 5 years ago | 0 |
| #187 | Remove standard ports from URLs | tokee | opened | 6 years ago | 8 |
| #186 | wayback_date field, set indexed="true" | thomasegense | opened | 6 years ago | 6 |
| #185 | Protocol as field (http/https etc.) | thomasegense | opened | 6 years ago | 6 |
| #184 | Url path parameters indexed in new multivalued field. | thomasegense | opened | 6 years ago | 2 |
| #183 | Separate Standard and KitchenSink builds | anjackson | closed | 6 years ago | 5 |
| #182 | Maven license plugin a little too... aggressive. | anjackson | closed | 6 years ago | 1 |
| #181 | Consider switching to urlcanon | anjackson | opened | 6 years ago | 0 |
| #180 | Support direct building of EmbeddedSolrServer cores | anjackson | opened | 6 years ago | 2 |
| #179 | Improve and (somewhat) standardise the annotations system? | anjackson | opened | 6 years ago | 1 |
| #178 | url_search with CamelCasing does not work as intended | tokee | opened | 6 years ago | 0 |
| #177 | Weekday & time of day | tokee | opened | 6 years ago | 2 |
| #176 | URL normalisation/canonicalisation fixes | tokee | closed | 6 years ago | 5 |
| #175 | Add WARC-Refers-To to revisits | tokee | opened | 6 years ago | 2 |
| #174 | Heatmaps with SpatialRecursivePrefixTreeFieldType | tokee | opened | 6 years ago | 0 |
| #173 | Clean up indentation in the code? | tokee | closed | 6 years ago | 4 |
| #172 | Close RandomAccessFile connection to the cacheFile upon cleanup | tokee | closed | 6 years ago | 1 |
| #171 | Add source WARC & URL to most error logs for TikaExtractor | tokee | closed | 6 years ago | 2 |
| #170 | Resource Name facets don't work; we need indexed=true. | ruebot | closed | 6 years ago | 7 |
| #169 | url normalization problem with \ (%5C) | thomasegense | closed | 6 years ago | 0 |