ukwa/ukwa-manage
Shepherding our web archives from crawl to access.
Apache License 2.0
10 stars
5 forks
issues
Work out how to download WACZ files from Browsertrix-Cloud
#114
anjackson
opened
10 months ago
0
Allow direct indexing of WACZ files
#113
anjackson
opened
10 months ago
1
Update MrJob to use venv module rather than virtualenv
#112
anjackson
opened
1 year ago
0
Bump werkzeug from 1.0.1 to 2.2.3
#111
dependabot[bot]
opened
1 year ago
0
Bump ipython from 7.31.1 to 8.10.0
#110
dependabot[bot]
opened
1 year ago
0
Bump cryptography from 3.4.6 to 39.0.1
#109
dependabot[bot]
opened
1 year ago
0
Bump certifi from 2020.12.5 to 2022.12.7
#108
dependabot[bot]
opened
1 year ago
0
Improve the index-delete command
#107
anjackson
opened
1 year ago
0
Bump pillow from 9.0.1 to 9.3.0
#106
dependabot[bot]
opened
1 year ago
0
CDX Indexing failing on weird data
#105
anjackson
opened
1 year ago
33
Document harvester getting stuck when connections hang
#104
anjackson
closed
1 year ago
4
Redirects to web archives should be indexed appropriately.
#103
anjackson
opened
1 year ago
0
Bump nbconvert from 6.0.7 to 6.5.1
#102
dependabot[bot]
opened
1 year ago
0
Bump nbconvert from 6.0.7 to 6.3.0
#101
dependabot[bot]
closed
1 year ago
1
Bump mistune from 0.8.4 to 2.0.3
#100
dependabot[bot]
opened
1 year ago
0
Bump lxml from 4.7.1 to 4.9.1
#99
dependabot[bot]
opened
1 year ago
0
Bump numpy from 1.21.0 to 1.22.0
#98
dependabot[bot]
opened
2 years ago
0
Write tools to list DC buckets and contents from AWS
#97
anjackson
closed
12 months ago
4
Make one-off index jobs use batching for large tasks
#96
anjackson
opened
2 years ago
0
MrJob submitter should allow STDIN, skip blank lines in filenames
#95
anjackson
opened
2 years ago
1
Bump paramiko from 2.7.2 to 2.10.1
#94
dependabot[bot]
opened
2 years ago
0
For MrJob, make unpacking archives optional
#93
anjackson
opened
2 years ago
9
Bump pillow from 9.0.0 to 9.0.1
#92
dependabot[bot]
closed
2 years ago
1
Bump ipython from 7.20.0 to 7.31.1
#91
dependabot[bot]
closed
2 years ago
0
Bump numpy from 1.20.3 to 1.21.0
#90
dependabot[bot]
closed
2 years ago
0
Bump pillow from 8.4.0 to 9.0.0
#89
dependabot[bot]
closed
2 years ago
0
Improve tidy_warcs command
#88
anjackson
opened
2 years ago
1
Review DDHAPT DB schema
#87
anjackson
opened
2 years ago
0
Use seed results from log analysis
#86
anjackson
closed
2 years ago
1
Add post-indexing verification step
#85
anjackson
opened
2 years ago
0
Improve Document Harvester tools
#84
anjackson
closed
2 years ago
3
Store log analysis output, or stop creating it
#83
anjackson
opened
2 years ago
0
Create Hadoop 020/3 Document Harvester Log Analysis task
#82
anjackson
closed
2 years ago
0
Create and test Hadoop 020/3 Solr indexer
#81
anjackson
opened
2 years ago
0
Create and test Hadoop 020/3 CDX indexer
#80
anjackson
closed
2 years ago
1
Extract Block Scanner Report metrics system
#79
anjackson
opened
2 years ago
0
Document Harvester issues
#78
anjackson
closed
2 years ago
3
Add older BBC News image server to access list
#77
anjackson
opened
2 years ago
0
Add document processing metrics
#76
anjackson
opened
2 years ago
0
Support multiple HDFS services
#75
anjackson
closed
2 years ago
1
Add fast-crawl sheet for legislation.gov URLs on launch
#74
anjackson
opened
3 years ago
1
w3act targets qa check
#73
ldbiz
closed
4 years ago
1
Check indexer handling of rendered items or metadata URLs
#72
anjackson
opened
4 years ago
0
Updated tasks to make better use of TrackDB
#71
anjackson
closed
4 years ago
0
Code for warcs into solr and trackdb
#70
GilHoggarth
closed
4 years ago
0
Bump bleach from 3.1.0 to 3.1.4 in /notebooks
#69
dependabot[bot]
closed
4 years ago
0
Bump bleach from 3.1.0 to 3.1.2 in /notebooks
#68
dependabot[bot]
closed
4 years ago
1
CDX Index checks failing for URL with many instances
#67
anjackson
opened
4 years ago
0
More resilient launch protocol
#66
anjackson
opened
4 years ago
0
Bump bleach from 3.1.0 to 3.1.1 in /notebooks
#65
dependabot[bot]
closed
4 years ago
1