unt-libraries/warc-metadata-sidecar
BSD 3-Clause "New" or "Revised" License
1 star, 0 forks
issues
Add puid to merged cdxj
#25
gracieflores
closed
1 year ago
1
Add pres-id to merged cdxj
#24
vphill
closed
1 year ago
0
Retain warc with no sidecar records; update warcinfo
#23
ldko
closed
1 year ago
1
Handle sidecar files for WARCs that produce no sidecar records differently
#22
ldko
closed
1 year ago
0
Limit number of bytes that pymagic uses for format identification
#21
vphill
opened
1 year ago
2
Caching results and adding a return
#20
gracieflores
closed
1 year ago
1
Use chardet's universaldetector object to speed up process
#19
gracieflores
closed
1 year ago
3
Fix minor issues
#18
gracieflores
closed
1 year ago
1
Look at caching results within a WARC based on content hash.
#17
vphill
closed
1 year ago
0
clearer variable name for warc_file_path?
#16
vphill
closed
1 year ago
0
Change print(url) to logging.info(url)
#15
vphill
closed
1 year ago
0
More info from metadata_sidecar function?
#14
vphill
closed
1 year ago
0
Switch order of file and directory input
#13
vphill
closed
1 year ago
0
manage version info without requiring installation of script
#12
vphill
closed
1 year ago
1
Look at setting and reusing a Soft404 classifier
#11
vphill
opened
1 year ago
1
Look at incrementally detecting the character set.
#10
vphill
closed
1 year ago
0
Merge cdxjs
#9
gracieflores
closed
2 years ago
3
Convert python structures to json
#8
gracieflores
closed
2 years ago
1
Warc sidecar cdxj
#7
gracieflores
closed
2 years ago
3
Soft 404 detection
#6
gracieflores
closed
2 years ago
2
Create CDXJ index for warc-metadata-sidecar WARCs and merge with existing CDXJ files
#5
ldko
closed
2 years ago
7
Add soft 404 detection
#4
gracieflores
closed
2 years ago
2
ARC metadata sidecar
#3
gracieflores
closed
2 years ago
2
metadata-sidecar should work for ARC files also
#2
gracieflores
closed
2 years ago
8
metadata sidecar script creation
#1
gracieflores
closed
2 years ago
8