q-m/scrapy-webarchive
A plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.
http://developers.thequestionmark.org/scrapy-webarchive/
2 stars, 0 forks
issues
#22 | Fix performance issue for large WACZ/WARC files on a remote location | leewesleyv | opened 1 week ago | 1
#21 | Improve performance of `get_warc_from_cdxj_record` | leewesleyv | opened 1 week ago | 1
#20 | Write redirect request/response to WARC (#19) | leewesleyv | closed 1 week ago | 0
#19 | Redirect URLs not resolved correctly | leewesleyv | closed 1 week ago | 3
#18 | Specify the complete destination with `SW_EXPORT_URI` | leewesleyv | closed 3 weeks ago | 6
#17 | Improve usage docs (#10) | leewesleyv | closed 1 week ago | 0
#16 | Behaviour of combining the middlewares and the extension | leewesleyv | closed 3 weeks ago | 0
#15 | Write datapackage.json to WACZ upon creation | leewesleyv | closed 4 weeks ago | 0
#14 | Add support for Python 3.13 | leewesleyv | opened 1 month ago | 1
#13 | Add support for Scrapy>=2.9, Python3.7+ | leewesleyv | closed 1 month ago | 0
#12 | Complete datapackage.json in WACZ | wvengen | closed 4 weeks ago | 0
#11 | Open WACZ files using the Scrapy stores | leewesleyv | opened 1 month ago | 2
#10 | Clearly document the two ways of crawling | wvengen | closed 1 week ago | 0
#9 | Support fetching live resources in downloader middleware | leewesleyv | opened 1 month ago | 1
#8 | Add additional variable for the archive output URI | leewesleyv | closed 1 month ago | 0
#7 | #3 Add release workflow | leewesleyv | closed 1 month ago | 1
#6 | Add spider name to destination URL | wvengen | closed 1 month ago | 1
#5 | Initial code-review | wvengen | closed 3 weeks ago | 4
#4 | Compatibility with Scrapy | leewesleyv | closed 1 month ago | 4
#3 | Workflow for releasing to PyPI | leewesleyv | closed 1 month ago | 4
#2 | Python compatibility | leewesleyv | closed 1 month ago | 0
#1 | Standardize use of settings for cloud providers across implementations | leewesleyv | closed 1 month ago | 1