Closed ibnesayeed closed 4 years ago
@github-actions run
#!/bin/bash
exec &> comment.buffer
wget https://github.com/machawk1/warcreate/files/4892191/20200704064631467.warc.zip
unzip 20200704064631467.warc.zip
#!/usr/bin/env python3
import sys, traceback
sys.stdout = open('comment.buffer', 'a')
try:
    from warcio.archiveiterator import ArchiveIterator
    with open('20200704064631467.warc', 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == 'response':
                print(record.rec_headers.get_header('WARC-Target-URI'))
except Exception:
    traceback.print_exc(file=sys.stdout)
--2020-07-08 21:23:43-- https://github.com/machawk1/warcreate/files/4892191/20200704064631467.warc.zip
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-repository-file-5c1aeb.s3.amazonaws.com/8906459/4892191?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200708%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200708T212343Z&X-Amz-Expires=300&X-Amz-Signature=00eb59af4480ebb7933a7775820e958930903d8d43f44bd801bc07ee77521dae&X-Amz-SignedHeaders=host&actor_id=0&repo_id=8906459&response-content-disposition=attachment%3Bfilename%3D20200704064631467.warc.zip&response-content-type=application%2Fzip [following]
--2020-07-08 21:23:43-- https://github-production-repository-file-5c1aeb.s3.amazonaws.com/8906459/4892191?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200708%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200708T212343Z&X-Amz-Expires=300&X-Amz-Signature=00eb59af4480ebb7933a7775820e958930903d8d43f44bd801bc07ee77521dae&X-Amz-SignedHeaders=host&actor_id=0&repo_id=8906459&response-content-disposition=attachment%3Bfilename%3D20200704064631467.warc.zip&response-content-type=application%2Fzip
Resolving github-production-repository-file-5c1aeb.s3.amazonaws.com (github-production-repository-file-5c1aeb.s3.amazonaws.com)... 52.216.136.75
Connecting to github-production-repository-file-5c1aeb.s3.amazonaws.com (github-production-repository-file-5c1aeb.s3.amazonaws.com)|52.216.136.75|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1326456 (1.3M) [application/zip]
Saving to: ‘20200704064631467.warc.zip’
0K .......... .......... .......... .......... .......... 3% 14.6M 0s
50K .......... .......... .......... .......... .......... 7% 16.7M 0s
100K .......... .......... .......... .......... .......... 11% 28.4M 0s
150K .......... .......... .......... .......... .......... 15% 20.3M 0s
200K .......... .......... .......... .......... .......... 19% 20.6M 0s
250K .......... .......... .......... .......... .......... 23% 28.1M 0s
300K .......... .......... .......... .......... .......... 27% 34.2M 0s
350K .......... .......... .......... .......... .......... 30% 29.6M 0s
400K .......... .......... .......... .......... .......... 34% 35.7M 0s
450K .......... .......... .......... .......... .......... 38% 33.2M 0s
500K .......... .......... .......... .......... .......... 42% 35.0M 0s
550K .......... .......... .......... .......... .......... 46% 37.8M 0s
600K .......... .......... .......... .......... .......... 50% 27.2M 0s
650K .......... .......... .......... .......... .......... 54% 34.7M 0s
700K .......... .......... .......... .......... .......... 57% 172M 0s
750K .......... .......... .......... .......... .......... 61% 31.2M 0s
800K .......... .......... .......... .......... .......... 65% 27.1M 0s
850K .......... .......... .......... .......... .......... 69% 178M 0s
900K .......... .......... .......... .......... .......... 73% 32.7M 0s
950K .......... .......... .......... .......... .......... 77% 33.1M 0s
1000K .......... .......... .......... .......... .......... 81% 187M 0s
1050K .......... .......... .......... .......... .......... 84% 35.7M 0s
1100K .......... .......... .......... .......... .......... 88% 135M 0s
1150K .......... .......... .......... .......... .......... 92% 32.6M 0s
1200K .......... .......... .......... .......... .......... 96% 179M 0s
1250K .......... .......... .......... .......... ..... 100% 34.9M=0.04s
2020-07-08 21:23:44 (33.1 MB/s) - ‘20200704064631467.warc.zip’ saved [1326456/1326456]
Archive: 20200704064631467.warc.zip
inflating: 20200704064631467.warc
https://twitter.com/prosodyContext
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/recordloader.py", line 224, in _detect_type_load_headers
    rec_headers = self.warc_parser.parse(stream, statusline)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/statusandheaders.py", line 270, in parse
    raise StatusAndHeadersParserException(msg, full_statusline)
warcio.statusandheaders.StatusAndHeadersParserException: Expected Status Line starting with ['WARC/1.1', 'WARC/1.0', 'WARC/0.17', 'WARC/0.18'] - Found: WARC-Type: request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/warcreate/warcreate/tmp_AozGuAylsrrQKFj4VPDhfkz6ukle64Lh", line 10, in
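The parser error above says that a record began with `WARC-Type: request` where a WARC version line was expected. To locate where the framing breaks, something like the following standard-library sketch might help. It is not warcio's implementation; it assumes an uncompressed WARC whose records declare `Content-Length`, and the function name `find_first_bad_record` is made up for illustration.

```python
import io

# Version lines accepted at the start of a record (copied from the error message above).
WARC_VERSIONS = (b"WARC/1.1", b"WARC/1.0", b"WARC/0.17", b"WARC/0.18")


def find_first_bad_record(stream):
    """Return (offset, first_line) of the first record that does not start
    with a WARC version line, or None if every record looks well-framed.

    Assumption: uncompressed input, each record carries Content-Length.
    """
    while True:
        offset = stream.tell()
        line = stream.readline()
        # Skip the blank separator lines between records.
        while line in (b"\r\n", b"\n"):
            offset = stream.tell()
            line = stream.readline()
        if not line:
            return None  # reached end of file without a framing problem
        if not line.startswith(WARC_VERSIONS):
            return offset, line.rstrip(b"\r\n")
        # Read header lines up to the blank line, remembering Content-Length.
        length = 0
        while True:
            header = stream.readline()
            if header in (b"\r\n", b"\n", b""):
                break
            name, _, value = header.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        stream.seek(length, io.SEEK_CUR)  # jump over the record body
```

To try it on the extracted file, open `20200704064631467.warc` in `'rb'` mode and pass the file object to `find_first_bad_record`; a non-`None` result gives the byte offset of the first record that lacks a version line.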
Add a GH workflow that enables running arbitrary commands from issue comments. It can only be tested after it has been merged.
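The workflow itself is not shown here, so the following is only an illustrative sketch of one way a runner could turn such a comment into separate scripts. It assumes the comment starts with the trigger phrase `@github-actions run` and that each script begins at a `#!` shebang line, as in the comment above; the name `split_scripts` is hypothetical and not taken from the actual workflow.

```python
TRIGGER = "@github-actions run"  # trigger phrase as it appears in the comment above


def split_scripts(comment_body):
    """Split a comment into scripts, each starting at a '#!' shebang line.

    Returns an empty list when the first line is not the trigger phrase.
    """
    lines = comment_body.strip().splitlines()
    if not lines or lines[0].strip() != TRIGGER:
        return []
    scripts = []
    for line in lines[1:]:
        if line.startswith("#!"):
            scripts.append([line])       # a shebang opens a new script
        elif scripts:
            scripts[-1].append(line)     # other lines extend the current script
    return ["\n".join(script) + "\n" for script in scripts]
```

Each returned string could then be written to a temporary executable file and run in order, which would match the temporary file path visible in the traceback, though that step is an assumption as well.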