machawk1 / warcreate

Chrome extension to "Create WARC files from any webpage"
https://warcreate.com
MIT License

Allow command run from issue comments #123

Closed ibnesayeed closed 4 years ago

ibnesayeed commented 4 years ago

Add a GitHub Actions workflow that runs arbitrary commands posted in issue comments. It can only be tested after it has been merged.
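To illustrate the idea, here is a minimal Python sketch of how a comment body might be split into runnable scripts. It is not the workflow's actual implementation: the function name `extract_scripts` is hypothetical, and it assumes each script starts with a shebang line and that the comment opens with the `@github-actions run` trigger, as in the comment that follows in this thread.

```python
import re
from typing import List

# Trigger phrase as it appears in this thread; the real workflow may match differently.
TRIGGER = "@github-actions run"


def extract_scripts(comment_body: str) -> List[str]:
    """Return the scripts found after the trigger phrase.

    Assumption (hypothetical): every script begins with a shebang line ("#!...").
    """
    stripped = comment_body.lstrip()
    if not stripped.startswith(TRIGGER):
        return []  # not a command comment, nothing to run
    rest = stripped[len(TRIGGER):]
    # Split in front of every line that starts with "#!" (zero-width split, Python 3.7+).
    parts = re.split(r"(?m)^(?=#!)", rest)
    return [p.strip() + "\n" for p in parts if p.strip().startswith("#!")]
```

Each returned string could then be written to a temporary file and executed by the interpreter named in its shebang. Since any commenter's text would run on the CI runner, a real workflow would likely also need to restrict who may trigger it.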

ibnesayeed commented 4 years ago

@github-actions run

#!/bin/bash
exec &> comment.buffer

wget https://github.com/machawk1/warcreate/files/4892191/20200704064631467.warc.zip
unzip 20200704064631467.warc.zip

#!/usr/bin/env python3

import sys, traceback

sys.stdout = open('comment.buffer', 'a')

try:
    from warcio.archiveiterator import ArchiveIterator
    with open('20200704064631467.warc', 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == 'response':
                print(record.rec_headers.get_header('WARC-Target-URI'))
except Exception:
    traceback.print_exc(file=sys.stdout)

github-actions[bot] commented 4 years ago

--2020-07-08 21:23:43-- https://github.com/machawk1/warcreate/files/4892191/20200704064631467.warc.zip
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-repository-file-5c1aeb.s3.amazonaws.com/8906459/4892191?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200708%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200708T212343Z&X-Amz-Expires=300&X-Amz-Signature=00eb59af4480ebb7933a7775820e958930903d8d43f44bd801bc07ee77521dae&X-Amz-SignedHeaders=host&actor_id=0&repo_id=8906459&response-content-disposition=attachment%3Bfilename%3D20200704064631467.warc.zip&response-content-type=application%2Fzip [following]
--2020-07-08 21:23:43-- https://github-production-repository-file-5c1aeb.s3.amazonaws.com/8906459/4892191?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200708%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200708T212343Z&X-Amz-Expires=300&X-Amz-Signature=00eb59af4480ebb7933a7775820e958930903d8d43f44bd801bc07ee77521dae&X-Amz-SignedHeaders=host&actor_id=0&repo_id=8906459&response-content-disposition=attachment%3Bfilename%3D20200704064631467.warc.zip&response-content-type=application%2Fzip
Resolving github-production-repository-file-5c1aeb.s3.amazonaws.com (github-production-repository-file-5c1aeb.s3.amazonaws.com)... 52.216.136.75
Connecting to github-production-repository-file-5c1aeb.s3.amazonaws.com (github-production-repository-file-5c1aeb.s3.amazonaws.com)|52.216.136.75|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1326456 (1.3M) [application/zip]
Saving to: ‘20200704064631467.warc.zip’

   0K .......... .......... .......... .......... ..........   3% 14.6M 0s
  50K .......... .......... .......... .......... ..........   7% 16.7M 0s
 100K .......... .......... .......... .......... ..........  11% 28.4M 0s
 150K .......... .......... .......... .......... ..........  15% 20.3M 0s
 200K .......... .......... .......... .......... ..........  19% 20.6M 0s
 250K .......... .......... .......... .......... ..........  23% 28.1M 0s
 300K .......... .......... .......... .......... ..........  27% 34.2M 0s
 350K .......... .......... .......... .......... ..........  30% 29.6M 0s
 400K .......... .......... .......... .......... ..........  34% 35.7M 0s
 450K .......... .......... .......... .......... ..........  38% 33.2M 0s
 500K .......... .......... .......... .......... ..........  42% 35.0M 0s
 550K .......... .......... .......... .......... ..........  46% 37.8M 0s
 600K .......... .......... .......... .......... ..........  50% 27.2M 0s
 650K .......... .......... .......... .......... ..........  54% 34.7M 0s
 700K .......... .......... .......... .......... ..........  57% 172M 0s
 750K .......... .......... .......... .......... ..........  61% 31.2M 0s
 800K .......... .......... .......... .......... ..........  65% 27.1M 0s
 850K .......... .......... .......... .......... ..........  69% 178M 0s
 900K .......... .......... .......... .......... ..........  73% 32.7M 0s
 950K .......... .......... .......... .......... ..........  77% 33.1M 0s
1000K .......... .......... .......... .......... ..........  81% 187M 0s
1050K .......... .......... .......... .......... ..........  84% 35.7M 0s
1100K .......... .......... .......... .......... ..........  88% 135M 0s
1150K .......... .......... .......... .......... ..........  92% 32.6M 0s
1200K .......... .......... .......... .......... ..........  96% 179M 0s
1250K .......... .......... .......... .......... .....      100% 34.9M=0.04s

2020-07-08 21:23:44 (33.1 MB/s) - ‘20200704064631467.warc.zip’ saved [1326456/1326456]

Archive: 20200704064631467.warc.zip
  inflating: 20200704064631467.warc
https://twitter.com/prosodyContext
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/recordloader.py", line 224, in _detect_type_load_headers
    rec_headers = self.warc_parser.parse(stream, statusline)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/statusandheaders.py", line 270, in parse
    raise StatusAndHeadersParserException(msg, full_statusline)
warcio.statusandheaders.StatusAndHeadersParserException: Expected Status Line starting with ['WARC/1.1', 'WARC/1.0', 'WARC/0.17', 'WARC/0.18'] - Found: WARC-Type: request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/warcreate/warcreate/tmp_AozGuAylsrrQKFj4VPDhfkz6ukle64Lh", line 10, in <module>
    for record in ArchiveIterator(stream):
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/archiveiterator.py", line 110, in _iterate_records
    self.record = self._next_record(self.next_line)
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/archiveiterator.py", line 257, in _next_record
    record = self.loader.parse_record_stream(self.reader,
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/recordloader.py", line 85, in parse_record_stream
    (the_format, rec_headers) = (self.
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/warcio/recordloader.py", line 229, in _detect_type_load_headers
    raise ArchiveLoadFailed(msg + str(se.statusline))
warcio.exceptions.ArchiveLoadFailed: Invalid WARC record, first line: WARC-Type: request
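
The error says the parser expected a record to open with a WARC version line but found `WARC-Type: request` instead. One way to locate that record is to walk the file by hand. The following is a stdlib-only sketch, not part of warcio or WARCreate: the function name `find_first_bad_record` is hypothetical, and it assumes an uncompressed WARC whose `Content-Length` headers are accurate.

```python
import io


def find_first_bad_record(stream):
    """Walk uncompressed WARC records in a binary stream.

    Returns (byte_offset, first_line) for the first record that does not
    begin with a "WARC/" version line, or None if every record does.
    Assumes Content-Length values are correct (an assumption, not verified).
    """
    while True:
        offset = stream.tell()
        line = stream.readline()
        # Skip the blank separator lines that follow each record body.
        while line in (b"\r\n", b"\n"):
            offset = stream.tell()
            line = stream.readline()
        if not line:
            return None  # reached end of file without finding a problem
        if not line.startswith(b"WARC/"):
            return offset, line.rstrip(b"\r\n")
        # Read the record headers up to the blank line, noting Content-Length.
        length = 0
        while True:
            header = stream.readline()
            if header in (b"\r\n", b"\n", b""):
                break
            name, _, value = header.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        # Jump over the record body to the start of the next record.
        stream.seek(length, io.SEEK_CUR)
```

Called as `find_first_bad_record(open('20200704064631467.warc', 'rb'))`, this would report the byte offset of the first record lacking a version line. If an earlier `Content-Length` is wrong, the reported offset may instead fall mid-record, so the result is a starting point for inspection rather than a diagnosis.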