Closed simonw closed 2 years ago
Since you can have multiple issue template forms per repo I don't think that select dropdown is even needed. For this first implementation all I need as input is the URL. The title can even be hard-coded to "Please automatically transcribe and translate this URL".
A neat detail might be if the script updated the issue title to the title of the video as part of running - if that title was set to the default.
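A minimal sketch of that title check, assuming the hard-coded default title quoted above and a videoTitle obtained elsewhere (for example from youtube-dl). titleUpdateParams is a hypothetical helper name; its return value is shaped to be passed to github.rest.issues.update in github-script.

```javascript
// Default title, as hard-coded in the issue form (quoted in the comment above).
const DEFAULT_TITLE = "Please automatically transcribe and translate this URL";

// Hypothetical helper: returns the parameters for an issue title update,
// or null if the title was changed from the default (or no video title is known).
function titleUpdateParams(context, videoTitle) {
  const issue = context.payload.issue;
  if (!videoTitle || issue.title.trim() !== DEFAULT_TITLE) {
    return null;
  }
  return {
    owner: context.repo.owner,
    repo: context.repo.repo,
    issue_number: issue.number,
    title: videoTitle,
  };
}
```

Inside the workflow script this would be used as: const params = titleUpdateParams(context, videoTitle); if (params) await github.rest.issues.update(params);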
Here's what I need from https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#issues
on:
  issues:
    types:
      - opened
Then something like:
steps:
  - run: |
      echo Issue $NUMBER
    env:
      NUMBER: ${{ github.event.issue.number }}
I'm going to use github-script here to retrieve the details of the issue. Then I'll run a Python script that extracts any URLs, checks if the user is allowed to execute the action (refs #5) and kicks off youtube-dl (#3).
I need a requirements.txt file to install and cache the dependencies - primarily youtube-dl. https://pypi.org/project/youtube_dl/
Relevant example from github-script README:
on:
  issues:
    types: [opened]
jobs:
  comment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '👋 Thanks for reporting!'
            })
Maybe it posts a comment saying "Working on that" and then does the work.
To get the issue details: https://octokit.github.io/rest.js/v19#issues-get
octokit.rest.issues.get({
  owner,
  repo,
  issue_number,
});
I'm going to write them to a file that Python can read in the next step.
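A rough sketch of that hand-off, assuming issue is the data object returned by the issues.get call above. The helper name, the issue.json file name and the choice of fields are all assumptions for illustration; the Python step could then read the file with json.load.

```javascript
const fs = require("fs");

// Hypothetical helper: save the fields the next step needs as JSON.
// The default file name is an assumption; any path in the workspace would do.
function writeIssueDetails(issue, path = "issue.json") {
  const details = {
    number: issue.number,
    title: issue.title,
    body: issue.body,
    // The REST API reports the issue author under issue.user.
    author: issue.user ? issue.user.login : null,
  };
  fs.writeFileSync(path, JSON.stringify(details, null, 2));
  return details;
}
```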
Or... I could even do the entire implementation in JavaScript. Might keep things a little bit simpler.
Code search: child_process path:.github/workflows/*.yml
https://cs.github.com/?scopeName=All+repos&scope=&q=child_process+path%3A.github%2Fworkflows%2F*.yml
Gave me this example: https://github.com/jupyterlab/jupyterlab/blob/74405d37ad156d8bdc0e7c36f199abc1eb642c1f/.github/workflows/benchmark.yml#L31-L54
- name: Get hashes for PR review event
  if: ${{ github.event_name == 'pull_request_review' }}
  uses: actions/github-script@v6
  with:
    script: |
      const child_process = require("child_process");
      const pull_request = context.payload.pull_request;
      child_process.exec(`git merge-base ${pull_request.head.sha} ${pull_request.base.sha}`, (error, stdout, stderr) => {
        if (error) {
          console.log(error);
          process.exit(1);
          return;
        }
        if (stderr) {
          console.log(stderr);
          process.exit(1);
          return;
        }
        core.exportVariable('OLD_REF_SHA', stdout.trim());
        core.exportVariable('NEW_REF_SHA', pull_request.head.sha);
        core.exportVariable('PULL_REQUEST_ID', pull_request.number);
      });
I could dump the log output from youtube-dl in a details/summary element in the issue comment too.
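Something like the following could build that collapsed block. This is a sketch only: logDetails and escapeHtml are hypothetical helper names, and wrapping the log in a pre element (rather than a fenced block) is an assumption made to keep the markup simple.

```javascript
// Escape characters that would otherwise be read as HTML in the comment.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap log output in a collapsed details/summary element
// suitable for the body of an issue comment.
function logDetails(log, summary = "youtube-dl output") {
  return (
    `<details><summary>${escapeHtml(summary)}</summary>\n\n` +
    `<pre>${escapeHtml(log)}</pre>\n\n` +
    `</details>`
  );
}
```

The returned string can be used as (part of) the body passed to github.rest.issues.createComment.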
I opened issue #6 and it triggered this run: https://github.com/simonw/transcribe-videos/actions/runs/3119145264/jobs/5058919300
Output of that script:
{}
2021.12.17
So the youtube-dl --version bit worked but the issue fetching did not.
I'll try logging the full context.
OK, the context already has all of the information I need - no need to try and fetch more:
{
"payload": {
"action": "opened",
"issue": {
"active_lock_reason": null,
"assignee": null,
"assignees": [],
"author_association": "OWNER",
"body": "Testing for #2",
"closed_at": null,
"comments": 0,
"comments_url": "https://api.github.com/repos/simonw/transcribe-videos/issues/7/comments",
"created_at": "2022-09-24T17:26:06Z",
"events_url": "https://api.github.com/repos/simonw/transcribe-videos/issues/7/events",
"name": "transcribe-videos",
"node_id": "R_kgDOIDpoZw",
"notifications_url": "https://api.github.com/repos/simonw/transcribe-videos/notifications{?since,all,participating}",
"open_issues": 6,
"open_issues_count": 6,
"owner": {
"avatar_url": "https://avatars.githubusercontent.com/u/9599?v=4",
"events_url": "https://api.github.com/users/simonw/events{/privacy}",
"followers_url": "https://api.github.com/users/simonw/followers",
"following_url": "https://api.github.com/users/simonw/following{/other_user}",
"gists_url": "https://api.github.com/users/simonw/gists{/gist_id}",
"gravatar_id": "",
"html_url": "https://github.com/simonw",
"id": 9599,
"login": "simonw",
"node_id": "MDQ6VXNlcjk1OTk=",
"organizations_url": "https://api.github.com/users/simonw/orgs",
"received_events_url": "https://api.github.com/users/simonw/received_events",
"repos_url": "https://api.github.com/users/simonw/repos",
"site_admin": false,
"starred_url": "https://api.github.com/users/simonw/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/simonw/subscriptions",
"type": "User",
"url": "https://api.github.com/users/simonw"
},
"private": true,
"pulls_url": "https://api.github.com/repos/simonw/transcribe-videos/pulls{/number}",
"pushed_at": "2022-09-24T17:25:48Z",
"releases_url": "https://api.github.com/repos/simonw/transcribe-videos/releases{/id}",
"size": 2,
"ssh_url": "git@github.com:simonw/transcribe-videos.git",
"stargazers_count": 0,
"stargazers_url": "https://api.github.com/repos/simonw/transcribe-videos/stargazers",
"statuses_url": "https://api.github.com/repos/simonw/transcribe-videos/statuses/{sha}",
"subscribers_url": "https://api.github.com/repos/simonw/transcribe-videos/subscribers",
"subscription_url": "https://api.github.com/repos/simonw/transcribe-videos/subscription",
"svn_url": "https://github.com/simonw/transcribe-videos",
"tags_url": "https://api.github.com/repos/simonw/transcribe-videos/tags",
"teams_url": "https://api.github.com/repos/simonw/transcribe-videos/teams",
"topics": [],
"trees_url": "https://api.github.com/repos/simonw/transcribe-videos/git/trees{/sha}",
"updated_at": "2022-09-24T17:09:26Z",
"url": "https://api.github.com/repos/simonw/transcribe-videos",
"visibility": "private",
"watchers": 0,
"watchers_count": 0,
"web_commit_signoff_required": false
},
"sender": {
"avatar_url": "https://avatars.githubusercontent.com/u/9599?v=4",
"events_url": "https://api.github.com/users/simonw/events{/privacy}",
"followers_url": "https://api.github.com/users/simonw/followers",
"following_url": "https://api.github.com/users/simonw/following{/other_user}",
"gists_url": "https://api.github.com/users/simonw/gists{/gist_id}",
"gravatar_id": "",
"html_url": "https://github.com/simonw",
"id": 9599,
"login": "simonw",
"node_id": "MDQ6VXNlcjk1OTk=",
"organizations_url": "https://api.github.com/users/simonw/orgs",
"received_events_url": "https://api.github.com/users/simonw/received_events",
"repos_url": "https://api.github.com/users/simonw/repos",
"site_admin": false,
"starred_url": "https://api.github.com/users/simonw/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/simonw/subscriptions",
"type": "User",
"url": "https://api.github.com/users/simonw"
}
},
"eventName": "issues",
"sha": "baccabae6edae65ae5b6279120a21e92cc648e1d",
"ref": "refs/heads/main",
"workflow": ".github/workflows/issue_created.yml",
"action": "__actions_github-script",
"actor": "simonw",
"job": "comment",
"runNumber": 2,
"runId": 3119153836,
"apiUrl": "https://api.github.com/",
"serverUrl": "https://github.com/",
"graphqlUrl": "https://api.github.com/graphql"
}
I can look at context.payload.sender.login to see if they are allowed to do this.
I can also check and see if context.payload.issue.author_association is "OWNER" so owners of the repo can always trigger actions.
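Those two checks could be combined roughly like this. The allow-list contents are an assumption for illustration (the real list of permitted users is the subject of #5), and isAllowed is a hypothetical helper name.

```javascript
// Hypothetical allow-list; who is actually permitted is not decided in this thread.
const ALLOWED_LOGINS = new Set(["simonw"]);

// Returns true if the person who triggered the event may run the action.
function isAllowed(context, allowed = ALLOWED_LOGINS) {
  const { issue, sender } = context.payload;
  // Repo owners can always trigger the action.
  if (issue.author_association === "OWNER") {
    return true;
  }
  return allowed.has(sender.login);
}
```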
I'm going to parse the issue body the easiest way: split into lines and treat the first line that starts with http:// or https:// as being the URL to process.
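A minimal sketch of that parsing rule. firstUrl is a hypothetical name, and cutting the line at the first whitespace (so trailing text on the same line is dropped) is an extra assumption not stated above.

```javascript
// Returns the first URL-looking line of an issue body, or null if there is none.
function firstUrl(body) {
  for (const rawLine of (body || "").split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("http://") || line.startsWith("https://")) {
      // Assumption: anything after the first whitespace is not part of the URL.
      return line.split(/\s+/)[0];
    }
  }
  return null;
}
```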
Fun aside: need to try to avoid command injection attacks here, since I'm passing user input to youtube-dl.
I should use child_process.spawn() instead of .exec() since that takes a list of arguments.
https://nodejs.org/api/child_process.html#child_processspawnsynccommand-args-options - I can use child_process.spawnSync(command[, args][, options]).
Useful way to test that code locally:
node -e "
const child_process = require('child_process');
console.log(child_process.spawnSync('youtube-dl', ['--version'], {
  encoding: 'utf8'
}));
"
{
  status: 0,
  signal: null,
  output: [ null, '2021.12.17\n', '' ],
  pid: 15152,
  stdout: '2021.12.17\n',
  stderr: ''
}
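Applied to a user-supplied URL, the same spawnSync pattern might look like the sketch below. The function names are hypothetical, the flags are placeholders for whichever options end up being used, and the "--" separator is an added precaution so that a URL beginning with "-" is not read as an option.

```javascript
const child_process = require("child_process");

// Build the argument list separately so it can be inspected and tested.
// Everything after "--" is treated as a positional argument, not a flag.
function subtitleArgs(url) {
  return ["--all-subs", "--skip-download", "--", url];
}

// Run youtube-dl without a shell: each array element is passed as exactly
// one argument, so shell metacharacters in the URL are never interpreted.
function runYoutubeDl(url) {
  return child_process.spawnSync("youtube-dl", subtitleArgs(url), {
    encoding: "utf8",
  });
}
```

The returned object has the same status, stdout and stderr fields as the output shown above, so a non-zero status can be reported back in the issue comment.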
Tried this:
youtube-dl --all-subs --skip-download 'https://www.youtube.com/watch?v=m0mwlSZ0bQQ'
Got a single file:
It's a pile of mining waste. Want to go skiing on it-m0mwlSZ0bQQ.en.vtt
Then I tried getting the auto-generated subs too:
youtube-dl --write-auto-sub --all-subs --skip-download 'https://www.youtube.com/watch?v=m0mwlSZ0bQQ'
This got me a LOT of files. Truncated:
[youtube] m0mwlSZ0bQQ: Downloading webpage
[info] Writing video subtitles to: It's a pile of mining waste. Want to go skiing on it-m0mwlSZ0bQQ.af.vtt
[info] Writing video subtitles to: It's a pile of mining waste. Want to go skiing on it-m0mwlSZ0bQQ.ak.vtt
[info] Writing video subtitles to: It's a pile of mining waste. Want to go skiing on it-m0mwlSZ0bQQ.sq.vtt
...
It was 126 total!
The .es.* one ends like this:
00:03:20.879 --> 00:03:23.330 align:start position:0%
Voy a bajar el pequeño
teleférico <00:03:21.404><c>¿Cómo </c><00:03:21.929><c>puedo? </c><00:03:22.454><c>Estoy </c><00:03:22.979><c>realmente</c>
00:03:23.330 --> 00:03:23.340 align:start position:0%
teleférico ¿Cómo puedo? Estoy realmente
00:03:23.340 --> 00:03:24.830 align:start position:0%
teleférico ¿Cómo puedo? Estoy realmente
aterrorizado. <00:03:23.603><c>Me </c><00:03:23.866><c>voy </c><00:03:24.129><c>a </c><00:03:24.392><c>resbalar. </c><00:03:24.655><c>¿</c>
00:03:24.830 --> 00:03:24.840 align:start position:0%
aterrorizado. Me voy a resbalar. ¿
00:03:24.840 --> 00:03:27.710 align:start position:0%
aterrorizado. Me voy a resbalar. ¿
Cómo <00:03:25.740><c>salgo </c><00:03:26.640><c>de </c><00:03:27.540><c>esto?</c>
00:03:27.710 --> 00:03:30.319 align:start position:0%
Cómo salgo de esto?
Trying with https://www.youtube.com/watch?v=OJIzTVyxIAw - the Russian one I tried in #1. Without --write-auto-sub I get nothing, because that video does not have captions.
With --write-auto-sub --sub-lang en,ru I get ALL of those files - it looks like the --write-auto-sub option always gets everything, ignoring the --sub-lang option entirely.
More on that here: https://askubuntu.com/questions/1023339/youtube-dl-keep-both-auto-generated-subtitles-and-prewritten-ones
It sounds like if there are manual subtitles AND auto subtitles the manual ones over-write the auto ones, unless you run the command twice and save the files with different names that way.
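One way to sketch the "run it twice with different names" idea is to build two argument lists that differ only in the auto-sub flag and the -o output template. The directory names manual/ and auto/ are assumptions, as is the expectation that subtitle files follow the -o template; the first list relies on the behaviour seen above, where --all-subs without --write-auto-sub only fetched the manually written subtitles.

```javascript
// Hypothetical helper: argument lists for two separate youtube-dl runs,
// one for manual subtitles and one for auto-generated ones.
function subtitleRuns(url) {
  // Same shape as youtube-dl's default template, with a directory prefix.
  const template = (dir) => `${dir}/%(title)s-%(id)s.%(ext)s`;
  return [
    ["--all-subs", "--skip-download", "-o", template("manual"), "--", url],
    ["--write-auto-sub", "--all-subs", "--skip-download", "-o", template("auto"), "--", url],
  ];
}
```

Each list can be passed as the args array to child_process.spawnSync("youtube-dl", args).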
These subtitles are a bit messy:
d % cat *.en.*
WEBVTT
Kind: captions
Language: en
00:00:00.350 --> 00:00:06.079 align:start position:0%
March <00:00:00.879><c>18, </c><00:00:01.408><c>2018 </c><00:00:01.937><c>the </c><00:00:02.466><c>shadow </c><00:00:02.995><c>of </c><00:00:03.524><c>the </c><00:00:04.053><c>main </c><00:00:04.582><c>choice </c><00:00:05.111><c>of </c><00:00:05.640><c>the</c>
00:00:06.079 --> 00:00:06.089 align:start position:0%
March 18, 2018 the shadow of the main choice of the
00:00:06.089 --> 00:00:10.700 align:start position:0%
March 18, 2018 the shadow of the main choice of the
country <00:00:06.672><c>March </c><00:00:07.255><c>18 </c><00:00:07.838><c>the </c><00:00:08.421><c>day </c><00:00:09.004><c>that </c><00:00:09.587><c>decides </c><00:00:10.170><c>the</c>
00:00:10.700 --> 00:00:10.710 align:start position:0%
country March 18 the day that decides the
00:00:10.710 --> 00:00:14.539 align:start position:0%
country March 18 the day that decides the
fate <00:00:11.235><c>of </c><00:00:11.760><c>Russia </c><00:00:12.285><c>the </c><00:00:12.810><c>day </c><00:00:13.335><c>that </c><00:00:13.860><c>determines</c>
00:00:14.539 --> 00:00:14.549 align:start position:0%
fate of Russia the day that determines
00:00:14.549 --> 00:00:16.599 align:start position:0%
fate of Russia the day that determines
our <00:00:15.059><c>future</c>
00:00:16.599 --> 00:00:16.609 align:start position:0%
our future
It looks like those are encoding actual animation, where the subtitles scroll to match the text as it is spoken.
I don't want that though!
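If I stayed with vtt, the animation could be flattened after the fact. The sketch below is one possible approach, not something tried in this thread: drop the header and timing lines, strip the inline tags, and skip a line when it repeats the previous one. vttToText is a hypothetical name.

```javascript
// Flatten an auto-generated WebVTT file into plain text lines.
function vttToText(vtt) {
  const lines = [];
  for (const raw of vtt.split(/\r?\n/)) {
    if (raw.includes("-->")) continue; // cue timing line
    if (/^(WEBVTT|Kind:|Language:)/.test(raw)) continue; // file header
    // Remove inline tags such as <00:00:00.879> and <c>...</c>.
    const text = raw.replace(/<[^>]*>/g, "").trim();
    if (!text) continue;
    // The scrolling effect repeats each line; keep only the first copy.
    if (lines[lines.length - 1] !== text) lines.push(text);
  }
  return lines.join("\n");
}
```

This only removes consecutive repeats, so a phrase that is genuinely spoken twice with other text in between is kept.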
d % youtube-dl --list-subs 'https://www.youtube.com/watch?v=OJIzTVyxIAw' | grep en
en vtt, ttml, srv3, srv2, srv1
I'm getting vtt by default - maybe one of the other formats would avoid the animation issue and just give me the plain text?
Tried two other formats like this:
mkdir ttml
cd ttml
youtube-dl --sub-format ttml --all-subs --skip-download --write-auto-sub 'https://www.youtube.com/watch?v=OJIzTVyxIAw'
cd ..
mkdir srv3
cd srv3
youtube-dl --sub-format srv3 --all-subs --skip-download --write-auto-sub 'https://www.youtube.com/watch?v=OJIzTVyxIAw'
Here's ttml:
d % cat ttml/*.en.*
<?xml version="1.0" encoding="utf-8" ?>
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:tts="http://www.w3.org/ns/ttml#styling" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" ttp:profile="http://www.w3.org/TR/profile/sdp-us" >
<head>
<styling>
<style xml:id="s1" tts:textAlign="center" tts:extent="90% 90%" tts:origin="5% 5%" tts:displayAlign="after"/>
<style xml:id="s2" tts:fontSize=".72c" tts:backgroundColor="black" tts:color="white"/>
</styling>
<layout>
<region xml:id="r1" style="s1"/>
</layout>
</head>
<body region="r1">
<div>
<p begin="00:00:00.350" end="00:00:10.710" style="s2">March 18, 2018 the shadow of the main choice of the</p>
<p begin="00:00:06.089" end="00:00:14.549" style="s2">country March 18 the day that decides the</p>
<p begin="00:00:10.710" end="00:00:16.609" style="s2">fate of Russia the day that determines</p>
<p begin="00:00:14.549" end="00:00:21.060" style="s2">our future</p>
<p begin="00:00:16.609" end="00:00:22.220" style="s2">March 18 presidential elections in the Russian</p>
<p begin="00:00:21.060" end="00:00:25.140" style="s2">Federation</p>
<p begin="00:00:22.220" end="00:00:28.230" style="s2">every citizen of the country who has reached the age of</p>
<p begin="00:00:25.140" end="00:00:31.579" style="s2">18 has the right to</p>
<p begin="00:00:28.230" end="00:00:37.840" style="s2">vote in the presidential elections in Russia</p>
<p begin="00:00:31.579" end="00:00:40.999" style="s2">March 18 is the day when every vote matters</p>
<p begin="00:00:37.840" end="00:00:40.999" style="s2">[music ]</p>
</div>
</body>
</tt>
And here's srv3 (truncated):
d % cat srv3/*.en.*
<?xml version="1.0" encoding="utf-8" ?><timedtext format="3">
<head>
<ws id="0"/>
<ws id="1" mh="2" ju="0" sd="3"/>
<wp id="0"/>
<wp id="1" ap="6" ah="20" av="100" rc="2" cc="40"/>
</head>
<body>
<w t="0" id="1" wp="1" ws="1"/>
<p t="350" d="10360" w="1"><s ac="252">March </s><s t="529" ac="252">18, </s><s t="1058" ac="252">2018 </s><s t="1587" ac="252">the </s><s t="2116" ac="252">shadow </s><s t="2645" ac="252">of </s><s t="3174" ac="252">the </s><s t="3703" ac="252">main </s><s t="4232" ac="252">choice </s><s t="4761" ac="252">of </s><s t="5290" ac="252">the</s></p>
<p t="6079" d="4631" w="1" a="1">
</p>
<p t="6089" d="8460" w="1"><s ac="238">country </s><s t="583" ac="227">March </s><s t="1166" ac="227">18 </s><s t="1749" ac="227">the </s><s t="2332" ac="227">day </s><s t="2915" ac="227">that </s><s t="3498" ac="227">decides </s><s t="4081" ac="227">the</s></p>
I think I like ttml the best - looks easy to parse too.
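A minimal sketch of pulling the text out of a ttml file like the one above. It uses a regular expression rather than a real XML parser, which is an assumption that holds for the simple p elements shown here but may not for every file; ttmlToText and decodeEntities are hypothetical names.

```javascript
// Decode the handful of XML entities likely to appear in caption text.
function decodeEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

// Collect the text content of every <p> element, one line per element.
function ttmlToText(ttml) {
  const out = [];
  const re = /<p\b[^>]*>([\s\S]*?)<\/p>/g;
  let match;
  while ((match = re.exec(ttml)) !== null) {
    const inner = match[1]
      .replace(/<br\s*\/?>/g, " ") // line breaks inside a caption
      .replace(/<[^>]*>/g, ""); // any other nested markup
    const text = decodeEntities(inner).trim();
    if (text) out.push(text);
  }
  return out.join("\n");
}
```

The resulting plain text is what would go in the issue comment, with the full subtitle file stored separately.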
I'm going to store the full subtitle files in the repo. The issue comment reply will just contain the text, in an easy-to-paste format.
I'm going to finish this work in:
Refs:
1
Started prototyping that here: https://github.com/simonw/try-out-issue-template-forms/issues/1 - with this issue template (only allowed in public repos at the moment): https://github.com/simonw/try-out-issue-template-forms/blob/afe6c8e3ed86ba03fb7318ce8131613ff4ecf16c/.github/ISSUE_TEMPLATE/url.yml
The issue that this creates looks like this: