johnfelipe / transcribelo

Apache License 2.0

Extract captions #1

Closed: johnfelipe closed this issue 2 years ago

johnfelipe commented 2 years ago

URL

https://youtu.be/6QNyQgrozQQ

github-actions[bot] commented 2 years ago
[youtube] 6QNyQgrozQQ: Downloading webpage
[youtube] 6QNyQgrozQQ: Downloading android player API JSON
[youtube] 6QNyQgrozQQ: Downloading MPD manifest
[youtube] 6QNyQgrozQQ: Downloading MPD manifest
[info] 6QNyQgrozQQ: Downloading 1 format(s): 22
[youtube] 6QNyQgrozQQ: Downloading webpage
[youtube] 6QNyQgrozQQ: Downloading android player API JSON
[youtube] 6QNyQgrozQQ: Downloading MPD manifest
[youtube] 6QNyQgrozQQ: Downloading MPD manifest
[info] 6QNyQgrozQQ: Downloading subtitles: en
[info] 6QNyQgrozQQ: Downloading 1 format(s): 22
[info] Writing video subtitles to: Reconocimiento de voz rivera huila 1 de 2 [6QNyQgrozQQ].en.ttml
[download] Destination: Reconocimiento de voz rivera huila 1 de 2 [6QNyQgrozQQ].en.ttml

[download] 1.00KiB at  Unknown B/s (00:00)
[download] 3.00KiB at  Unknown B/s (00:00)
[download] 7.00KiB at    6.48MiB/s (00:00)
[download] 15.00KiB at   10.44MiB/s (00:00)
[download] 20.46KiB at   10.53MiB/s (00:00)
[download] 100% of 20.46KiB in 00:00 at 76.99KiB/s
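The log shows yt-dlp saving the English captions as a TTML file (`.en.ttml`), and the next comment is plain caption text. The workflow that did the conversion is not shown in this issue. Below is a minimal sketch using only Python's standard `xml.etree.ElementTree`, assuming the bot simply kept the text of each TTML `<p>` cue. The function names and the two-cue sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Strip the XML namespace, e.g. '{http://www.w3.org/ns/ttml}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]


def _cue_text(elem):
    """Concatenate the text inside a cue, turning <br/> into a space."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(" " if _local_name(child.tag) == "br" else _cue_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def ttml_to_lines(ttml):
    """Return one cleaned string per <p> cue in a TTML document (str or bytes)."""
    root = ET.fromstring(ttml)
    lines = []
    for elem in root.iter():
        if _local_name(elem.tag) == "p":
            # Collapse runs of spaces and newlines into single spaces.
            text = " ".join(_cue_text(elem).split())
            if text:
                lines.append(text)
    return lines


# Hypothetical two-cue sample shaped like a TTML caption file.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"><body><div>
<p begin="00:00:00.000" end="00:00:02.000">well, good afternoon,<br/>at this moment</p>
<p begin="00:00:02.000" end="00:00:04.000">we are with the council of rivera</p>
</div></body></tt>"""

if __name__ == "__main__":
    for line in ttml_to_lines(SAMPLE):
        print(line)
```

For the downloaded file, pass its raw bytes (for example `open(path, "rb").read()`). `ET.fromstring` accepts bytes and honours the file's own XML encoding declaration.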
github-actions[bot] commented 2 years ago
Well, good afternoon. At this moment we are with the council of Rivera, Huila, and its secretary. Unfortunately, she will not be able to speak with us because her computer has no microphone. I will be watching closely for what she writes in the chat, and I will answer her either there or by voice. So let's begin.
The universal voice profile you see here is an innovation we have been developing for several months now. We hold a copyright certificate for it, and it is what allows the program to recognize both female and male voices. Before starting the demonstration, I would like to ask the secretary something about your council's session room. Does it have a sound console with a microphone for each councilor? Please tell me, Secretary.
OK, perfect. If you have a sound system, there is a very good chance that the real-time plan will work. I'm going to start as if we were using the laptop you would acquire. It is a high-end gaming laptop, which makes all the data transfer and processing possible and lets us view everything as I am about to show you. OK, so, first the computer before the session.
The laptop we supply comes with everything pre-installed and has these two programs. On the left is a text editor that receives the transcription in real time, and here is an audio editor. The audio editor is what we use to monitor the sound input and confirm that it is good, as we see at this moment: the signal is above the minimum levels, and there is no saturation, no clipping, nothing of that kind. That is an initial check we will do before using our project.
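In the demo, this pre-session check (signal above a minimum level, no saturation, no clipping) is done by eye in the audio editor. Here is a rough numeric equivalent, as a sketch only. It assumes float samples normalised to [-1.0, 1.0]. The `-30` dBFS floor and the `0.999` clipping threshold are assumptions, not values from the demo.

```python
import math


def to_dbfs(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def check_input_level(samples, min_peak_dbfs=-30.0, clip_threshold=0.999):
    """Report peak/RMS level and clipped-sample count for float samples in [-1, 1].

    Both thresholds are assumptions, not values taken from the demo.
    """
    if not samples:
        raise ValueError("no samples to check")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Samples at (or within a hair of) full scale are treated as clipped.
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    peak_dbfs = to_dbfs(peak)
    return {
        "peak_dbfs": peak_dbfs,
        "rms_dbfs": to_dbfs(rms),
        "clipped_samples": clipped,
        "ok": peak_dbfs >= min_peak_dbfs and clipped == 0,
    }


if __name__ == "__main__":
    # Ten cycles of a sine at half of full scale: peak about -6 dBFS, no clipping.
    tone = [0.5 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
    print(check_input_level(tone))
```

A signal that is too quiet fails the `min_peak_dbfs` test, and one that touches full scale fails the clipping test. These are the two conditions the speaker looks for on screen.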
So I am going to start now. Again, we are with the council of Rivera, Huila, and its secretary. I am going to begin by explaining the three plans: plan A, plan B, and plan C. Plan A is real time. Earlier I asked the secretary whether she had a sound system, that is, an amplification console and microphones. This plan requires that each comptroller's office, council, or assembly have such a system, because everything that is said goes directly into the console, and the console delivers it to the laptop that we administer, as you can see.
At this moment, every time I pause, the software that produces the real-time transcription does its work within a second or two. Combined with our universal voice profile, it delivers 92 percent accuracy, whether the speaker is a man or a woman: the councilors, the department secretaries, or any other speaker you may have.
When the session is over, you will already have a very important input: a word-for-word transcription of everything that was said in the session. The remaining error, 6% or at most 8%, is practically human error. If you notice an omitted or misspelled word, it is because the speaker mispronounced it, used idiomatic fillers, or said something else the software does not understand. That includes words that are not in the vocabulary. We can add those words, but while they are missing, they fall within that 8% error.
Madam Secretary, please confirm in the chat whether you can see the software transcribing everything I say in real time.
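The demo quotes 92 percent accuracy and a 6 to 8 percent error but does not say how these were measured. A standard measure is word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn the software's output into a reference transcript, divided by the number of reference words. "Accuracy" is then loosely 1 - WER. A minimal sketch, assuming that metric; the sentences in the example are hypothetical:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # d[i][j] = fewest edits turning the first i reference words into the first j output words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert every output word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the people of rivera are called riverenses",
                          "the people of rivera are called riverense")
    print(f"WER {wer:.3f}, accuracy {1 - wer:.1%}")
```

Here one wrong word out of seven gives a WER of about 14%. Over a whole session, the same counting would produce the overall error figure.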
Perfect. So this is plan A, the plan you are going to use. Look, there is even a garbled phrase here. That happened because I spoke very quickly, and the software simply places the word closest to what it heard. For example, Madam Secretary, what are the people of Rivera, Huila, called? Perfect. If that word is in the vocabulary, the software will place it; if it is not, it will place the words closest to it. The people of Rivera, Huila, are called riverenses. As you can see, it wrote "riverense", because the plural is not in the vocabulary, so that falls within the 8% error. We are four minutes into the recording and plan A has been fully explained, so we are going to move on to plan B.
Plan B works with audio you already have pre-recorded, that is, from previous sessions. But first, you can even add that word yourselves. I am going to do it right now: "riverenses" rather than "riverense", and you will see that it works. Here I go to Vocabulary, add "riverenses", and press Enter twice. Look: it has already been added. If you and your council hire us, we will teach you a method for adding all the words from the minutes you already have. That way you do not have to do it manually; imagine typing in by hand every word that is missing from the vocabulary. We can explain that method to you after the acquisition and the training on the project.
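The speaker describes the software falling back to the closest word it knows when a word is missing from its vocabulary. Real speech recognizers make that choice from the audio, not from spelling. The following is therefore only a toy illustration of the idea, using string similarity (`difflib.get_close_matches` from Python's standard library). The vocabulary list and the function name are hypothetical.

```python
import difflib


def closest_known_word(heard, vocabulary, cutoff=0.6):
    """Toy stand-in for vocabulary lookup: the most similar known word, or None."""
    matches = difflib.get_close_matches(heard, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    vocabulary = ["rivera", "huila", "riverense"]  # hypothetical vocabulary
    print(closest_known_word("riverenses", vocabulary))  # riverense
    vocabulary.append("riverenses")  # like adding the word in the Vocabulary screen
    print(closest_known_word("riverenses", vocabulary))  # riverenses
```

As long as only the singular is known, the plural snaps to it. Once the plural is added, it wins with an exact match, which mirrors what the demo shows.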
Well, at this moment we are at 5 minutes 34 seconds. Look, I keep talking; excuse me. I'm going to stop the recording and export the MP3 as if you already had a pre-recorded audio file ready. Here I simply export the audio as "rivera huila 2016", at 128 kilobits per second and a sample rate of 44,100 hertz. I do not need this anymore.
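With the export settings mentioned (128 kbit/s MP3 at 44,100 Hz), file size follows directly from the bitrate: bytes = bitrate × duration ÷ 8. The sample rate does not change a constant-bitrate MP3's size. A quick sketch of the arithmetic, assuming constant-bitrate encoding and ignoring tags and headers. The uncompressed comparison assumes 16-bit stereo PCM, which the demo does not state.

```python
def mp3_bytes(duration_s, bitrate_kbps=128):
    """Approximate size of a constant-bitrate MP3 (tags and headers ignored)."""
    return bitrate_kbps * 1000 * duration_s / 8  # MP3 bitrates use 1 kbit = 1000 bits


def pcm_bytes(duration_s, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Size of uncompressed PCM audio; 16-bit stereo is an assumption here."""
    return duration_s * sample_rate * channels * bytes_per_sample


if __name__ == "__main__":
    for label, seconds in [("5 min 55 s demo", 5 * 60 + 55), ("5 h session", 5 * 3600)]:
        print(f"{label}: MP3 {mp3_bytes(seconds) / 1e6:.1f} MB, "
              f"PCM {pcm_bytes(seconds) / 1e6:.1f} MB")
```

At these settings, a 5-hour session comes to about 288 MB as MP3, versus roughly 3.2 GB as uncompressed 16-bit stereo.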
For example, you might have a 5-hour audio file; here we will scale that down to 5 minutes. We will run the sample and time how long it takes to deliver the transcript. We already know the audio works and meets the technical requirements, so I am simply going to drag it in at 9 minutes 3 seconds of recording. There it goes: it started at 9 minutes, as you can see up here. It takes about 5 seconds, and now it is about to start writing the transcription at that accuracy.
So plan B will cover the backlog you have. The audio may have some kind of amplification problem, clipping, or saturation. If it does, we will also teach you how to reduce it, and the software can be used for that. But the most important thing is that, since you have a sound system, you connect the laptop to it. I'm going to explain how to reduce the gain on the laptop while the amplification stays the same, so that all of this can be handled in post-production. That way, if you decide to hire us, you will have each session ready as text from now on, and this plan will also cover your pending recordings.
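Reducing gain means multiplying every sample by 10^(dB/20). The sketch below assumes plain Python with float samples in [-1.0, 1.0]. It also shows why the speaker stresses getting levels right at the input. Attenuating an already-clipped recording lowers its level, but the flattened peaks stay flat.

```python
def apply_gain_db(samples, gain_db):
    """Scale float samples by a gain in decibels (negative values attenuate)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


if __name__ == "__main__":
    clipped = [0.6, 1.0, 1.0, 1.0, 0.6]  # a peak flattened at full scale
    quieter = apply_gain_db(clipped, -6.0)
    print([round(s, 3) for s in quieter])  # lower, but the three top samples are still equal
```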
As I am showing you, the transcription has now been running for 1 minute 14 seconds on an audio file of 5 minutes and something. I am going to check how long it takes, so that you can see the processing time for an audio file of this length. See, it took 1 minute 30 seconds. That is to say, for an audio file... let me open "rivera huila"; here it is... an audio file of almost six minutes. Look, here it says 5 minutes 55 seconds, almost 6 minutes, and it took a minute and a half. So a 5-hour audio file will take about an hour and a half. You will have a fully reliable text at that 92% accuracy, with an 8% error, as it says here, and you will be able to edit that 8% error yourselves.
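The time estimate is a proportion. Processing took 1 min 30 s for 5 min 55 s of audio, a real-time factor (RTF) of about 0.25. The sketch below assumes that processing time scales linearly with audio length; the demo measured only one short file.

```python
def real_time_factor(processing_s, audio_s):
    """Processing time divided by audio duration (below 1.0 = faster than real time)."""
    return processing_s / audio_s


def estimate_processing_s(audio_s, rtf):
    """Linear extrapolation of processing time for a longer recording."""
    return audio_s * rtf


if __name__ == "__main__":
    rtf = real_time_factor(90, 5 * 60 + 55)  # 1 min 30 s for 5 min 55 s of audio
    minutes = estimate_processing_s(5 * 3600, rtf) / 60
    print(f"RTF {rtf:.3f}; a 5-hour session: about {minutes:.0f} minutes")
```

Under this assumption the estimate is about 76 minutes (roughly 1 h 16 min), so the "hour and a half" quoted in the demo is a comfortable upper figure.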

I am going to stop the recording right now. I'm going to upload it; it's 11 minutes long. Give me 20 seconds to do it and we will continue.