Closed jshearer closed 3 years ago
leaving a note here for integration with pietrop/digital-paper-edit-client and pietrop/digital-paper-edit-electron about using Node.js features in Electron's Web Workers
Thanks for this @jshearer !
I had one issue when I tried the storybook locally: I went to export a Word doc with OHMS in one window and a vtt with speakers in another, to check out the output. The cursor started spinning and I didn't get any output (I waited a few minutes).
I was able to export plain text, and plain text + speakers, but not plain text + timecodes.
This was after upgrading to the new module
- "align-diarized-text": "^1.0.8",
+"align-diarized-text": "^1.0.9",
I also reverted back to 1.0.8 and got the same issue.
Let me know if you have any ideas on what could be causing this, and whether you get the same issue on your end?
btw, I had removed node_modules and package-lock.json before running npm install.
I also don't fully understand what insertTimecodesInline (from inline-interval-timecodes) does?
I'm wondering if you saw any errors in the console when exporting? I just tested and am able to export OHMS, vtt, plaintext+timecodes etc. FWIW this is all running on transcripts that came from Google with speaker diarization and were run through gcp-to-dpe v2.
insertTimecodesInline is admittedly kind of a weird feature: the OHMS export wants timecodes every interval (30s in this case) inserted in the middle of the text. Here's an example, notice the [00:00:30], [00:01:00] etc
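To illustrate the idea of inline interval timecodes, here is a minimal sketch. It is not the actual inline-interval-timecodes implementation: the function names, the `{ text, start }` word shape, and the choice to place the marker before the first word at or past each boundary are all assumptions made for this example.

```javascript
// Hypothetical sketch only; NOT the inline-interval-timecodes API.

// Format a number of seconds as [hh:mm:ss].
function formatTimecode(totalSeconds) {
  const pad = (n) => String(n).padStart(2, '0');
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = Math.floor(totalSeconds % 60);
  return `[${pad(h)}:${pad(m)}:${pad(s)}]`;
}

// words: array of { text, start } with start in seconds (assumed shape).
// Inserts a marker before the first word that starts at or after each
// interval boundary. If a long pause skips several boundaries, only the
// most recent one is emitted.
function insertTimecodesEveryInterval(words, intervalSeconds = 30) {
  const out = [];
  let nextMark = intervalSeconds;
  for (const word of words) {
    const mark = Math.floor(word.start / intervalSeconds) * intervalSeconds;
    if (mark >= nextMark) {
      out.push(formatTimecode(mark));
      nextMark = mark + intervalSeconds;
    }
    out.push(word.text);
  }
  return out.join(' ');
}
```

With words starting at 0s, 29.5s, 30.2s and 61s, this would place `[00:00:30]` before the third word and `[00:01:00]` before the fourth.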
ok, cool. No didn't see anything significant in the console 🤷♂️.
In theory, once the transcript is converted to dpe format, it shouldn't make too much of a difference where it came from originally, unless there are bugs in the converters.
Did you try it in the storybook locally as well?
http://localhost:6006/?path=/story/slatetranscripteditor--demo
ok, I am not sure why, but I think I might have figured it out 🤔 🥳
Something is not quite right about convertSlateToDpeAsync, not sure what exactly tho. But if I change restoreTimecodes in src/util/restore-timecodes to use converSlateToDpe instead of convertSlateToDpeAsync, then I am able to export plain text with timecodes, a Word document with timecodes, and docx Word (OHMS).
It actually seems quite snappy at restoring timecodes (and I was still on align-diarized-text v1.0.8, so go figure; will have to retry with v1.0.9, might be even faster 🎉).
import convertDpeToSlate from '../dpe-to-slate';
+import converSlateToDpe, { convertSlateToDpeAsync } from '../export-adapters/slate-to-dpe/index.js';
- import { convertSlateToDpeAsync } from '../export-adapters/slate-to-dpe/index.js';
const restoreTimecodes = async ({ slateValue, transcriptData }) => {
console.log('restoreTimecodes', slateValue, transcriptData);
+ const aligneDpeData = await converSlateToDpe(slateValue, transcriptData);
- const aligneDpeData = await convertSlateToDpeAsync(slateValue, transcriptData);
const alignedSlateData = convertDpeToSlate(aligneDpeData);
return alignedSlateData;
};
export default restoreTimecodes;
Was not able to export vtt and other caption files tho, I'd need to look more closely at that.
Okay, I'll look more into this tomorrow (today? :p)
We also did notice a bug where sometimes if you try to bulk-change a speaker name while an export is happening, the browser will hang like before, and also that bulk-changing a speaker name more than once doesn't seem to work, so those are also on my list here.
Are the transcripts you were using to cause these issues public/somewhere I can see them to try and reproduce myself?
On Tue, Dec 1, 2020, 12:20 AM Pietro notifications@github.com wrote:
ok, I am not sure why, but I figured it out.
Something is not quite right about convertSlateToDpeAsync, not sure what exactly tho. But if I change restoreTimecodes in src/util/restore-timecodes to use converSlateToDpe instead of convertSlateToDpeAsync, then I am able to export
- plain txt with timecodes
- word document with timecodes
- docx word (OHMS)
Yeah, so to reproduce you can run
npm start
That starts the storybook locally at
http://localhost:6006/?path=/story/slatetranscripteditor--demo
It should be the same as pietropassarelli.com/slate-transcript-editor but with the local changes, obv.
You can see the various stories here: slate-transcript-editor/src/components/1-SlateTranscriptEditor.stories.js#L30-L46
They are meant to exemplify various initializations as well as edge cases, e.g. long transcripts etc.
For transcripts, I think I am mostly using this one, soleio-dpe, and its video, originally from the PBS Frontline transparency project on YouTube.
Let me know if you got any questions :)
ok, yeah, captions export wasn't working for me for the same reason as the other exports: using convertSlateToDpeAsync instead of converSlateToDpe in getEditorContent in src/components/index.js.
Removed the service worker part, and merged the progress so far. We can do a separate PR for the service worker, if you get that to work.
We have been working on adding some features to the transcript editor that were requested by some of our customers, as well as fixing any bugs we find. This PR contains the past few weeks of work. Importantly, there's also a pull request in align-diarized-text to significantly improve some of the performance issues we were seeing in longer transcripts, so I left out the commits in here bumping that dependency as I imagine you'll do it when you merge :)
- VTT with speakers: export option to include speaker info in the regular .vtt export, in the correct vtt syntax
- VTT with speakers and paragraphs: export option to generate an export in .vtt format, but instead of splitting by max characters on screen, we split by paragraph.
- Word (OHMS): export option to generate a .docx file in the format expected by OHMS
- Runs align-diarized-text in a background task to prevent hanging the browser. This should work by itself, but we paired it with the webpack worker-plugin, which works nicely.
If you'd like to review/merge these individually I'm happy to make separate PRs for each feature/fix, it was just easier to package it as one :)
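As a rough illustration of the "VTT with speakers and paragraphs" idea, the sketch below emits one WebVTT cue per paragraph and marks the speaker with a WebVTT voice span (`<v Speaker>`). It is not the code from this PR: the function names and the `{ speaker, start, end, text }` paragraph shape are assumptions made for the example.

```javascript
// Hypothetical sketch only; not the export code from this PR.

// Format seconds as a WebVTT timestamp: hh:mm:ss.mmm
function toVttTimestamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms % 1000, 3)}`;
}

// paragraphs: array of { speaker, start, end, text } (assumed shape),
// with start/end in seconds. Each paragraph becomes exactly one cue,
// rather than being split by a maximum number of characters on screen.
function paragraphsToVtt(paragraphs) {
  const cues = paragraphs.map(
    ({ speaker, start, end, text }) =>
      `${toVttTimestamp(start)} --> ${toVttTimestamp(end)}\n<v ${speaker}>${text}`
  );
  // A WebVTT file starts with the WEBVTT header; cues are separated by blank lines.
  return ['WEBVTT', ...cues].join('\n\n') + '\n';
}
```

Splitting by paragraph keeps each speaker turn in a single cue, at the cost of potentially long cues for long paragraphs, so it is probably better suited to reading or archival use than to on-screen captions.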