pietrop / slate-transcript-editor

A React component to make correcting automated transcriptions of audio and video easier and faster. Using the SlateJs editor.
https://pietrop.github.io/slate-transcript-editor

Combined theirstory changes #16

Closed jshearer closed 3 years ago

jshearer commented 3 years ago

We have been working on adding some features to the transcript editor that were requested by some of our customers, as well as fixing any bugs we find. This PR contains the past few weeks of work. Importantly, there's also a pull request in align-diarized-text to significantly improve some of the performance issues we were seeing in longer transcripts, so I left out the commits in here bumping that dependency as I imagine you'll do it when you merge :)

If you'd like to review/merge these individually I'm happy to make separate PRs for each feature/fix, it was just easier to package it as one :)

pietrop commented 3 years ago

leaving a note here for integration with pietrop/digital-paper-edit-client and pietrop/digital-paper-edit-electron about using Node.js features in Electron's Web Workers

pietrop commented 3 years ago

Thanks for this @jshearer !

I had one issue when I tried the storybook locally: I went to export a Word doc with OHMS in one window and a vtt with speakers in another, to check out the output. The cursor started spinning and I didn't get any output (I waited a few minutes).

I was able to export plain text, and plain text + speakers, but not plain text + timecodes.

This was after upgrading to the new module

- "align-diarized-text": "^1.0.8",
+"align-diarized-text": "^1.0.9",

I also reverted back to 1.0.8 and got the same issue

Let me know if you have any ideas on what could be causing this, and whether you get the same issue on your end?

btw, I had removed node_modules and package-lock.json before running npm install.

pietrop commented 3 years ago

I also don't fully understand what insertTimecodesInline (from inline-interval-timecodes) does?

jshearer commented 3 years ago

I'm wondering if you saw any errors in the console when exporting? I just tested and am able to export OHMS, vtt, plaintext+timecodes etc. FWIW this is all running on transcripts that came from Google with speaker diarization and were run through gcp-to-dpe v2.

insertTimecodesInline is admittedly kind of a weird feature: the OHMS export wants timecodes inserted in the middle of the text at every interval (30s in this case). Here's an example (screenshot); notice the [00:00:30], [00:01:00] etc.
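To make the idea concrete, here is a minimal sketch of interval timecode insertion. It is not the actual inline-interval-timecodes implementation: the function names `formatTimecode` and `insertIntervalTimecodes` and the `{ text, start }` word shape are assumptions made for illustration only.

```javascript
// Hypothetical sketch, not the inline-interval-timecodes source.
// Formats a time in seconds as [hh:mm:ss].
function formatTimecode(seconds) {
  const pad = (n) => String(n).padStart(2, '0');
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = Math.floor(seconds % 60);
  return `[${pad(h)}:${pad(m)}:${pad(s)}]`;
}

// words: [{ text, start }] with start in seconds (assumed shape).
// Inserts a marker before the first word at or after each interval boundary.
function insertIntervalTimecodes(words, intervalSeconds = 30) {
  const out = [];
  let nextMark = intervalSeconds;
  for (const word of words) {
    if (word.start >= nextMark) {
      // Snap to the latest boundary already passed, in case of a long silence.
      const boundary = Math.floor(word.start / intervalSeconds) * intervalSeconds;
      out.push(formatTimecode(boundary));
      nextMark = boundary + intervalSeconds;
    }
    out.push(word.text);
  }
  return out.join(' ');
}
```

With words starting at 0, 29.5, 31 and 95 seconds, the sketch places one marker before the third word and one before the fourth; a gap longer than one interval yields a single marker rather than a run of them, which may or may not match what the OHMS export expects.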

pietrop commented 3 years ago

ok, cool. No, didn't see anything significant in the console 🤷‍♂️.

In theory, once the transcript is converted to dpe format, it shouldn't make too much of a difference where it came from originally, unless there are bugs in the converters.

Did you try it in the storybook locally as well?

http://localhost:6006/?path=/story/slatetranscripteditor--demo

pietrop commented 3 years ago

ok, I am not sure why, but I think I might have figured it out 🤔 🥳

Something is not quite right about convertSlateToDpeAsync, not sure what exactly tho. But if I change restoreTimecodes in src/util/restore-timecodes to use converSlateToDpe instead of convertSlateToDpeAsync, then I am able to export

  • plain txt with timecodes
  • word document with timecodes
  • docx word (OHMS)

it actually seems quite snappy at restoring timecodes (and I was still on align-diarized-text v1.0.8 so go figure, will have to retry with v1.0.9, might be even faster 🎉 )

import convertDpeToSlate from '../dpe-to-slate';
-import { convertSlateToDpeAsync } from '../export-adapters/slate-to-dpe/index.js';
+import converSlateToDpe, { convertSlateToDpeAsync } from '../export-adapters/slate-to-dpe/index.js';

const restoreTimecodes = async ({ slateValue, transcriptData }) => {
  console.log('restoreTimecodes', slateValue, transcriptData);
-  const aligneDpeData = await convertSlateToDpeAsync(slateValue, transcriptData);
+  const aligneDpeData = await converSlateToDpe(slateValue, transcriptData);
  const alignedSlateData = convertDpeToSlate(aligneDpeData);
  return alignedSlateData;
};

export default restoreTimecodes;

was not able to export vtt and other caption files tho, I'd need to look more closely at that.
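One way to tell whether an async call like this simply never settles, rather than failing silently, is to race it against a timer. The following is a generic sketch, not code from this repository; `withTimeout` is a hypothetical helper name.

```javascript
// Hypothetical debugging helper, not part of slate-transcript-editor.
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Possible usage inside restoreTimecodes (assumption, untested against the repo):
// const aligneDpeData = await withTimeout(
//   convertSlateToDpeAsync(slateValue, transcriptData),
//   60000,
//   'convertSlateToDpeAsync'
// );
```

This only helps when the promise never resolves or rejects. If the hang comes from synchronous work blocking the main thread, the timer cannot fire either, and the browser profiler is the better tool.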

jshearer commented 3 years ago

Okay, I'll look more into this tomorrow (today? :p)

We also noticed a bug where, if you try to bulk-change a speaker name while an export is happening, the browser will sometimes hang like before. Bulk-changing a speaker name more than once also doesn't seem to work. Both are on my list here.

Are the transcripts you were using to cause these issues public/somewhere I can see them to try and reproduce myself?


pietrop commented 3 years ago

Yeah, so to reproduce you can run

npm start 

That starts the storybook locally at

http://localhost:6006/?path=/story/slatetranscripteditor--demo

It should be the same as pietropassarelli.com/slate-transcript-editor but with the local changes, obv.

You can see the various stories here slate-transcript-editor/src/components/1-SlateTranscriptEditor.stories.js#L30-L46

They are meant to exemplify various initializations as well as edge cases, e.g. long transcripts.

For transcriptions, I think I am mostly using this one, soleio-dpe, and its video, originally from the PBS Frontline transparency project on YouTube.

Let me know if you got any questions :)

pietrop commented 3 years ago

ok, yeah, captions export wasn't working for me for the same reason as the other exports: using convertSlateToDpeAsync instead of converSlateToDpe in getEditorContent in src/components/index.js

pietrop commented 3 years ago

Removed the service worker part, and merged the progress so far. We can do a separate PR for the service worker, if you get that to work.