Closed pietrop closed 3 years ago
also relevant via @xshy216 https://github.com/pietrop/slate-transcript-editor/issues/10#issuecomment-722846904
An update on the latest thinking, and a chance to recap some of the current progress.
After talking to @rememberlenny I decided to defer trying out pagination in favor of an approach that tries to single out the paragraphs that have changed and align only those.
There are two ways in which you could do that: `onKeyDown` and/or `onChange`, keeping some sort of list that tracks where changes in the doc have been made, based on the user's cursor and selection. For now this seems laborious.
Slightly unrelated, but relevant: similar to the DraftJs approach of using entities in @bbc/react-transcript-editor, but somehow way more performant, we can bring back clickable words by adding them as an attribute to the text child node, alongside the text attribute.
We can add `onDoubleClick` to the `renderLeaf` component:
`onDoubleClick={handleTimedTextClick}`
And use a `getSelectionNodes` helper function that uses the slateJS selection/cursor position to return the timecode of the current word.
Assuming the text has not been edited, comparing the selection offset against the character counts of the text in the list of word objects gives you the start time of the word being clicked on (if that makes sense?).
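A minimal sketch of that offset lookup, assuming the paragraph text is still the words' text joined by single spaces. `findWordAtOffset` is a hypothetical name for illustration, not the helper used in the PR.

```js
// Hypothetical helper: find the word under the cursor from a character offset.
// Assumes the paragraph text equals the words' text joined by single spaces,
// ie the paragraph has not been edited since the last alignment.
function findWordAtOffset(words, offset) {
  let charCount = 0;
  for (const word of words) {
    // each word covers its own characters, followed by one space
    const wordEnd = charCount + word.text.length;
    if (offset <= wordEnd) {
      return word;
    }
    charCount = wordEnd + 1;
  }
  // offset past the end of the text: fall back to the last word, if any
  return words.length > 0 ? words[words.length - 1] : null;
}

const exampleWords = [
  { text: 'Hello', start: 0, end: 0.46 },
  { text: 'World', start: 0.46, end: 1.02 },
];
// cursor at offset 7 sits inside "World"
const clickedWord = findWordAtOffset(exampleWords, 7);
console.log(clickedWord.text, clickedWord.start);
```

Once the text has been edited the character counts drift away from the word list, which is why this only holds between alignments.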
Option 2 assumes that paragraphs are not changing, eg splitting or merging a paragraph. OR that this is being handled separately from the alignment process.
For now I've disabled splitting and merging paragraphs via the `Enter` and `Backspace` keys (eg if `Backspace` is pressed at the beginning of the paragraph). However you can still delete multiple words within one paragraph.
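As a rough sketch of how that interception could be decided: the function below takes the pressed key and a selection shaped like a Slate range (`{ anchor: { path, offset }, focus: { path, offset } }`). `shouldBlockKey` and `isCollapsed` are hypothetical names, and the assumption that each paragraph has a single text child (so a path looks like `[paragraphIndex, 0]`) is mine.

```js
// Hypothetical helpers, not part of the Slate API.
// A selection is collapsed when anchor and focus are the same point.
function isCollapsed(selection) {
  const { anchor, focus } = selection;
  return (
    anchor.offset === focus.offset &&
    anchor.path.join(',') === focus.path.join(',')
  );
}

// Decide whether a key press should be intercepted (eg via preventDefault).
function shouldBlockKey(key, selection) {
  if (!selection) return false;
  if (key === 'Enter') {
    // splitting a paragraph is disabled for now
    return true;
  }
  if (key === 'Backspace') {
    const { anchor } = selection;
    // offset 0 of the first text child: Backspace would merge with the
    // previous paragraph (assumes a single text child per paragraph)
    const atParagraphStart =
      anchor.offset === 0 && anchor.path[anchor.path.length - 1] === 0;
    return isCollapsed(selection) && atParagraphStart;
  }
  return false;
}
```

In an `onKeyDown` handler this might be used as `if (shouldBlockKey(event.key, editor.selection)) event.preventDefault();`. Deleting a selection of words inside one paragraph is left alone, matching the behavior described above.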
One idea from @rememberlenny is that if you don't run the alignment on every keystroke or when the user stops typing (both possible optimizations to consider, via @gridinoc), then you need to find which paragraphs have changed and only align those.
I found that lodash `differenceWith` is pretty snappy, and you can specify a comparator function. This allows you to, for example, compare only the text attribute of the child node, as opposed to the whole paragraph block.
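A small sketch of that comparison. To keep it dependency-free it uses a local stand-in with the same semantics as lodash's `differenceWith(array, values, comparator)`; the paragraph shape (`children: [{ text, words }]`) and the name `findChangedParagraphs` are assumptions for illustration.

```js
// Dependency-free stand-in for lodash's differenceWith(array, values, comparator):
// keep the items of `array` for which no item of `values` satisfies the comparator.
function differenceWith(array, values, comparator) {
  return array.filter((a) => !values.some((b) => comparator(a, b)));
}

// Assumed paragraph shape: { speaker, children: [{ text, words }] }.
// Only the text of the child node is compared, not the whole block.
function compareParagraphText(a, b) {
  return a.children[0].text === b.children[0].text;
}

// Hypothetical helper: paragraphs in `current` whose text is not found in `previous`.
function findChangedParagraphs(current, previous) {
  return differenceWith(current, previous, compareParagraphText);
}

const makeParagraph = (text) => ({ speaker: 'SPEAKER_A', children: [{ text, words: [] }] });
const previousValue = [makeParagraph('Hello world'), makeParagraph('How are you')];
const currentValue = [makeParagraph('Hello world'), makeParagraph('How are you today')];
console.log(findChangedParagraphs(currentValue, previousValue).length);
```

One caveat of this kind of comparison is that it is set-based rather than positional: a paragraph edited so that its text matches some other paragraph's previous text would not be reported as changed.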
Once you have the individual paragraphs that need aligning, you can run `alignSTT` on each and replace them in the list of paragraphs that makes up the slateJs editor's current content value.
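Sketching that replacement step under some assumptions: `align` is a hypothetical callback standing in for the alignment call (eg a wrapper around `alignSTT`), taking a paragraph's previous timed words and its edited text and returning new timed words. The paragraph shape is the same assumed one, with a single text child holding `text` and `words`.

```js
// Hypothetical helper: re-align only the changed paragraphs and return a new
// list of paragraphs, leaving the untouched ones as they were.
// `align(words, text)` is an assumed callback, not the stt-align-node signature.
function realignChanged(paragraphs, changed, align) {
  const changedSet = new Set(changed);
  return paragraphs.map((paragraph) => {
    if (!changedSet.has(paragraph)) {
      return paragraph; // untouched: reuse as-is
    }
    const [child] = paragraph.children;
    const words = align(child.words, child.text);
    // replace the timed words, keep everything else on the paragraph
    return { ...paragraph, children: [{ ...child, words }] };
  });
}
```

Returning a new array rather than mutating in place means the result can be handed straight back to the editor as its next value.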
See latest commit of the PR https://github.com/pietrop/slate-transcript-editor/pull/36 for more details on this.
- `Enter`. Eg split the associated list of `words` objects into the two new paragraphs
- `Backspace`. Eg merge the lists of words from the two old paragraphs
Refactor/clean up
- `cloneDeep`
- `convertDpeToSlate` for comparison. Eg save in state slateJs pre last changed(?)
- `differenceWith` computation step. (Although would need to figure out how to handle it if the user corrects one paragraph, then goes to the next one quickly, eg without triggering an alignment in between)
Also
- `Enter` with a selection that spans across multiple paragraphs. Do you need to remove those stt words lists from the paragraph block, or should this stay disabled for now? (For now intercepted and disabled it instead.)
- `Backspace` with a selection that spans across multiple paragraphs. Do you need to remove those stt words lists from the paragraph block, or should this stay disabled for now?
Some thoughts after recent refactor https://github.com/pietrop/slate-transcript-editor/pull/36
- `Enter`. Eg split the associated list of `words` objects into the two new paragraphs
- `Backspace`. Eg merge the lists of words from the two old paragraphs
- `Enter` with a selection that spans across multiple paragraphs. Do you need to remove those stt words lists from the paragraph block, or should this stay disabled for now? (For now intercepted and disabled it instead.)
- `Backspace` with a selection that spans across multiple paragraphs. Do you need to remove those stt words lists from the paragraph block, or should this stay disabled for now? (For now intercepted and disabled it instead.)
💡 ~You are not allowed to completely delete a paragraph?~ as it could make things easier for alignment, as a paragraph will always have timed words associated with it.
This would mean that you are running the STT align against the most recent re-alignment, as opposed to the original STT data. But it would give flexibility to handle changing paragraphs, as well as to skip alignment of paragraphs that might not need it.
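For the `Enter` and `Backspace` cases listed above, splitting and merging the associated word lists could look roughly like this. It is only a sketch: `splitWords` and `mergeWords` are hypothetical names, and it assumes the paragraph text is the words joined by single spaces.

```js
// Hypothetical helper for `Enter`: split a paragraph's timed words at a
// character offset. A word that the offset cuts through mid-way is kept
// whole in the second paragraph.
function splitWords(words, offset) {
  let charCount = 0;
  let index = 0;
  while (
    index < words.length &&
    charCount + words[index].text.length <= offset
  ) {
    // move past this word and the space that follows it
    charCount += words[index].text.length + 1;
    index += 1;
  }
  return [words.slice(0, index), words.slice(index)];
}

// Hypothetical helper for `Backspace` at the start of a paragraph:
// the merged paragraph gets the previous paragraph's words, then its own.
function mergeWords(previousWords, currentWords) {
  return [...previousWords, ...currentWords];
}
```

Because each half keeps real timed words, both new paragraphs still have time-codes to align against, which fits the idea of a paragraph always having timed words associated with it.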
Still unsure about the frequency of the alignment: def on save, but not sure if it should happen on pause typing, maybe not for now. Need to check performance against longer files (eg 1 to 5 hours).
Updated storybook demo https://pietropassarelli.com/slate-transcript-editor/ to reflect this PR https://github.com/pietrop/slate-transcript-editor/pull/36
PR https://github.com/pietrop/slate-transcript-editor/pull/36 recap
This has been merged to master and deployed as alpha releases, to test it out and make it easier to revert if needed. Will bump up the version when there's more confidence that it was a successful refactor that didn't introduce 🐞.
closing this for now.
Working on this PR https://github.com/pietrop/slate-transcript-editor/pull/30 I ran into an issue figuring out the right logic to paginate the transcript.
The issue
TL;DR: The issue is that when the user corrects the text, they might delete, substitute or insert new words. These operations tend to lose the time-codes originally associated with each word. The alignment module currently in use loses performance for transcripts over 1 hour. So we are considering pagination as a ~quick~ fix.
If you truly want the TL;DR version, skip to the Pagination heading. Otherwise read on for more context.
### Context

Some quick background for those new to the project.

`slate-transcript-editor` builds on top of the lessons learned from developing [@bbc/react-transcript-editor](https://github.com/bbc/react-transcript-editor) (based on [draftJs](https://draftjs.org/)). As the name suggests, `slate-transcript-editor` is built on top of [slateJs](https://slatejs.org), augmenting it with transcript editing domain specific functionalities. For more on "draftjs vs slatejs" for this use case, see [these notes](https://github.com/pietrop/slate-transcript-editor/blob/master/docs/notes/draftjs-vs-slatejs.md).

It is a react transcript editor component that allows users to correct automated transcriptions of audio or video generated by speech to text services. It is used in [autoEdit](https://www.autoedit.io), an app to edit audio/video interviews, as well as in other situations where users might need to correct transcriptions.

The ambition is to have a component that takes in timed text (eg a list of words with start times), allows the user to correct the text (providing some convenience features, such as pause while typing, and keeping some kind of correspondence between the text and audio/video) and on save returns timed text in the same json format (referred to, for convenience, as dpe format, after the digital paper edit project where it was first formalized).

```js
{
  "words": [
    {
      "end": 0.46, // in seconds
      "start": 0,
      "text": "Hello"
    },
    {
      "end": 1.02,
      "start": 0.46,
      "text": "World"
    },
    ...
  ],
  "paragraphs": [
    {
      "speaker": "SPEAKER_A",
      "start": 0,
      "end": 3
    },
    {
      "speaker": "SPEAKER_B",
      "start": 3,
      "end": 19.2
    },
    ...
  ]
}
```

As part of `slate-transcript-editor` this dpe format is then converted into the [slateJs](https://www.slatejs.org/) data model.

[See the storybook demo to see the `slate-transcript-editor` react component in practice](https://pietropassarelli.com/slate-transcript-editor)

#### Side note on word level time-codes and clickable words

I should mention that in [fact2_transcription_editor](https://github.com/pietrop/fact2_transcription_editor) you could click on individual words to jump to the corresponding point in the media, with something equivalent to

```html
Hello ...
```

A pattern I had first come across in [hyperaud.io's blog description of "hypertranscripts"](https://hyperaud.io/blog/hypertranscripts/) by @maboa & @gridinoc.

#### Some more background and info on this solution

This solution was first introduced by @chrisbaume in [bbc/dialogger](https://github.com/bbc/dialogger) ([presented at textAV 2017](https://textav.gitbook.io/textav-event/projects/bbc-dialogger)). It modified [CKEditor](https://ckeditor.com) (at the time draftJS was not around yet) and ran the alignment server side in a custom python module, [sttalign.py](https://github.com/pietrop/stt-align-node/blob/master/docs/python-version/sttalign.py).

With @chrisbaume's help I converted the python code into a node module, [stt-align-node](https://github.com/pietrop/stt-align-node), which is used in [@bbc/react-transcript-editor](https://github.com/bbc/react-transcript-editor) and [slate-transcript-editor](https://github.com/pietrop/slate-transcript-editor).

One issue in converting from python to [the node version](https://github.com/pietrop/stt-align-node/blob/master/src/align/index.js) is that for diffing python uses [difflib](https://github.com/pietrop/stt-align-node/blob/master/docs/python-version/sttalign.py#L31), which is [part of the core library](https://docs.python.org/3/library/difflib.html), while in the node module [we use](https://github.com/pietrop/stt-align-node/blob/master/src/index.js#L27) [difflib.js](https://github.com/qiao/difflib.js), which might not be as performant (❓ 🤷♂️).

When a word is inserted (eg it was not recognized by the STT service and the user adds it manually), this type of alignment has no time-codes for it. Via interpolation of the time-codes of neighboring words, we add some time-codes back.
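
To illustrate the idea, here is a minimal, dependency-free sketch of linear interpolation over word indices. It is not the code used in stt-align-node: `interpolateStartTimes` is a hypothetical name, inserted words are assumed to carry `start: null`, and only `start` times are filled in, for brevity.

```js
// Hypothetical helper: fill missing `start` times by linear interpolation
// between the nearest words (by index) that do have a start time.
// Words before the first or after the last known time take that edge value.
function interpolateStartTimes(words) {
  const known = [];
  words.forEach((word, i) => {
    if (typeof word.start === 'number') known.push(i);
  });
  if (known.length === 0) return words.map((word) => ({ ...word }));

  return words.map((word, i) => {
    if (typeof word.start === 'number') return { ...word };
    // nearest known neighbours on either side of index i
    const prev = [...known].reverse().find((k) => k < i);
    const next = known.find((k) => k > i);
    if (prev === undefined) return { ...word, start: words[next].start };
    if (next === undefined) return { ...word, start: words[prev].start };
    const ratio = (i - prev) / (next - prev);
    const start =
      words[prev].start + ratio * (words[next].start - words[prev].start);
    return { ...word, start };
  });
}

const withInsertedWord = [
  { text: 'Hello', start: 0 },
  { text: 'beautiful', start: null }, // added by the user, no time-code
  { text: 'World', start: 1 },
];
console.log(interpolateStartTimes(withInsertedWord)[1].start);
```

The inserted word lands halfway between its neighbours, which is only an estimate but keeps every word clickable.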

In the python version the time-code interpolation is done via [numpy](https://numpy.org) to [linearly interpolate the missing times](https://github.com/pietrop/stt-align-node/blob/master/docs/python-version/sttalign.py#L3-L16). In the [node version the interpolation](https://github.com/pietrop/stt-align-node/blob/master/src/align/index.js#L61-L95) is done via the [everpolate](http://borischumichev.github.io/everpolate/#linear) module, and again it might not be as performant as the python version (❓ 🤷♂️).

#### More on retaining speaker labels after alignment

There is also a workaround for retaining speaker labels at paragraph level when using this module to run the alignment. The module itself only aligns the words. To re-introduce the speakers, you compare the aligned words with the paragraphs that carry the speaker info. See the [example of converting into slateJs format](https://github.com/pietrop/slate-transcript-editor/blob/master/src/util/update-timestamps/index.js#L15-L47) or into [dpe format from slateJs](https://github.com/pietrop/slate-transcript-editor/blob/pagination/src/util/export-adapters/slate-to-dpe/index.js#L14-L40).

Pagination
For `slate-transcript-editor` we've been using (option 3) client side alignment with stt-align-node to restore time-codes on user's save.
However, because of the performance issue on large transcriptions, we've been considering pagination, see PR https://github.com/pietrop/slate-transcript-editor/pull/30 , but ran into a few issues.
For now we can assume the transcription comes as one payload from the server. And I've been splitting it into one hour chunks.
The idea is that the slateJs editor can be responsible for the text editing part, while alignment, save, and export in various formats can be done in the parent component to provide a cohesive interface that, for example, merges all the pages into one doc before exporting but only updates the current chunk when saving.
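A rough sketch of the chunking and merging described here, working on the dpe `words` list only. `splitIntoPages` and `mergePages` are hypothetical names, and paragraph boundaries are ignored for simplicity, which a real implementation would probably need to respect.

```js
// Hypothetical helper: group timed words into pages of `pageSeconds` each
// (one hour by default), based on each word's start time in seconds.
function splitIntoPages(words, pageSeconds = 3600) {
  const pages = [];
  for (const word of words) {
    const pageIndex = Math.floor(word.start / pageSeconds);
    if (!pages[pageIndex]) pages[pageIndex] = [];
    pages[pageIndex].push(word);
  }
  // drop the gaps left by stretches with no words (eg long silences)
  return pages.filter(Boolean);
}

// Hypothetical helper: merge all pages back into one list before exporting.
function mergePages(pages) {
  return pages.flat();
}
```

The parent component could then hand one page at a time to the editor, replace just that page on save, and call `mergePages` before export.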
questions
I am going to continue to try a few other things here, but any thoughts, ideas 💡 or examples of react best practice for paginating text editors are much appreciated.
Quick disclaimer: Last but not least this is my best effort to collect info on this topic in order to frame the problem and hopefully get closer to a solution, if some of these are not as accurate as they should be, feel free to let me know in the comments.