kobotoolbox / kpi

kpi is the server for KoboToolbox. It includes an API for users to access data and manage their forms, question library, sharing settings, create reports, and export data.
https://www.kobotoolbox.org
GNU Affero General Public License v3.0

NLP Transcription and translation feature #2637

Open Ig-Rebollo opened 4 years ago

Ig-Rebollo commented 4 years ago

User workflow research, needs analysis and design of the NLP audio recording, transcription and translation feature

Ig-Rebollo commented 4 years ago

@tinok @jnm Made a short document with some notes and ideas for the transcription and translation feature, based on the Translators Without Borders workflow Grace sent. I included some questions at the end of the document you might want to take a look at. I was planning to send those questions to Grace, but I feel some of them are more related to our own scope, so it is probably better to run them by you first. Also, the final questions relate to the mockups made by Tino, which in some cases I found confusing.

Here is the link, please feel free to comment directly in the doc: https://docs.google.com/document/d/1WhocpoDWO4UW4ecRxJ1O6hC2NcAL4MC0J5nJ11hvzd4/edit?usp=sharing

I am also working on the initial wireframes in Balsamiq simultaneously, though discussing this document will help inform the designs, especially the questions at the end.

tinok commented 4 years ago

Thanks @Ig-Rebollo -- I added comments.

Ig-Rebollo commented 4 years ago

Hi @tinok and @jnm

As discussed I have been working on the initial wireframes for this feature based on the feedback and guidance we received, previous research, etc.

I created a series of clickable wireframes in Balsamiq that we can use to test the design with some users. Maybe starting simply with some of the team. You can download the (clickable) PDF I created here in our Google Drive

You can try to click through it or simply take a look at the individual slides. If you have some comments or feedback, please do let me know!

I can also guide you through them in our call tomorrow so we can discuss every aspect more in depth.

tinok commented 4 years ago

@Ig-Rebollo Some more comments from our call:

dorey commented 4 years ago

Ig-Rebollo commented 4 years ago

Thanks @tinok and @dorey for your thorough feedback and for taking the time on the call to go through all the wireframes. It was very informative and helpful. I'm already working to incorporate all your comments and create a second round of iterations. Will let you know when they are ready so we can schedule another call to review everything again.

Ig-Rebollo commented 4 years ago

Hi @tinok and @dorey,

I have been going back and forth with these designs for most of the week. This is still far from ideal, as I am still thinking (and rethinking) many aspects of this section. There are many moving pieces in this feature, and I feel it can be done in a simpler and more elegant way, but it will take a few more iterations to get there.

For the time being, I just wanted to share what I have so far as an update, to keep you in the loop and so you know the direction this is taking. Feel free to comment and send more feedback, but bear in mind I will keep working on them.

NLPfeature-mockups-May15.pdf

The biggest issue I have is trying to fit everything in the table. Even though I know you could scroll right as we currently do in the table view, I want to avoid placing any important information or button in there, as it would literally hide it for many (or most) users. I tried to come up with solutions to that (like the more info arrow) but I don't feel it works very well. It is also quite the challenge to fit both the action buttons and a snippet of the translated or transcribed text.

As I said before, I need to keep thinking about all this, but any input would be welcome!

Ig-Rebollo commented 4 years ago

@tinok @jnm @dorey Here are the latest mockups of the transcription and translation feature with the modifications and additions we discussed. I'll keep using this folder to upload all new mockups.

Let me know if you have any questions, comments, or further feedback. We're getting closer but this is still a work in progress!

Ig-Rebollo commented 4 years ago

@tinok @jnm @dorey some updated mockups on the transcription/translation feature. There were some key changes from the last call Tino and I had. The most relevant one is that the page will prompt you to select a question before getting started, displaying all the responses for one question at a time. Selecting all questions at once is also an option (as it might be useful for specific cases) though it is not encouraged as before.

I split the new designs into 3 animations so you can easily see the behaviour of each part:

  1. The initial screen to select a question, and changing questions: Starting animation

  2. The expanded view and the possibility to add columns for additional information/sorting: Full view and add columns

  3. The filter options and how to remove filters (note that you could also use the filters even if the column you're filtering is not visible): Filter animation

You can find all PDF mockups in this GDrive folder

Other features and behaviour remain the same as in previous mockups (i.e. playing a recording on the table view, the editor mode, downloading files and seeing their progress on the table, etc.)

Let me know what you think. It feels that there are a million moving pieces in this project, but I think we're getting there.

tinok commented 4 years ago

@Ig-Rebollo

  1. For the first screen, selecting a question, how would it handle multiple translations? Many forms use 3 and more translations. I would imagine some way of selecting the language in parallel to selecting the question. (e.g. a second dropdown). In a multilingual form like in the mockup, there is going to be an English and a Spanish label for every question (not some questions only English or some only Spanish).

image

  2. The same concern for the optional columns that are now displayed on the right: It seems that you can change the label language for each column by clicking on it? But that would mean you need to click once on each column if you want to change the language for all of them. As a user I'd expect to select the language once (see (1)) and then be able to switch it for all columns (e.g. in the settings cog, not by clicking on each column). I would expect to change the sort order when clicking on a column.

  3. Just a small issue: It would be great if the examples were closer to real forms. The column headers would be the same as the questions (so can't be just 'Country', would be 'Country of submission' in your example). There is no metadata called Country, but it's a common question. You could use "Preferred language", "Participant ID", "Gender", "Where do you currently live?", "When was the last time you ate today?", etc. image

Ig-Rebollo commented 4 years ago

@tinok (and @dorey @jnm) some additional comments and corrections:

  1. In the previous mockups, the questions are the same, just translations to Spanish of the same question. It simply lists all of them in the dropdown. However you are right, it might be easier to separate them more clearly. This is what I was thinking, depending on the language you select, the dropdown would show questions in one language or another: image image

  2. As of now you cannot change the language of a column. You can only add or remove columns (which simply represent questions). The language is always connected to whatever was the language of the main question you selected in the first screen (before the table appears). If you selected a question in English, then everything is in English. In the newest mockups I just attached, the language selection is even more obvious.

  3. Changed the column headings so they are clearer. Removed the language option in the '+' button menu to avoid confusion. The idea remains the same though. You can add or remove columns with this feature. The only issue becomes: It might be confusing to have the user select a question at the beginning only to be able to add more to the table as new columns, especially if the new additions are also audio questions. Maybe then the addition of columns should be restricted to specific metadata alone? image image image

Let me know what you think and I can also update the PDFs accordingly.

tinok commented 4 years ago

@Ig-Rebollo

> The only issue becomes: It might be confusing to have the user select a question at the beginning only to be able to add more to the table as new columns, especially if the new additions are also audio questions. Maybe then the addition of columns should be restricted to specific metadata alone?

Depends on what you mean by metadata. Usually we only refer to metadata as columns such as submission time, start and end timestamps, today, unique ID, etc. I think the goal of the table view is to display any other columns available in the dataset. If they choose an audio or other media question then the column would display the filename, just like in the online table. At least that's my expectation.

The UX question here is whether the button is obvious enough: The + button for displaying other columns is new in this context. In the table view hiding/displaying columns is done via the settings cog. One idea would be a text button that looks similar to other column headers, e.g. "Display other data columns". This would also help since otherwise you have to click the + button in order to hide data columns.

Ig-Rebollo commented 4 years ago

@tinok

All good points, I rearranged things a bit to incorporate your comments. Instead of adding a button with the 'display other data columns' text, which would be quite long and take a fair amount of valuable space, I included a new icon beside the settings. Users might need to hover over it to discover what it is, but it will be in the same place as 'settings' or 'expand', and users are used to going there to find it. In addition, I incorporated a 'help' icon as in other sections of the platform. Here is an animation for more clarity: NLPanimation_jul16

If you think this is already in a good enough condition to test, I'll start working on the user testing file. This will take a bit of time to create as it will be a clickable PDF with all the paths we want to test, which means making almost all the buttons clickable and dozens of 'slides' in total.

tinok commented 4 years ago

  1. One more suggestion for making the example slides better for user testing: The dummy form itself should consist of a number of audio and non-audio questions. When selecting a question in the first screen, only audio/video questions would be displayed. But when choosing other columns to be displayed to the right of the table, all (or only non-audio/video) questions would be displayed. So the dropdown when adding more columns would be different.

  2. Relatedly, in the example slides when adding another column to the right you picked an audio question (showing its filenames). But the point of this feature (to me) is that a user would want to show useful additional information (gender, age, education level, employment status, etc.) from their dataset.
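
The two dropdowns described above could be driven by the same list of form questions with different filters. Below is a minimal sketch in Python. The `Question` record, the function names, and the sample form are hypothetical illustrations and do not come from the kpi codebase; the type names (`audio`, `video`, `select_one`, `integer`) are assumed to follow XLSForm conventions.

```python
from dataclasses import dataclass

# Question types that can be transcribed (assumed XLSForm type names).
MEDIA_TYPES = {"audio", "video"}


@dataclass(frozen=True)
class Question:
    """Hypothetical stand-in for a form question; not a kpi model."""
    name: str
    label: str
    type: str


def primary_question_choices(questions):
    """Choices for the first screen: only audio/video questions."""
    return [q for q in questions if q.type in MEDIA_TYPES]


def extra_column_choices(questions, include_media=False):
    """Choices for additional columns: non-media questions by default."""
    return [q for q in questions if include_media or q.type not in MEDIA_TYPES]


# Hypothetical dummy form mixing audio and non-audio questions.
form = [
    Question("experience", "Describe your experience", "audio"),
    Question("gender", "Gender", "select_one"),
    Question("age", "Age", "integer"),
]

print([q.label for q in primary_question_choices(form)])
print([q.label for q in extra_column_choices(form)])
```

The `include_media` flag covers the open choice between "all" and "only non-audio/video" questions for the extra columns; which default is right is a design decision, not something this sketch settles.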

Ig-Rebollo commented 4 years ago

@tinok I finally put together a prototype ready for testing. This involved:

  1. Adjusting the design to match the latest release (the new KoBo version with white background and simplified colours I have been working on with @magicznyleszek);
  2. Creating the necessary paths to test the basic functions of the feature;
  3. Polishing some of the elements and decisions we made in July.

Additionally, I made a draft of the instructions that will be given to interview participants to navigate through the prototype:

  1. From the project summary, find the section where you can manage translations (these screens will need to be added, currently not present)
  2. You want to transcribe and translate responses from the question asking about participant's experience.
  3. From this question, the most urgent responses to process are those of women who did not test positive but had symptoms. How do you think you can find these? (at the moment, without using the filter, though that should also be possible)
  4. Transcribe all rows matching those criteria. (this might take a bit of time! though you can now de-select the rows)
     3.B. You realize one respondent marked 'other' in gender, but also didn't test positive despite having symptoms. Add it to the transcription queue.
  5. Once the first transcription is complete, review the resulting document.
  6. Make sure you check the whole document, find all timestamps.
  7. You realize the last timestamp is unnecessary. Try deleting it.
  8. Save changes and go back to the table view.
  9. All the transcriptions are now complete. Try to translate the three simultaneously into Arabic and Spanish.
  10. Select a provider (not sure if this is even needed. Can be removed or needs logos of the options)
  11. Download all transcriptions and translations you have made. (This still needs to be completed).

Here is the link to the prototype: https://www.figma.com/proto/KMRRYjYUFnXCiVXup4Gx9m/NLP-Features-KoBoToolbox?node-id=11%3A324&viewport=-6780%2C365%2C0.2719959616661072&scaling=scale-down

Ig-Rebollo commented 3 years ago

@tinok I made significant updates to the prototype based on your comments and our conversation earlier this month. I am feeling pretty good about the direction this is taking, and I feel pretty confident that this is in a good enough condition to begin testing with users at a larger scale.

This GIF shows the main updates and revisions in the workflow and design. Some additional aspects of the design, actions and clicking options will have to be put into additional wireframes for testing and implementation, but for the time being this should give you a pretty good idea of how everything would work.

NLP-flow-Jan2021

This is the main view of the table. Only one button to add a transcription. Translations can be done either in bulk by selecting items, or by clicking on the 'open file' button and doing it in the text file editor. Main-audio-table

This would be the default view of the text editor: Text-file-editor

Clicking on the 'add translation' button would create a new file for that language (after you first select, in a modal, the target language and whether to translate automatically or manually). Where it says 'Español' is a dropdown that lets you move back to the original transcription view or to other translations, if available. translation-view

Let me know what you think.

tinok commented 3 years ago

Thanks @Ig-Rebollo, this looks great and works much better to get to the transcription / translation screens.

Is there a strong reason for having the original transcription on the right and not the left of the screen? Usually the transcription tools I've seen always work to have the original on the left (which surely is a bias towards LTR languages).

Ig-Rebollo commented 3 years ago

@tinok no particular reason other than my own preference and some vague standards for web design: we start reading from left to right, top to bottom, so the most important section (the area where the action happens, where you type) is usually placed on the top left. You need to check the transcript, but that is a supporting document rather than the main output of this page. Then again all of this is absolutely debatable, so I'm more than open to changing it.

When we start testing the design we can ask users for their opinion and see which option they would prefer or expect.

magicznyleszek commented 3 years ago

@tinok @Ig-Rebollo I have some questions :)

  1. What do the tag button and the text box(?) button do? These are the last two buttons next to the transcription text editor. There is also another tag button in the media player.
  2. If you are listening to the audio file, and add a tag (I assume it adds a timestamp), where does the tag go into the rich text editor?
  3. If you start transcribing manually and then want to use an automated service, does it override your work?
  4. If you modify an automated transcription (and save and exit and go back) and want to revert your changes (w/o paying again) how can you do it?
  5. What if you want to compare the outputs of multiple transcription engines?
  6. There is a download option for rich text. If I have unsaved changes, what should the download button do? Should we prompt to save changes first and only allow downloading if saved? RTF would need to be generated on the backend (no reasonable way to do it on the frontend), that is why I'm asking :)
  7. There should be a way to rename the translations created in the transcription editor
  8. What would the settings button in the media player do?

Some ideas:

Some notes:

The rich text packages that seem nice:

Ig-Rebollo commented 3 years ago

Thanks for all your comments and questions @magicznyleszek, really useful to get your perspective here. Some of the things you mentioned didn't occur to me, so this is really helpful.

Re your questions:

  1. The buttons for the text editor are based on the app that Translators Without Borders told us they were using the most: OTranscribe. The key to that app is that you don't really click on the buttons as much as you use the keyboard shortcuts. They are typing everything quickly, so the workflow is not so much 'type, highlight and make bold', but simply use the keyboard shortcut to activate bold, type, and then deactivate bold with the keyboard again. The way I made it in my mockups is that the buttons show you the shortcut on hover. The button for normal text is a bit unnecessary, but essentially returns you to normal text, removing any bold or italic that might have been active. The tag button, as you said, adds a timestamp.
  2. Adding timestamps is a tricky one. It might require us to have a call to talk about it. The current button adds a timestamp in the audio file, and if you have translated or transcribed it automatically, it would appear in blue dividing the correct paragraphs (as you see in the mockups). The problem is that if you transcribe or translate manually, and you add a timestamp, it would be added to that transcription file but it couldn't go into the audio file as well, since the app wouldn't have a way to know what time that is. Anyhow, something to think about for all of us and maybe worth dedicating a call to it.
  3. If you transcribe manually and then use the automated service, yes, it would override your initial transcription. However, we could offer the possibility to save it as a separate translation. In any case we would need to have a warning notification before allowing the user to continue.
  4. I did not plan for reverting changes once you save and go back to the table view. At that point you would have to do the automated transcription again. However, if this is something worth implementing I'm happy to look into it and include a button to revert changes (probably on the top header). Not sure what the technical implications would be though. We would have to make sure the initial translations are always stored somewhere...
  5. You can add a new translation for the same language. The way I thought of it, there's nothing stopping you from having 5 Spanish translations if you want to.
  6. For the download in the editor view, we could include a modal asking users to save before downloading. The problem is that at this point if you save (button on the top right corner), you save everything, not just a given translation. All files would be saved. Not sure if there are any cases in which you might want to save translation A but not translations B and C for example...
  7. This is a good point, especially if we allow multiple translations of the same language. The modal that appears when you click on 'add translation' could give you the option to rename it or add a description.
  8. Settings would include any other features we want to include as part of the audio play. For instance, I was thinking of options like slowing down (or speeding up) the pace of the recording, for ease of transcription.
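
The behaviour described in answer 3 (an automated transcription overrides a manual one, but only after a warning, with the option of keeping the earlier text) can be sketched as a small guard. This is only an illustration in Python under stated assumptions: `TextFiles`, `ConfirmationRequired`, and `apply_automated_transcript` are hypothetical names, not part of kpi, and the real feature may store transcripts quite differently.

```python
from dataclasses import dataclass, field


class ConfirmationRequired(Exception):
    """Raised when an existing transcript would be overwritten without consent."""


@dataclass
class TextFiles:
    """Hypothetical per-submission record: one transcript plus kept copies."""
    transcript: str = ""
    saved_copies: list = field(default_factory=list)


def apply_automated_transcript(files, automated_text, confirmed=False, keep_copy=False):
    """Replace the transcript with automated output.

    If a transcript already exists, the caller must pass confirmed=True
    (i.e. the user accepted the warning). With keep_copy=True the previous
    text is preserved as a separate copy instead of being lost.
    """
    if files.transcript and not confirmed:
        raise ConfirmationRequired("Existing transcript would be overwritten")
    if files.transcript and keep_copy:
        files.saved_copies.append(files.transcript)
    files.transcript = automated_text
    return files
```

In a UI, catching `ConfirmationRequired` would be the point at which the warning modal is shown; calling again with `confirmed=True` corresponds to the user choosing to continue.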

Your idea of allowing further customization of the editor view (what gets displayed in each side) sounds great to me. I may have to change a few things to make it happen but I think it might be worth it.

As for the rich text packages, I'll try to change the design slightly to incorporate one of those. I particularly like the slate one, and it is actually nice to have it as part of the sheet itself, so I might try that.

cc @tinok

magicznyleszek commented 3 years ago

@tinok @Ig-Rebollo after last call about NLP I have some more thoughts:

  1. Maybe we should add an "eye" (open submission) button here to allow for easier identification of the submission?
  2. Have we considered "Transcriptions" header name instead of "Text Files"? I see that "Text Files" mirrors "Audio File", but it's not perfectly clear for me what these are
  3. Maybe instead of "open file" the button should say "open editor"?
  4. I think we should allow for only single transcription text (and multiple translations of course). As for people wanting to revert to the original text they paid for when using automated service - the easiest solution would probably be to send them the text via email when the translation is done?
  5. "see columns" might be better as "show submission columns"?
  6. I think we could drop the timestamps feature from the initial work
  7. I think we could also drop the rich text formatting from initial work
  8. Do we want to somehow restrict/suggest what the translation names should be? E.g. force users to not type in translation name but rather use only one of existing form languages?

Ig-Rebollo commented 3 years ago

Thanks for the comments @magicznyleszek very helpful to get your input as usual. I'll try to respond one by one:

  1. This is an interesting idea (if I understood correctly). So the eye would open the full submission preview or 'submission record' as it does in the table view? That would be a great way to help them identify records, can't believe I didn't think of that. Very good point!
  2. The reason for this is that --as @tinok pointed out-- some users might want to use this for things other than transcribing. For example, writing notes or a summary of the audio, or even translate directly to another language as they listen to it. Hence why I thought it might make sense to make it more generic. This is the kind of thing we will have to check with users though.
  3. Not a bad idea, but again, if all you do is to manually write notes, open file would make more sense.
  4. Definitely agree. As of now, the assumption is that there is only one transcription. I'm not even sure we have to give the option of reverting to the original. You can always download the text file to save a copy of the original before changing it. If we also allow them to upload files, they can literally revert to original on their own manually.
  5. I agree, but it is a bit long for the UI. Maybe 'show columns' or 'submission columns'?
  6. I agree it makes things overly complicated. Maybe we simplify it and you can only have timestamps when recording the audio (either manually or automatically). In the editor you see them but you cannot edit them, add new ones or change them.
  7. I think some rich text formatting would be helpful, even if very basic. Though it doesn't have to be in the first version.
  8. Definitely agree that this would be nice, but on the other hand we might not have a list of all the languages/dialects they might want to translate from manually. For instance, they might not want to just have one 'French' translation, but an 'Abidjan French', 'Yaounde French', 'Kinshasa French'....

magicznyleszek commented 3 years ago

  1. Yes, single submission modal will open like in table view :)
  2. :+1:
  3. :+1:
  4. We can wait to see if users report actual problems with that. Maybe a better feature would be versioning the text (in future) :)
  5. "show columns" has action in it, would be best :)
  6. I have a new idea now! Instead of adding timestamps inside the text we could make a narrow column on the left of the text (similar to the line numbers in a code editor) and allow for setting one timestamp per line. This should make it easier to implement. Still, the whole timestamps feature is easy to split from the initial work :)
  7. We would be using an RTF plugin, so it would be either all features or none :) I.e. having just bold text would mean that we would have other functions automatically too
  8. Mhm, I was also thinking that maybe someone has a form in English only, but wants to use the translation feature - restricting to only form languages would make it impossible
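
The "one timestamp per line" idea from point 6 above maps naturally onto a list of lines, each with an optional timestamp, rendered with a narrow gutter. A minimal sketch in Python follows; `TranscriptLine`, `format_timestamp`, and `render_with_gutter` are hypothetical names chosen for illustration, and the MM:SS gutter format is an assumption rather than anything decided in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptLine:
    """One line of transcript text with at most one timestamp (in seconds)."""
    text: str
    timestamp: Optional[float] = None


def format_timestamp(seconds):
    """Format seconds as MM:SS for the gutter column (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_with_gutter(lines):
    """Render lines with a fixed-width timestamp gutter on the left."""
    rendered = []
    for line in lines:
        gutter = format_timestamp(line.timestamp) if line.timestamp is not None else ""
        # Pad the gutter to 5 characters so the text column stays aligned.
        rendered.append(f"{gutter:<5} | {line.text}")
    return "\n".join(rendered)
```

Because the timestamp lives beside the text rather than inside it, the text itself stays plain, which is one reason this model would be simpler to implement than inline timestamps.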