Background
Rezonator users may want to import data produced with popular software such as Elan. Elan is widely used by linguists, anthropologists, and others, especially for transcribing audio and video recordings of conversation. A useful workflow is to:
1. use Elan to transcribe the conversation;
2. use Rezonator to analyze the conversation.
So we need a strategy for getting data from Elan into Rezonator as easily as possible.
The solution you'd like
To import data from Elan, do the data exchange in two steps:
First, use Elan to export the file in a commonly used exchange format, such as tab-delimited text:
1. Open the Elan transcription file (.eaf file).
2. From the menu, select "File/Export as/Tab-delimited text".
3. Select the appropriate export options, checking the boxes as shown in the screenshot below.
Second, use Rezonator to import the tab-delimited text. (See "Import into Rezonator" below for details.)
Screenshot
The following shows how to set the options for "Export as tab-delimited text" in Elan:
[Screenshot: Exporting a document as a tab-delimited text file.]
Import into Rezonator
Now use Rezonator to import the tab-delimited file.
Rezonator then creates its own internal version of the file, which closely mimics the structure of the tab-delimited file (which in turn inherits most of its data structure from the original Elan file).
Within Rezonator, it is important to handle the fields of data commonly encoded in Elan correctly. Each Elan field should be assigned to the corresponding field in the Rezonator node map, which requires mapping the fields labeled according to Elan conventions into Rezonator fields. Examples include:
timestamps (e.g. begin time, end time, total length)
participants (speaker labels)
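A minimal Python sketch of this mapping step follows. It assumes the Elan export was configured to write one annotation per line with the columns tier, participant, begin time, end time, duration, and annotation, with times in seconds ("ss.msec"). The column list, the Rezonator field names other than unitStartTime and unitEndTime, and the function name read_elan_export are all hypothetical and would need to match the actual export options and Rezonator's node map.

```python
# ASSUMPTION: Elan wrote one annotation per line with these columns, times in
# seconds. The real column order depends on the options checked in the
# "Export as tab-delimited text" dialog; adjust this list to match.
ELAN_COLUMNS = ["tier", "participant", "begin_time", "end_time", "duration", "annotation"]

# HYPOTHETICAL mapping from Elan column labels to Rezonator node-map field names.
# Only unitStartTime and unitEndTime are names used elsewhere in this issue.
FIELD_MAP = {
    "participant": "participant",
    "begin_time": "unitStartTime",
    "end_time": "unitEndTime",
    "duration": "unitDuration",
    "annotation": "text",
}

# Columns whose values are times and should become numbers.
TIME_COLUMNS = {"begin_time", "end_time", "duration"}


def read_elan_export(tsv_text, columns=ELAN_COLUMNS, field_map=FIELD_MAP):
    """Turn tab-delimited Elan export text into unit dicts keyed by Rezonator field names."""
    units = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = dict(zip(columns, line.split("\t")))
        unit = {}
        for elan_name, rez_name in field_map.items():
            value = record.get(elan_name, "")
            if elan_name in TIME_COLUMNS and value:
                value = float(value)  # numeric seconds, so units can be sorted later
            unit[rez_name] = value
        units.append(unit)
    return units
```

Columns not listed in the mapping (here, the tier name) are simply dropped, which keeps the sketch short but would need revisiting for multi-tier files.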
The imported data needs to be tokenized in the usual way.
The text field should contain all the tokens (e.g. morphemes, words, pauses, and vocalisms).
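As an illustration of tokenizing the text field, the following sketch splits each unit on whitespace and tags every token as a word, pause, or vocalism. The patterns (pauses written as "..." or "(0.4)", vocalisms as "(H)" or "@") are assumptions about the transcription conventions in use, the field names tokenOrder and type are hypothetical, and the sketch does not split words into morphemes.

```python
import re

# ASSUMED transcription conventions: pauses as ".." / "..." or a timed "(0.4)";
# vocalisms as a capitalized code in parentheses such as "(H)", or laughter "@".
PAUSE = re.compile(r"\.{2,3}|\(\d+(?:\.\d+)?\)")
VOCALISM = re.compile(r"\([A-Z][A-Za-z]*\)|@+")


def tokenize_unit(text):
    """Split a unit's text field on whitespace and tag each token with a rough type."""
    tokens = []
    for order, form in enumerate(text.split(), start=1):
        if PAUSE.fullmatch(form):
            kind = "pause"
        elif VOCALISM.fullmatch(form):
            kind = "vocalism"
        else:
            kind = "word"
        # HYPOTHETICAL token fields; Rezonator's own token schema may differ.
        tokens.append({"tokenOrder": order, "text": form, "type": kind})
    return tokens
```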
By default, Elan groups utterances by the participant who produced them, not by conversational sequence. For a good result, Rezonator should use the timestamp information to sort the utterances back into their original conversational order (sort by unitStartTime, then by unitEndTime).
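The sort itself is a two-key ordering. In this Python sketch, units are dicts carrying unitStartTime and unitEndTime in seconds; the unitSeq field that records each unit's position in the restored sequence is a hypothetical name.

```python
def sort_units(units):
    """Return units in conversational order: by unitStartTime, then by unitEndTime."""
    ordered = sorted(units, key=lambda u: (u["unitStartTime"], u["unitEndTime"]))
    # HYPOTHETICAL field: record each unit's position in the restored sequence
    # (this adds the key to the existing unit dicts).
    for seq, unit in enumerate(ordered, start=1):
        unit["unitSeq"] = seq
    return ordered
```

Because Python's sort is stable, units with identical start and end times keep their export order; a real importer might prefer an explicit tiebreak, such as participant.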
For more complex Elan transcriptions, the mapping may involve additional fields such as text, transcription, gloss, and translation. The Rezonator import screen should let users specify the mapping between Elan field names and the corresponding Rezonator field names.
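One way to handle multi-tier transcriptions is sketched here: a user-supplied mapping assigns each Elan tier name to a Rezonator field, and annotations that share a time span are merged into a single unit. The tier names, field names, and the function merge_tiers are hypothetical, and the sketch assumes dependent tiers (gloss, translation) are exported with the same begin and end times as the annotation they depend on.

```python
# HYPOTHETICAL mapping, as a user might enter it on the import screen:
# Elan tier name -> Rezonator field name.
TIER_TO_FIELD = {
    "utterance": "text",
    "gloss": "gloss",
    "free translation": "translation",
}


def merge_tiers(rows, tier_to_field):
    """Combine annotations from several tiers into one unit per (begin, end) span.

    Each row is a (tier, begin, end, value) tuple. Rows whose tier is not in the
    mapping are ignored. ASSUMPTION: dependent tiers carry the same begin/end
    times as the annotation they depend on.
    """
    units = {}  # dicts keep insertion order, so units follow first appearance
    for tier, begin, end, value in rows:
        field = tier_to_field.get(tier)
        if field is None:
            continue  # unmapped tier
        unit = units.setdefault((begin, end), {"unitStartTime": begin, "unitEndTime": end})
        unit[field] = value
    return list(units.values())
```

Grouping by time span alone would merge two speakers' annotations that happen to share identical times, so a fuller version would also key on the participant.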
Documenting the Elan export
One goal is simply to document the process of exporting from Elan. Even if the Elan documentation already describes how to export a tab-delimited file, Rezonator users will benefit from our documenting the simplest possible way to export from Elan and import into Rezonator.
For general information on Elan and the .eaf format, see the Elan documentation.
Alternatives you've considered
It may be possible for Rezonator to import an Elan file (.eaf) directly. This would require a schema to interpret and process the .eaf format files used by Elan. The question is whether this would be cost-effective.
Evaluate whether it makes sense to:
use the existing Elan export functions to create an exchange file format (as described above), or
import an Elan file (.eaf) directly, using a schema to interpret and process the .eaf format files used by Elan
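To help estimate the cost of the direct-import alternative, here is a rough Python sketch that reads time-aligned annotations straight from .eaf XML using the standard library. It relies on the EAF layout in which TIME_ORDER holds TIME_SLOT elements with millisecond TIME_VALUE attributes and each TIER holds ALIGNABLE_ANNOTATION elements that reference two time slots. It ignores REF_ANNOTATION elements on dependent tiers, unaligned time slots, and everything else a full schema would need to cover; the output field names are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_eaf(eaf_text):
    """Extract time-aligned annotations from EAF XML (str or bytes) as unit dicts.

    Only ALIGNABLE_ANNOTATION elements whose two time slots both carry a
    TIME_VALUE are kept; REF_ANNOTATION elements are skipped in this sketch.
    """
    root = ET.fromstring(eaf_text)
    # Map time-slot IDs to times in seconds (EAF stores milliseconds).
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        if value is not None:
            slots[slot.get("TIME_SLOT_ID")] = int(value) / 1000.0
    units = []
    for tier in root.iter("TIER"):
        # Fall back to the tier ID when no PARTICIPANT attribute is set.
        participant = tier.get("PARTICIPANT") or tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # unaligned slot; a full importer would interpolate
            units.append({
                "participant": participant,  # HYPOTHETICAL Rezonator field names
                "unitStartTime": start,
                "unitEndTime": end,
                "text": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    # Annotations come out tier by tier, so restore conversational order.
    units.sort(key=lambda u: (u["unitStartTime"], u["unitEndTime"]))
    return units
```

To read a file, pass its raw bytes (for example, open(path, "rb").read()) so the XML parser honors the encoding declared in the file.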