saulalbert / CABNC

CABNC
https://saulalbert.github.io/CABNC/

Enhance bnc_xml files with links to audio data #3

Closed saulalbert closed 9 years ago

saulalbert commented 9 years ago

This is reasonably straightforward: use the http://bnc.phon.ox.ac.uk/filelist-textgrid.txt file (in the absence of other AudioBNC metadata) as a starting point for a series of commands that enrich the bnc_xml files with links to audio data for our XSLT transformation.

For example, the line in that index text file pointing to this TextGrid file:

http://bnc.phon.ox.ac.uk/data/021A-C0897X0020XX-AAZZP0_002002_KBK_2.TextGrid

tells us that the tape with the ID 002002 in the file KBK.xml is associated with the audio file: http://bnc.phon.ox.ac.uk/data/021A-C0897X0020XX-AAZZP0.wav
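As a rough sketch of that mapping in Python: the function below splits a TextGrid URL into its parts. The naming pattern (audio ID, tape ID, BNC text ID, part number, separated by underscores) is an assumption inferred from the single example above, and the name parse_textgrid_url is hypothetical.

```python
def parse_textgrid_url(url):
    """Split a TextGrid URL into audio ID, tape ID, BNC text ID and part.

    Assumes names follow <audio>_<tape>_<text>_<part>.TextGrid, which is
    inferred from one example and may not hold for every line in the list.
    """
    base, name = url.strip().rsplit("/", 1)
    stem = name.rsplit(".", 1)[0]  # drop the .TextGrid extension
    # Split from the right so underscores in the audio ID would survive.
    audio, tape, text, part = stem.rsplit("_", 3)
    return {
        "audio": audio,
        "tape": tape,
        "text": text,
        "part": part,
        "audio_url": f"{base}/{audio}.wav",
    }


if __name__ == "__main__":
    example = (
        "http://bnc.phon.ox.ac.uk/data/"
        "021A-C0897X0020XX-AAZZP0_002002_KBK_2.TextGrid"
    )
    print(parse_textgrid_url(example))
```

A line that does not have at least three underscores in its file name raises a ValueError here, which is probably what we want while checking how regular the list actually is.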

So it should be a trivial series of search/replace commands to turn http://bnc.phon.ox.ac.uk/filelist-textgrid.txt into a set of simple sed commands to modify the bnc_xml files.
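One possible way to do that conversion, sketched in Python rather than as search/replace: each line of the file list becomes one sed invocation. This assumes the same underscore-separated naming pattern as the example above, that each line is a URL or path ending in the TextGrid file name, and GNU sed's -i flag for in-place editing (BSD sed needs -i ''). The function names are hypothetical.

```python
def sed_command(line):
    """Build one sed invocation from one line of filelist-textgrid.txt.

    Assumes file names look like <audio>_<tape>_<text>_<part>.TextGrid.
    """
    name = line.strip().rsplit("/", 1)[-1]  # works for URLs and bare names
    stem = name.rsplit(".", 1)[0]
    audio, tape, text, _part = stem.rsplit("_", 3)
    expr = f's/n="{tape}" date/n="{tape}" audio="{audio}" date/'
    return f"sed -i '{expr}' {text}.xml"


def sed_commands(lines):
    """Return the distinct sed commands for a list of file-list lines."""
    seen = set()
    commands = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cmd = sed_command(line)
        if cmd not in seen:  # several TextGrid parts can share one tape
            seen.add(cmd)
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    sample = ["http://bnc.phon.ox.ac.uk/data/"
              "021A-C0897X0020XX-AAZZP0_002002_KBK_2.TextGrid"]
    print("\n".join(sed_commands(sample)))
```

Because the pattern only matches when n="..." is immediately followed by date, running a command twice leaves the file unchanged the second time.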

This would work pretty straightforwardly:

KBK.xml contains one instance of a recording ID line:

<recording xml:id="KBKRE001" n="002002" date="1991-05-31" time="08:10" type="Walkman"/>

A simple sed substitution such as s/n="002002" date/n="002002" audio="021A-C0897X0020XX-AAZZP0" date/ would yield a new line:

<recording xml:id="KBKRE001" n="002002" audio="021A-C0897X0020XX-AAZZP0" date="1991-05-31" time="08:10" type="Walkman"/>
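The substitution can be checked against the KBK.xml line without touching any files. This is only a sketch using Python's re module in place of sed; add_audio_attr is a hypothetical helper name.

```python
import re


def add_audio_attr(line, tape, audio):
    """Insert an audio attribute after n="<tape>" when date follows it."""
    pattern = re.compile(r'n="' + re.escape(tape) + r'" date')
    replacement = f'n="{tape}" audio="{audio}" date'
    # A function replacement avoids backslash processing in the new text.
    return pattern.sub(lambda _m: replacement, line, count=1)


if __name__ == "__main__":
    original = ('<recording xml:id="KBKRE001" n="002002" '
                'date="1991-05-31" time="08:10" type="Walkman"/>')
    print(add_audio_attr(original, "002002", "021A-C0897X0020XX-AAZZP0"))
```

Lines with a different tape ID, or lines that already carry an audio attribute, are returned unchanged.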

This would let us add an audio header (pointing to either a local or a remote file) to the .cha file.