xanguera / BeamformIt

BeamformIt

acoustic beamforming tool

BeamformIt is an acoustic beamforming tool that accepts a variable number of input channels and computes an output via a filter-and-sum beamforming technique. It makes almost no assumptions about the input data (e.g. number of channels, topology, locations, individual channel audio quality, ...).
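To give a rough idea of what filter-and-sum means, here is a minimal Python sketch of its simplest form, where each channel's "filter" is just an integer sample delay plus a gain (a weighted delay-and-sum). It is an illustration only, not BeamformIt's implementation: the function name filter_and_sum, the toy signals, the delays and the weights are all assumptions made up for this example.

```python
def filter_and_sum(channels, delays, weights):
    """Align each channel by an integer sample delay, scale it, and sum.

    channels: list of equal-length lists of float samples
    delays:   per-channel delay in samples (positive = channel lags the reference)
    weights:  per-channel gains, typically chosen to sum to 1
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay, weight in zip(channels, delays, weights):
        for t in range(n):
            src = t + delay  # read ahead in the lagging channel to undo its delay
            if 0 <= src < n:
                out[t] += weight * samples[src]
    return out


# Toy input: the same pulse reaches the second microphone 2 samples later.
ref = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
output = filter_and_sum([ref, late], delays=[0, 2], weights=[0.5, 0.5])
print(output)
```

Once the delays are compensated, the pulse from both channels adds up coherently at the same output sample. In a real system the delays and weights have to be estimated from the audio itself; that part is not shown here.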

BeamformIt was originally implemented by Xavier Anguera at ICSI for participation in the NIST RT05s Meetings evaluation, to deal with the different number of microphone channels available in a meeting room. BeamformIt was then rewritten and improved for the RT06s evaluation and finally readjusted and documented for public release.

BeamformIt was initially focused on processing the data used in the RT evaluations, but it can now process all sorts of data. As of version 3.5, an effort has been made to eliminate almost all external library dependencies, and a script has been created so that "casual" beamforming users have an easy-to-use tool for their acoustic beamforming needs.

Prior to its release on GitHub, BeamformIt was released via this website with a versioning system. As of version 3.51 all releases will be done through GitHub. Following Kaldi's philosophy, we will not set new versions (but will try to keep a log of key changes below).

If you use the software for research, I would very much appreciate it if you could cite my work. You can use either of the following citations:

"Acoustic beamforming for speaker diarization of meetings", Xavier Anguera, Chuck Wooters and Javier Hernando, IEEE Transactions on Audio, Speech and Language Processing, September 2007, volume 15, number 7, pp.2011-2023.

"Robust Speaker Diarization for Meetings", Xavier Anguera, PhD Thesis, UPC Barcelona, 2006.

Index

  1. Compiling the code
  2. Running the tool
  3. Output files
  4. How to cite
  5. Change Log

Compiling the code

As of version 3.5 there is only ONE external library required by BeamformIt: libsndfile.

In Ubuntu, you can install libsndfile with:

    sudo apt-get install libsndfile1-dev

Additionally, doxygen can be used to compile the documentation of the source code.

To compile the code, first make sure that the Makefile points to the right directories for the sndfile library (or that the library has been installed in the system). The program uses cmake to compile the code, so we can usually compile with:

    cmake .
    make

Note: if sndfile is not installed in the system (and you do not want to install it) and the compilation complains, you can repeat the cmake command as follows:

    cmake -DLIBSND_INSTALL_DIR=/libsnd/install/dir .

You can also type make clean to clean up any executable or .o files left from previous compilations. To compile the code documentation, execute make documentation; this should create a doxygen documentation structure under docs.

Running the tool

This section describes how to run the system. There is a VERY simple way and a more complicated way. The simple way is oriented towards casual users of the tool. The more complicated way is part of the legacy code that speech processing people have been using for beamforming audio signals.

simple (but limited) way

Use the script do_beamforming.sh provided in the base directory to apply BeamformIt to all audio files within a certain directory. The script takes 2 parameters:

more complicated way

This way allows the use of a single channels file to process multiple file configurations and to keep all audio files for different configurations in the same directory. This is how speech researchers have been using BeamformIt until now (with the difference that some config parameters might now have changed).

To explain the way to run it we follow an example based on the RT06s NIST evaluation for meetings (conference room data).

Prerequisites: To run the beamforming, the input files must be in .sph (sphere) or .wav format, containing one or more channels per file (having a set of files with a mixed number of channels is fine). Sometimes the data is preprocessed prior to beamforming. A common preprocessing step that has given good results at ICSI is Wiener filtering each individual channel.

Config files: There are 2 files that need editing before running the beamforming. For this example they are a config file cfg-files/RT06s_conf.cfg and a channels file cfg-files/channels. The config file determines the location of all data and the running parameters, as well as the location of the channels file. The channels file specifies which individual audio files will be combined into a single output file. The config file is the only mandatory parameter to the executable, passed with the -C option.

Let us see in detail the format and possible parameters in each case:

Parameters in bold are necessary for the system to run and without them it will not start. The letters in parenthesis indicate short ways to refer to the same parameter when used as a command line argument, otherwise you need to use --parameter_name.

Output files

After the system runs (and dumps a lot of output into stdout...), an output directory is created in result_dir with the same name as show_id (here NIST_20051024-0930, as an example), containing some or all of the files below (depending on the config parameters being used):

Known limitations

The BeamformIt software has been tested with up to 128 channels in parallel, but in theory it can handle as many as the memory of the machine you run it on allows. To increase the number of channels, you just need to change the variable MAXNUMCH in src/global.h.

The current version is independent of the number of channels per input file, the frame rate and the resolution. If the input files defined in the "channels" file contain more than 1 channel, an algorithm is executed internally to split the data into individual files, storing them in the output directory in WAV format. All these files are kept in the directory after the execution; it is up to the user to delete them if disk space is a constraint.
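To make the channel-splitting step more concrete, the following Python sketch separates a multi-channel WAV file into one mono WAV file per channel using only the standard library. This is not BeamformIt's internal code: the function name split_channels, the _chN.wav naming scheme and the 16-bit PCM restriction are assumptions made for this illustration, and the files BeamformIt actually writes may be named and laid out differently.

```python
import os
import struct
import tempfile
import wave


def split_channels(in_path, out_dir):
    """Write each channel of a 16-bit PCM WAV file to its own mono WAV file."""
    with wave.open(in_path, "rb") as wav:
        n_channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        frame_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")

    # Samples are interleaved: ch0, ch1, ..., ch0, ch1, ...
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    base = os.path.splitext(os.path.basename(in_path))[0]
    out_paths = []
    for ch in range(n_channels):
        mono = samples[ch::n_channels]  # every n-th sample belongs to this channel
        out_path = os.path.join(out_dir, "%s_ch%d.wav" % (base, ch))
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(sample_width)
            out.setframerate(frame_rate)
            out.writeframes(struct.pack("<%dh" % len(mono), *mono))
        out_paths.append(out_path)
    return out_paths


# Demo: build a tiny 2-channel file in a temporary directory, then split it.
work_dir = tempfile.mkdtemp()
stereo_path = os.path.join(work_dir, "demo.wav")
with wave.open(stereo_path, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<6h", 1, -1, 2, -2, 3, -3))
mono_paths = split_channels(stereo_path, work_dir)
print(mono_paths)
```

As with the files BeamformIt leaves behind, the per-channel files created by this sketch stay on disk until you delete them.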

How to Cite

If you use this software in your research please reference it as:

"Acoustic beamforming for speaker diarization of meetings", Xavier Anguera, Chuck Wooters and Javier Hernando, IEEE Transactions on Audio, Speech and Language Processing, September 2007, volume 15, number 7, pp.2011-2023.

@article{anguera2007beamforming,
  author = {X. Anguera and C. Wooters and J. Hernando},
  title = {Acoustic beamforming for speaker diarization of meetings},
  journal = {IEEE Transactions on Audio, Speech, and Language Processing},
  year = {2007},
  volume = {15},
  pages = {2011-2021},
  number = {7},
  month = {September}
}

A (very exhaustive) description of the system, with speaker diarization experiments, is also available in my PhD thesis:

Xavier Anguera, "Robust Speaker Diarization for Meetings", PhD thesis, Technical University of Catalonia, 2006

@phdthesis{anguera2006robust,
  author = {Xavier Anguera},
  title = {{PhD Thesis: Robust Speaker Diarization for Meetings}},
  school = {Universitat Polit{\`e}cnica de Catalunya},
  year = {2006}
}

Change log

This section contains the main changes in the tool and replaces the versioning system we used until version 3.51. For a more detailed changelog, refer to the GitHub commit logs.