kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

A new steps/data/reverberate_data_dir.py script #552

Closed vijayaditya closed 8 years ago

vijayaditya commented 8 years ago

The current RIR preparation scripts in the ASpIRE recipe are too convoluted, which is leading to multiple hard-to-trace bugs. I have to prepare a simpler RIR preparation script.

Prepare an upload for openslr.org including all the RIRs that can be freely distributed. Separate uploads for RIRs which can be distributed for research use and for commercial use would be preferable.

vimalmanohar commented 8 years ago

I use the RIR preparation + additive noise for VAD. It would be good to have a common approach for both purposes.

vijayaditya commented 8 years ago

@vimalmanohar Could you specify in detail what exactly you would like to do for the additive noise part? The RIR preparation script is going to be a bit complex as it deals with data from 9 different databases. There is a lot of code involved in converting these into a common format.

It would also have a script which packages these RIRs for consumption in other recipes. This is for upload to openslr.org.

Are your noise databases from different sources and freely distributable?

vimalmanohar commented 8 years ago

I use noises only from the MUSAN corpus and the ones in the 9 databases. After the preparation of the databases, I only need the following:

  1. A list of <impulses, noises> pairs -- the same as those currently used in the ASpIRE recipe
  2. A list of all background (stationary) noises -- I am assuming the noises from all the 9 databases are background. MUSAN has a README file which specifies which files are background noises.
  3. A list of all foreground noises -- These are currently from MUSAN. I can make the MUSAN preparation script similar to the other scripts you make.

vijayaditya commented 8 years ago

@vimalmanohar We should refocus on this effort.

@tomkocse This effort is very similar to your current effort on adding simulated reverberation to data. It would be very helpful if you are able to make your script generic enough to satisfy requirements of a general data perturbation recipe.


output_dir=
steps/data/prepare_rirs_noises.sh \
--simulated-rir-option-string " <option-string-simulated-rir-generation-script>" \
<rir-database-list> <noise-database-list> $output_dir
steps/data/reverberate_data_dir.py --rir-list $output_dir/info/rir_list \
                       --noise-list $output_dir/info/noise_list \
                       --num-replications 10 \
                       <input-dir> <output-dir>

The RIR (and noise?) databases would be mirrored at openslr.org. The RIR list would have the format


<rir_id> [<rir-type (simulated, real)>] < location(support Kaldi IO strings) >

Noise-list would be of the format


<noise_id> <noise-type (isotropic, point source)> [<rir-file>] < location(support Kaldi IO strings) >

The third argument in the list is optional and specifies any dependency between the noise and RIR files. This is useful in databases like RWCP where isotropic noise recordings are available for a given room and microphone position. Point source noises would include noises like those available in the NOISEX database. These would have to be reverberated along with the speech signal, with an RIR from the same room but a different position, before adding them to the reverberated speech signal.

Given that the corruption with point source noises would require sampling an RIR from the same room as the RIR sampled for the speech signal, I think it would make our life very simple if the reverberate_data_dir script is written in python, in a fashion similar to steps/nnet3/train_dnn.py i.e., making use of steps/nnet3/nnet3_train_lib.py's RunKaldiCommand function.
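For reference, a helper with the rough shape of RunKaldiCommand can be sketched in a few lines (this is an illustrative sketch of the idea, not the actual nnet3_train_lib code):

```python
import subprocess

def run_kaldi_command(command):
    """Run a shell command (e.g. a wav-reverberate pipeline) and return
    (stdout, stderr), raising if the command fails. An illustrative sketch
    of a RunKaldiCommand-style helper, not the actual nnet3_train_lib code."""
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("Command failed with status {0}: {1}".format(
            proc.returncode, command))
    return stdout.decode("utf-8"), stderr.decode("utf-8")
```

reverberate_data_dir.py could then build each wav-reverberate pipeline as a string and hand it to a helper like this.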

I think the following distribution of work would make sense, depending on things we are already working on.

@vimalmanohar, @vpeddinti - steps/data/prepare_rirs_noises.sh, which would involve proper data pre-processing steps and ensuring all openly available data is mirrored at openslr.org

@tomkocse - steps/data/reverberate_data_dir.py, plus the simulated RIR generation script (already completed by Tom)

Let me know if this is acceptable.

danpovey commented 8 years ago

It certainly looks reasonable to me.

vijayaditya commented 8 years ago

In interest of having better clarity, I am detailing the specific changes compared to reverberate_data_dir.sh, which already exists.

@tomkocse let us know if you need help with the wav-reverberate changes.

danpovey commented 8 years ago

Thanks..

In the interest of better clarity, I am detailing the specific changes compared to reverberate_data_dir.sh, which already exists.

The fundamental component of this script is the binary wav-reverberate, which needs to support new functionality.

  • It will now support an additional mode: convolve(speech, rir_speech) + \sum_i convolve(point_source_noise_i, rir_noise_i) + isotropic_noise. Previously it just supported convolve(speech, rir_speech) + isotropic_noise. This change is necessary to support data corruption for the VAD system, which makes extensive use of point source noises.

The new steps/data/reverberate_data_dir.py now has to support generation of the new wav-reverberate commands with multiple point source noises (foreground, background, etc.) and isotropic noises. For this it would have to do the following:

  • Sample an RIR from the RIR list and check the room-id of this RIR (rir_list should specify this along with each RIR; this should be taken care of by the prepare_rir_noises.sh script).

  • If point-source noises are to be added, sample one RIR for each of them from the same room, and create a wav-reverberate command specifying these. A sample command looks like this:

    wav-reverberate --multi-channel-output=false \
                    --rir-channel 1 \
                    --point-source=noise1.wav --rir=rir1.wav --snr-db=10 \
                    --point-source=noise2.wav --rir=rir2.wav --snr-db=5 \
                    --isotropic-noise=noise3.wav --snr-db=20 \
                    --normalize-output=true \
                    speech_rir.wav speech.wav

The Kaldi argument parsing is of the key=value type, so repeated versions of the same argument will have no effect. I wonder if it would be possible to accomplish the same thing by chaining together the same command, or maybe a different command? Actually I don't really understand this command and what it's supposed to do; and I think it would be easier to have a relatively simple command and chain it together.

  • I will discuss the exact details of the code when the implementation reaches that point.
  • There will be extensive checking by the python script to ensure that it is pairing the correct isotropic noise with the correct RIR file (based on room-ids), and that the room-ids of the RIR files for speech and point sources are the same. (Note: We do not anticipate the availability of a lot of real RIRs from the same room, but we will be generating a lot of simulated RIRs from the same room.)

@tomkocse let us know if you need help with the wav-reverberate changes.

vijayaditya commented 8 years ago

OK. In that case we can have a command like the one below to accomplish the same thing as above


wav-reverberate --multi-channel-output=false \
                --point-source-noise-rir=rir_1.wav --point-source-noise=noise_1.wav --snr-db=10 \
                --isotropic-noise=noise3.wav --snr-db=20 \
                --speech-rir speech-rir.wav speech.wav - | \
wav-reverberate --point-source-noise-rir=rir_2.wav --point-source-noise=noise_2.wav --snr-db=5 \
                --normalize-output=true \
                - output.wav

The equation being implemented is output = convolve(speech, speech_rir) + \sum_i scale_i * convolve(noise_i, rir_i) + scale_isotropic * isotropic_noise
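For clarity, here is a pure-Python sketch of that equation (illustrative only; wav-reverberate implements this in C++, with proper length and scale handling):

```python
def convolve(x, h):
    # Full linear convolution of two sample lists.
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def mix(speech, speech_rir, point_sources, iso_noise, iso_scale):
    """point_sources is a list of (noise, rir, scale) triples.
    Implements: output = convolve(speech, speech_rir)
                         + sum_i scale_i * convolve(noise_i, rir_i)
                         + scale_isotropic * isotropic_noise
    with everything truncated to the reverberated-speech length."""
    out = convolve(speech, speech_rir)
    n = len(out)
    for noise, rir, scale in point_sources:
        rev = convolve(noise, rir)
        for t in range(min(n, len(rev))):
            out[t] += scale * rev[t]
    for t in range(min(n, len(iso_noise))):
        out[t] += iso_scale * iso_noise[t]
    return out
```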

danpovey commented 8 years ago

OK. You still have two --snr-db options in the 1st file. I think it might be easier to just use another copy of wav-reverberate to reverberate noise_1.wav with rir_1.wav (maybe with options to set the length equal to the sum of the lengths?), and then just add it into the waveform, again using wav-reverberate or some other differently-named tool. Then it will be clearer what is going on, and easier to write the program.

Dan


danpovey commented 8 years ago

... and since so many copies of programs will be selecting individual channels from wav files, I think it would make sense to have a script copy the individual channels to disk so that we don't incur a bunch of wasted I/O. That way, when specifying RIRs and noises, there would be no need to specify the channel.


vijayaditya commented 8 years ago

Oops, my bad. Yes, we can use one wav-reverberate command for each noise source. OK, we can store RIR channels separately in the preprocessing stage.

vijayaditya commented 8 years ago

A new wav.scp would be created by the reverberate_data_dir.py script with the wav-reverberate commands. However, I am worried that we would be wasting a lot of computation convolving RIRs with the entire speech file (or noise files extended to the length of the speech file) just to extract a 2-second segment from this convolved output when using compute-mfcc-feats. To avoid this waste I previously suggested having an extract-reverb-segments binary which would convolve only the portion of the speech signal necessary to generate a reverberated segment that is the proper convolution output. This felt like over-complicated code before, when we were performing just one convolution per wav-reverberate command, but now that we might have two convolutions on average per command (one for speech and one for a point source noise) this might be significantly more efficient.
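The saving comes from the fact that, for a causal RIR of length R, output sample t of the convolution depends only on input samples t-R+1 through t, so producing an output segment only requires convolving the input slice extended by R-1 samples on the left. A toy check of this identity (function names are hypothetical):

```python
def convolve(x, h):
    # Full linear convolution of two sample lists.
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def reverb_segment(x, h, start, end):
    """Reverberated samples start..end-1 without convolving all of x:
    only the slice x[start-R+1:end] (clipped at 0) is needed."""
    r = len(h)
    lo = max(0, start - r + 1)
    seg = convolve(x[lo:end], h)
    offset = start - lo  # position of 'start' inside the partial output
    return seg[offset:offset + (end - start)]

x = [0.1 * i for i in range(50)]
h = [0.5, 0.25, 0.125]
full = convolve(x, h)
assert reverb_segment(x, h, 10, 30) == full[10:30]
```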

danpovey commented 8 years ago

Maybe you can configure it so that you consume a reasonable proportion of the file in compute-mfcc-feats, not just a small segment. Presumably you'd sort the segments in such a way that it would read a few times from the same input pipe. Dan


tomkocse commented 8 years ago

@vijayaditya , there is no room-id specified in the rir list format: <rir_id> [<rir-type (simulated, real)>] < location(support Kaldi IO strings) >

Does "rir_id" mean room-id?

vijayaditya commented 8 years ago

The work on the data prep scripts has not yet been done. So assume whatever you need for reverberate_data_dir.py to work, and write your script accordingly. Also let us know on this thread what you are assuming so that we can comment on it.

Vijay

tomkocse commented 8 years ago

@vijayaditya To be more accurate, when a point source is specified, besides sampling an RIR from the same room, it should have the same receiver position, right?

vijayaditya commented 8 years ago

Yes. We will assume that the microphone position is the same for both the noise source and the speech source.

--Vijay


vimalmanohar commented 8 years ago

@tomkocse Please add an additional field to the noise info list. The RIR info list does not require any change.

<noise_id> <noise-type (isotropic, point source)> <background|foreground> [<rir-file>] < location(support Kaldi IO strings) >

@vijayaditya If you think we could write this information in a better way, let us know. I am thinking we could write all this as name=value pairs:

<noise_id> <noise-type=(isotropic, point source)> <bg-fg-type=(background|foreground)> [<rir-file=rir-file>] < location=(support Kaldi IO strings) >

vijayaditya commented 8 years ago

@tomkocse use python's argparse module as you did for parsing the CNN parameters.

RIR list format:

--rir-id <string, compulsory> --room-id <string, compulsory> --receiver-position-id <string, optional> --source-position-id <string, optional> --rt-60 <float, optional> --drr <float, optional> < location (supports Kaldi IO strings) >

@tomkocse add any other metadata you want. Not all metadata is mandatory; decide what you want for your experiments.

Noise list format:

--noise-id <string, compulsory> --noise-type <choices=(isotropic, point source), compulsory> --bg-fg-type <choices=(background, foreground), default=background> --rir-file <string, compulsory if isotropic, should not be specified if point-source> < location (supports Kaldi IO strings) >
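For example, each rir_list line could be split with shlex and fed to argparse like this (a sketch; the option names follow the format proposed above, and the example line and file path are made up):

```python
import argparse
import shlex

# Parser for one line of the proposed rir_list format.
parser = argparse.ArgumentParser(description="rir_list line parser")
parser.add_argument("--rir-id", type=str, required=True)
parser.add_argument("--room-id", type=str, required=True)
parser.add_argument("--receiver-position-id", type=str, default=None)
parser.add_argument("--source-position-id", type=str, default=None)
parser.add_argument("--rt-60", type=float, default=None)
parser.add_argument("--drr", type=float, default=None)
parser.add_argument("location", type=str)  # supports Kaldi IO strings

# A hypothetical rir_list line:
line = "--rir-id r001 --room-id room1 --rt-60 0.3 rirs/r001.wav"
rir = parser.parse_args(shlex.split(line))
```

argparse turns `--rt-60` into the attribute `rir.rt_60`, and options left out of a line simply keep their defaults, which matches the "not all metadata is mandatory" requirement.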

tomkocse commented 8 years ago

@vijayaditya I still don't clearly understand the mechanism for picking the noises to corrupt a particular wav. To my understanding the mechanism before was:

for each wav in wav.scp:
    randomly pick an RIR entry from rir_list
    randomly pick a noise entry from noise_list
    generate the wav-reverberate command

My question is how can i decide when to look for multiple noises to corrupt a wav at the same time. In your previous example, you might want to use two point source noises and an isotropic noise in the command at the same time.

vimalmanohar commented 8 years ago

You could do something like this: For each wav in wav.scp:

  1. With probability p1, choose to pick an RIR randomly.
    1. If you chose to pick an RIR, add the corresponding isotropic noise if it exists
  2. With probability p2, choose to pick an RIR randomly from the same room or any RIR randomly if you did not pick an RIR in step 1.
    1. Randomly pick any point source background noise and convolve with the RIR picked (if any) in step 2.
    2. Repeat step 2 for RandInt(max_background_repeat) times but keep choosing RIRs from the same room.
  3. With probability p3, choose to pick an RIR randomly from the same room as in step 2.
    1. Pick a random integer between 0 and max_foreground_noises and randomly pick those foreground noises and convolve them with the picked RIR (if any) in step 3.
    2. Repeat 3 for RandInt(max_foreground_repeat)
danpovey commented 8 years ago

BTW, I assume at some point you have to pick the time of the point source background noise (i.e. not have them all start at the start of the file, if the length may be less than the audio file)... or are they all longer, or do you repeat them?


vimalmanohar commented 8 years ago

I think the background noises are stationary and long, so they can be cut randomly if longer, or repeated if shorter, than the speech file. The foreground noises are non-stationary and typically short, so we need to pick the start time of each foreground noise to be added to the file.

vijayaditya commented 8 years ago

We were previously repeating the noise files to match the speech recording length. But starting them arbitrarily would be better.


vimalmanohar commented 8 years ago

The distinction of foreground vs background can be removed if the noise are added at random times. For each wav in wav.scp:

  1. With probability p1, choose to pick an RIR randomly.
    1. If you chose to pick an RIR, add the corresponding isotropic noise if it exists
  2. With probability p2, choose to pick an RIR randomly from the same room or any RIR randomly if you did not pick an RIR in step 1.
    1. Randomly pick any point source noise and convolve with the RIR picked (if any) in step 2.
    2. Randomly pick a start time in the file to add the point source noise.
    3. Randomly pick an SNR level from the snr_list.
    4. Repeat step 2 for RandInt(max_noises_added) times but keep choosing RIRs from the same room.
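A sketch of this per-wav sampling loop (the probabilities, list structures, SNR list, and function name are illustrative assumptions, not the final script):

```python
import random

def corrupt_wav(rirs, noises, p1=0.5, p2=0.5, max_noises_added=3,
                snr_list=(20, 15, 10, 5, 0)):
    """One draw of the scheme above. rirs is a list of dicts carrying
    "rir_id" and "room_id"; returns (speech_rir_or_None, additive_noises),
    where each additive noise is (noise, rir, start_fraction, snr_db)."""
    speech_rir = None
    room = None
    if random.random() < p1:                     # step 1: maybe reverberate
        speech_rir = random.choice(rirs)
        room = speech_rir["room_id"]
    additive = []
    if random.random() < p2:                     # step 2: point-source noises
        for _ in range(random.randint(1, max_noises_added)):
            candidates = [r for r in rirs
                          if room is None or r["room_id"] == room]
            noise_rir = random.choice(candidates)
            room = noise_rir["room_id"]          # later picks stay in this room
            additive.append((random.choice(noises), noise_rir,
                             random.random(),    # random start time (fraction)
                             random.choice(snr_list)))
    return speech_rir, additive
```

If no RIR is picked in step 1, the first noise RIR fixes the room for all subsequent picks, as discussed below.
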
vijayaditya commented 8 years ago

If you actually have clearly marked background noises it would be better to make use of this metadata, so choosing to replicate background noises makes sense to me. But we would have to be careful to ensure that the durations of the speech and noise files are comparable, to avoid cases where the noise gets repeated a lot, which is not helpful for ensuring variety in the training data.

So the logic for selecting noises would have to be a bit more complicated. We might want to choose multiple background noise files which start one after the other, rather than repeating the existing noise file like we do now.

For foreground non-stationary noise files it definitely does not make sense to repeat them, so we should add these at intervals. We could let the user specify an option which determines the rate parameter of a Poisson process governing when the non-stationary noises are added.

These changes do slightly complicate reverberate_data_dir.py, but the only change required in wav-reverberate.cc is the ability to add noises between specified start and end points. If the noise duration is less than this, we repeat the noise. The python script will generate the start and end points.
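For the Poisson process, successive gaps between foreground-noise onsets are exponentially distributed, so the start times can be drawn like this (a sketch; rate_per_sec is the expected number of foreground noises per second):

```python
import random

def sample_noise_start_times(duration, rate_per_sec, rng=None):
    """Draw foreground-noise onset times in [0, duration) from a Poisson
    process: gaps between onsets are exponential with mean 1/rate_per_sec."""
    rng = rng if rng is not None else random.Random()
    times = []
    t = rng.expovariate(rate_per_sec)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times
```

On average this yields duration * rate_per_sec onsets; the python script would then turn each onset into the start/end points passed to wav-reverberate.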

--Vijay


vimalmanohar commented 8 years ago

I assumed the noises that were part of the RIR databases were all stationary background noise, and David hand-picked some stationary background noises. So these are mostly accurate. However, there may be background noises which are incorrectly marked as foreground. But since the foreground noises are not repeated, this should not be too much of an issue.

Adding at intervals is fine. But overlapping noises should also be allowed.

vijayaditya commented 8 years ago

Could you write the command line format for wav-reverberate.cc so that Tom (if he wants to implement it) can use it as a specification.

(Remember to have just one point-source noise per command.)


tomkocse commented 8 years ago

@vimalmanohar what is "the same room" referring to? I don't see anywhere before step 2 that defines the room.

vijayaditya commented 8 years ago

This would be specified in the RIR list. Each RIR would be associated with a room, speaker position and microphone position.


tomkocse commented 8 years ago

But if no RIR was picked in step 1, there will be no room to be referred to in step 2.

vijayaditya commented 8 years ago

Yes, in that case you have to just maintain the same room and microphone position for all the noise sources being added in step 2.


vimalmanohar commented 8 years ago

The command line format for wav-reverberate would be

wav-reverberate --rir-file=<rir-wav> \
                --isotropic-noise=<isotropic-noise-wav> --isotropic-snr=<isotropic-snr-db> \
                --point-source-noise-rir=<point-source-rir-wav> --point-source-noise=<point-source-noise-wav> \
                --point-source-snr=<point-source-snr-db> --start-time=<s> --end-time=<e> \
                --original-input=<original-input> <input> <output>

This does output = convolve(input, rir-wav) + a * isotropic-noise + b * convolve(point-source-noise, point-source-rir), where a and b are chosen as required to get the correct SNR. The SNR is with respect to the original-input, if it is specified. If original-input is not specified, the SNR can be taken to be with respect to the input. The point source noise is time-shifted and repeated or truncated to be within the specified start and end times.
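The scales a and b follow from the definition of SNR in dB, snr_db = 10 * log10(P_signal / (scale^2 * P_noise)). As a sketch (the function name is illustrative, with power taken as mean squared sample value):

```python
import math

def snr_scale(signal, noise, snr_db):
    """Scale b such that b * noise sits snr_db below `signal`:
    snr_db = 10 * log10(power(signal) / power(b * noise))."""
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return math.sqrt(p_sig / (p_noise * 10.0 ** (snr_db / 10.0)))
```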

tomkocse commented 8 years ago

@vimalmanohar can you write the command line when multiple point sources are used? And what is the difference between original-input and input?

tomkocse commented 8 years ago

How do we get the length of the speech wav, to avoid --start-time and --end-time being out of range?

vijayaditya commented 8 years ago

See how the duration is computed in the wav-to-duration binary.

Vijay

vimalmanohar commented 8 years ago

wav-reverberate --rir-file=<rir-wav> --isotropic-noise=<isotropic-noise-wav> --isotropic-snr=<isotropic-snr-db> \
                --point-source-noise-rir=<point-source1-rir-wav> --point-source-noise=<point-source1-noise-wav> \
                --point-source-snr=<point-source1-snr-db> --start-time=<s1> --end-time=<e1> <input> - | \
wav-reverberate --point-source-noise-rir=<point-source2-rir-wav> --point-source-noise=<point-source2-noise-wav> \
                --point-source-snr=<point-source2-snr-db> --start-time=<s2> --end-time=<e2> --original-input=<input> - - | \
wav-reverberate --point-source-noise-rir=<point-source3-rir-wav> --point-source-noise=<point-source3-noise-wav> \
                --point-source-snr=<point-source3-snr-db> --start-time=<s3> --end-time=<e3> --original-input=<input> - - | \
... and so on

--original-input is there so that you can compute SNR for the multiple point sources with respect to the original unperturbed input.

danpovey commented 8 years ago

Do you really need end times as well as start times for the point source noises? Also I don't think the --rir-file option should end in -file. Might it be easier to supply multiple point source noises to the same program somehow, e.g. via a comma separated list or some other mechanism? Is there a reason why the RIRs and original wavs for all the noises need to be supplied separately, instead of having a separate program create them? It just seems complicated-- but up to you.

Dan


danpovey commented 8 years ago

Also - why does this program care about the difference between point-source and isotropic noise? Is it because one is repeated and one is not? I see that the point-source noise gets reverberated, but if you did that in a separate program then that difference would vanish. I'd prefer the command line arguments to be named in as generic a way as possible, reflecting what the program actually does with the arguments, rather than your intention behind creating the program. If this is difficult to express for some reason, OK to leave it as it is.


vimalmanohar commented 8 years ago

Ok, in that case we could have a generic program:

```
wav-reverberate --impulse-response=<h> --additive-signals=<a> --snrs=<snr-db> \
  --start-times=<s> --end-times=<e> <input> <output>
```

for doing output = convolve(h, input) + C1 * a1 + C2 * a2 + ..., where C1, C2, ... are chosen such that the signals are added at the appropriate SNRs, and a1, a2, ... are restricted to lie within the corresponding start times s1, s2, ... and end times e1, e2, ..., either by truncation or repetition. Start and end times are needed to decide whether the noise must be repeated or truncated.
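A rough sketch of that combination rule (a hypothetical `reverberate` helper in NumPy, assuming all signals are already the same length and SNRs are measured against the reverberated input):

```python
import numpy as np

def reverberate(inp, h, additive_signals, snrs_db):
    # output = convolve(h, input) + C1 * a1 + C2 * a2 + ...
    rev = np.convolve(inp, h)[:len(inp)]  # truncate the convolution tail
    p_ref = np.mean(rev ** 2)
    out = rev.copy()
    for a, snr_db in zip(additive_signals, snrs_db):
        # C_i chosen so that a_i sits snr_db below the reverberated input
        c = np.sqrt(p_ref / (np.mean(a ** 2) * 10.0 ** (snr_db / 10.0)))
        out += c * a
    return out
```

With h = [1.0] (an identity impulse response) and one additive signal at 0 dB, the scaled noise ends up with the same average power as the input.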

For multiple point sources, the command would then be:

```
wav-reverberate --impulse-response=<rir-wav> \
  --additive-signals='<isotropic-noise-wav>:"wav-reverberate --impulse-response=<point-source1-rir-wav> <point-source1-noise-wav> - |":"wav-reverberate --impulse-response=<point-source2-rir-wav> <point-source2-noise-wav> - |"' \
  --snrs='<isotropic-snr-db>:<point-source1-snr-db>:<point-source2-snr-db>' \
  --start-times='<s0>:<s1>:<s2>' --end-times='<e0>:<e1>:<e2>' <input> <output>
```

and so on.

danpovey commented 8 years ago

Sounds reasonable. BTW, for new programs I'm trying to use comma-separated lists instead of colon-separated ones, as that's a more standard way of separating list elements. Dan

vijayaditya commented 8 years ago

If we are going to separate the reverberation of the noise sources into a different command, I would recommend adding a --duration argument in place of --end-time. The default start-time would be 0, and the default duration would be the length of the signal being added. This is recommended because it might be beneficial to reverberate the noise after extending it to the required duration. So the modified command would look like this:


```
wav-reverberate --impulse-response=<rir-wav> \
  --additive-signals='<isotropic-noise-wav>,"wav-reverberate --duration=<t1> --impulse-response=<point-source1-rir-wav> <point-source1-noise-wav> - |","wav-reverberate --duration=<t2> --impulse-response=<point-source2-rir-wav> <point-source2-noise-wav> - |"' \
  --snrs='<isotropic-snr-db>,<point-source1-snr-db>,<point-source2-snr-db>' \
  --start-times='<s0>,<s1>,<s2>' <input> <output>
```
tomkocse commented 8 years ago

For the generic command `wav-reverberate --duration=<t> --impulse-response=<rir-wav> <noise-wav> -`: let the length of noise-wav be l. If t < l, it will trim the reverberated signal to its first t secs. If t > l, it will repeat the noise to make sure the output wav's duration is t secs. Am I right about the mechanism?

vijayaditya commented 8 years ago

Yes.
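That trim-or-repeat rule can be sketched in a few lines (`fit_duration` is a hypothetical helper; durations are in samples rather than seconds):

```python
import numpy as np

def fit_duration(noise, t):
    # Truncate if the noise is longer than t samples;
    # tile (repeat) and then cut if it is shorter.
    l = len(noise)
    if t <= l:
        return noise[:t]
    reps = -(-t // l)  # ceiling division
    return np.tile(noise, reps)[:t]
```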

tomkocse commented 8 years ago

Since I now need the length of the recording (reco2dur) rather than of the utterance (utt2dur), should I add an option to utils/data/get_utt2dur.sh so that I can force it to compute the duration from wav.scp even if 'segments' exists?

vijayaditya commented 8 years ago

Not necessary. You could directly use the output of wav-to-duration. You can add a Python wrapper for this in reverberate.py.
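A minimal sketch of such a wrapper (hypothetical function names; it assumes Kaldi's wav-to-duration binary is on the PATH and writes a text archive of `<reco-id> <duration>` lines):

```python
import subprocess

def parse_durations(ark_text):
    # Each line of the text archive is "<reco-id> <duration-in-seconds>".
    durs = {}
    for line in ark_text.splitlines():
        if line.strip():
            reco_id, dur = line.split()
            durs[reco_id] = float(dur)
    return durs

def get_reco2dur(wav_scp):
    # Run wav-to-duration on the recordings listed in wav.scp
    # and collect the durations from stdout.
    out = subprocess.run(
        ["wav-to-duration", "scp:" + wav_scp, "ark,t:-"],
        capture_output=True, text=True, check=True).stdout
    return parse_durations(out)
```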


vijayaditya commented 8 years ago

Being addressed in #706