sharathadavanne / seld-dcase2019

Benchmark for sound event localization task of DCASE 2019 challenge

IR for noise generation #12

Closed eshalev4 closed 5 years ago

eshalev4 commented 5 years ago

Hi,

First of all, thanks for the great code. I am working on my own model and would like to test it against some scenarios for which I need to add directional noise. Is it possible to get your IR so I can convolve a noise signal with it and add the resulting directional noise to the audio files?

If so, are there any spatial constraints on where the noise can be placed in the environment of each specific IR?

sharathadavanne commented 5 years ago

Hi @eshalev4, you can find the room impulse responses for one of the five locations used in the dataset here. Check out the RIR description and let me know if you have any questions regarding it.

We are currently organizing the impulse responses for the remaining locations, along with the code to synthesize datasets from them, and should release them soon.

eshalev4 commented 5 years ago

Ok, great. Thanks a lot for your fast response.

eshalev4 commented 5 years ago

Just one last question. Which IR (1,2,3,4,5) is it, location wise?

sharathadavanne commented 5 years ago

Number 4. Its location description can also be seen here: http://dcase.community/challenge2019/task-sound-event-localization-and-detection#audio-dataset

eshalev4 commented 5 years ago

Thanks a lot.

eshalev4 commented 5 years ago

What do the blocks represent? How do I choose the correct block?

sharathadavanne commented 5 years ago

Hi @eshalev4, this code block, which convolves a sample audio recording se_audio with the RIR, should help you:

% Load RIR
rir_base_path = '/path/to/rir/database';
load(rir_base_path, 'rir_DB') 
%  rir_DB is of dimension (2, 9, 1025, 36, 4, 32) =
% (distance_wrt_mic, elevation_wrt_mic, FFT,  azimuth_wrt_mic, blocks, channels). 
% See individual variable description on zenodo.

% Load audio of interest
se_audio = []; % placeholder: waveform at 48 kHz sampling rate
orglen = length(se_audio);

% Position the audio of interest at a fixed distance, elevation and azimuth from the microphone
event_dist = 1; % two distances: 1 and 2 meters
event_elev = 10; % -40:10:40 at distance 1m, and -20:10:20 at distance 2m
event_azi = 80; % -180:10:180 range; with 36 azimuth bins, 180 would index out of bounds, so use -180 instead

% Collect the corresponding impulse response for the spatial position given by (distance, elevation, azimuth)
H = squeeze(rir_DB(event_dist, 1+4+event_elev/10, :, 1+18 +event_azi/10, :, :));

% Variables used to extract impulse response 
fs = 48000;
winlen = 2048;
hoplen = winlen/2;

I = size(H, 1); % 1025
L = size(H, 2); % 4
C = size(H, 3); % 32

% Convolve the spectrogram with the impulse response

X_conv = spectrogram(se_audio, hanning(winlen), hoplen, winlen);

frame_len = size(X_conv,2); % number of STFT frames (dimension 1 holds the 1025 frequency bins)
XestLSsyn = zeros(I, frame_len, C);

nframe = 1;
sta = 1;
sto = sta + winlen - 1;
while sto <= orglen
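    % Output frames affected by this input frame: one per RIR block, clipped at the last frame
    Lind = nframe:min(nframe+L-1,frame_len);

    % Apply estimated RIR to close-field signal: scale every block and channel of H
    % by the current input spectrum, then accumulate into the overlapping output frames
    spatial_audio = squeeze(bsxfun(@times,H,X_conv(:,nframe)));
    XestLSsyn(:,Lind,:) = XestLSsyn(:,Lind,:) + spatial_audio(:,1:length(Lind),:);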

    sta = sta + hoplen;
    sto = sto + hoplen;
    nframe = nframe + 1;
end

% Convert XestLSsyn to the time domain to obtain the waveform
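
The last comment above leaves the inverse transform to the reader. A minimal sketch of one way to do it, assuming weighted overlap-add with the same window and hop as the analysis step. `istft_ola` is a hypothetical helper name, not part of the original reply or the dataset code, and it relies only on core MATLAB functions:

```matlab
function y = istft_ola(X, win, hoplen)
% ISTFT_OLA  Hypothetical helper (an assumption, not from the original reply).
%   X      : one-sided STFT, size (winlen/2+1, nframes, nch), winlen even
%   win    : analysis window of length winlen, e.g. hanning(2048)
%   hoplen : hop size in samples, e.g. 1024
%   y      : time-domain signal, size (nsamples, nch)
    win = win(:);
    winlen = numel(win);
    nframes = size(X, 2);
    nch = size(X, 3);
    ylen = winlen + (nframes - 1) * hoplen;
    y = zeros(ylen, nch);
    wsum = zeros(ylen, 1);
    for t = 1:nframes
        idx = (t - 1) * hoplen + (1:winlen);
        % Rebuild the two-sided spectrum from the one-sided bins
        Xt = reshape(X(:, t, :), [], nch);
        Xfull = [Xt; conj(Xt(end-1:-1:2, :))];
        % Back to a real time-domain frame; real() drops rounding residue
        frame = real(ifft(Xfull, [], 1));
        % Weight by the window again and overlap-add
        y(idx, :) = y(idx, :) + bsxfun(@times, frame, win);
        wsum(idx) = wsum(idx) + win.^2;
    end
    % Undo the accumulated squared-window weighting
    wsum(wsum < eps) = 1;
    y = bsxfun(@rdivide, y, wsum);
end
```

Saved as `istft_ola.m`, it could be called as `spatial_wave = istft_ola(XestLSsyn, hanning(winlen), hoplen);`, which would give one column per microphone channel.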

eshalev4 commented 5 years ago

Thanks a lot. That was very helpful. Looking forward to an update on the other RIRs.
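
For the directional-noise use case that opened this thread, a noise recording that has been spatialized and converted back to a waveform still has to be scaled and added to the existing multichannel audio. A minimal sketch, assuming both signals share the same sampling rate and channel count and that the SNR is measured over all channels together. `mix_at_snr` and its arguments are hypothetical names, not part of the dataset code:

```matlab
function [mix, gain] = mix_at_snr(target, noise, snr_db)
% MIX_AT_SNR  Hypothetical helper (an assumption, not from the thread).
%   target, noise : (nsamples, nch) waveforms with the same channel count
%   snr_db        : desired target-to-noise power ratio in dB
%   mix           : target plus scaled noise, truncated to the shorter input
%   gain          : linear gain that was applied to the noise
    n = min(size(target, 1), size(noise, 1));
    target = target(1:n, :);
    noise = noise(1:n, :);
    % Average power over all samples and channels
    p_target = mean(target(:).^2);
    p_noise = mean(noise(:).^2);
    assert(p_noise > 0, 'noise signal must not be silent');
    % Choose the gain so that p_target / (gain^2 * p_noise) = 10^(snr_db/10)
    gain = sqrt(p_target / (p_noise * 10^(snr_db / 10)));
    mix = target + gain * noise;
end
```

For example, `mix = mix_at_snr(recording, spatial_noise, 10);` would add the noise 10 dB below the recording, where `recording` and `spatial_noise` are placeholder variable names.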