introlab / odas

ODAS: Open embeddeD Audition System
MIT License

Some questions about the related article "Enhanced Robot Audition Based on Microphone Array Source Separation with Post-Filter" #59

Closed huotuichang1 closed 6 years ago

huotuichang1 commented 6 years ago

Thanks to you all.
In the article, I have some questions about the suppression rule in the presence of speech. Why can the loudness-domain amplitude estimator be presented by that formula? And how can the formula be translated into a product of a gamma function and the first solution of a Kummer (confluent hypergeometric) function?

FrancoisGrondin commented 6 years ago

Hi Huo,

Can you please let me know what article you refer to (title + year), and point out the equations with their corresponding numbers in the paper?

huotuichang1 commented 6 years ago

https://github.com/huotuichang1/pocket/blob/master/Enhanced%20Robot%20Audition%20Based%20on%20Microphone%20Array%20Source%20Separation%20with%20Post-Filter.pdf That's the article. [screenshots of the equations attached] That's what I don't understand.

huotuichang1 commented 6 years ago

(15) and (16)

huotuichang1 commented 6 years ago

Oh, in ODAS I only find the array source separation process in `steer2demixing.c`. As for the suppression rule in the presence of speech, I can't find it in ODAS. For separation, I understand the two constraints:

  1. W(k)A(k) - I -> 0
  2. Ryy(k) - diag(Ryy(k)) -> 0

For the suppression rule in the presence of speech, why can the loudness-domain amplitude estimator be defined by (15)? And how can it be translated into (16)? I know the confluent hypergeometric function and the gamma function, but when I try to derive the translation mathematically, I run into difficulties.
FrancoisGrondin commented 6 years ago

For equations 15 and 16, detailed explanations and derivations are given in this paper from Ephraim & Malah: https://ieeexplore.ieee.org/document/1164550/
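
If it helps future readers, my reading of the derivation in Ephraim & Malah (their eq. (7); notation may differ slightly from the post-filter paper) is the following chain, where ξ_k is the a priori SNR, γ_k the a posteriori SNR, and M(a; b; z) the Kummer confluent hypergeometric function:

```latex
v_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k,
\qquad
G(\xi_k,\gamma_k) = \Gamma(1.5)\,\frac{\sqrt{v_k}}{\gamma_k}\,M(-0.5;\,1;\,-v_k)

% The Kummer function collapses to modified Bessel functions of the
% first kind, which is what turns the Gamma-times-Kummer product into
% a closed form:
M(-0.5;\,1;\,-v) = e^{-v/2}\left[(1+v)\,I_0\!\left(\tfrac{v}{2}\right)
                                + v\,I_1\!\left(\tfrac{v}{2}\right)\right]
```

So the Γ(1.5) × Kummer product is not derived from scratch in the post-filter paper; it comes from evaluating the MMSE integral over the Rician posterior of the spectral amplitude, which is done in full in the Ephraim & Malah reference.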

As for postfiltering, the actual equations you refer to are implemented by env2env_gainspeech_process called in mod_sss_process_mspf. The gain itself is estimated by a function implemented in here: https://github.com/introlab/odas/blob/master/src/utils/transcendental.c

huotuichang1 commented 6 years ago

Thanks. I understand the rule in the presence of speech in the separation process now. And in prf, I find that a lot of time is spent in SSL; what should I do to reduce that cost? As for the "Initializing objects..." step, if I set the scan levels to 4 or more, the delay becomes obvious. Reading the paper, I found that for an array of M microphones and an N-element grid, the algorithm requires M(M-1)N table memory accesses and M(M-1)N/2 additions. In the ODAS code, the initialization in `space.c` spends a lot of time computing distances. How can I reduce the delay? Do we initialize all the vectors and build the grid in `mod_ssl_construct`, and do the processing in `mod_ssl_process`? What should we do to make the process faster?

FrancoisGrondin commented 6 years ago

1) For prf, I don't see any "trivial" way to reduce the processing time for now.

2) If you don't want a delay when restarting the system while the grid is generated, you could generate the grid once, save it to disk, and reload it. You'll need to code this by hand though, as it is not implemented yet.

3) For the distance calculation: again, for now I think the only solution is to precompute it and store it for next time.

4) Yes. You could reduce the grid resolution if you want to make this faster, although initialization should not be such a big concern, since once the system is initialized it can run in real time.
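
A minimal sketch of point 2, assuming a simple binary dump; the file format and function names here are invented, since nothing like this exists in ODAS yet:

```c
#include <stdio.h>
#include <stdlib.h>

/* Save a precomputed table to disk with a tiny element-count header so
 * a stale cache (different grid size) is detected and rebuilt instead
 * of being silently reused. */
int table_save(const char *path, const int *table, long nElems) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(&nElems, sizeof(nElems), 1, f) == 1
          && fwrite(table, sizeof(int), (size_t)nElems, f) == (size_t)nElems;
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns NULL when there is no usable cache; the caller then rebuilds
 * the table the slow way and saves it for next time. */
int *table_load(const char *path, long nElemsExpected) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    long nElems = 0;
    int *table = NULL;
    if (fread(&nElems, sizeof(nElems), 1, f) == 1 && nElems == nElemsExpected) {
        table = malloc(sizeof(int) * (size_t)nElems);
        if (table && fread(table, sizeof(int), (size_t)nElems, f) != (size_t)nElems) {
            free(table);
            table = NULL;
        }
    }
    fclose(f);
    return table;
}
```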

huotuichang1 commented 6 years ago

Maybe I understand it... In my opinion: first, we run `scanning_init_scans` to build the grid and compute all the delays and TDOAs. In `scanning_init_scans`, we also use `space_points_fine` to check the points and `hit_train` to choose the areas we will track, then run `linking_maps`. Second, we initialize many parameters such as the phasors, products, xcorrs, aimgs and so on. Both the first and second steps belong to the initialization in `mod_ssl_construct`. Third, in `mod_ssl_process` we compute the phasors from the audio and choose the aim points for SST (the potential sources come from that?). That's what we do in SSL. Is that right? Then, in SST, we choose the tracks by probability. About that, I have a new question: noise from the environment will interfere with the output, and sometimes it gets tracked too. What should we do to prevent that?
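
My mental model of the third step is the usual steered-response scan (illustrative code, not the actual `mod_ssl_process`): for each grid point, sum each pair's cross-correlation value at that point's precomputed TDOA and keep the best point.

```c
/* Illustrative SRP scan: xcorrs[pair] is one cross-correlation frame of
 * length xcorrLen (lag 0 at index xcorrLen/2), and tdoa[pair * nPoints + n]
 * is the precomputed lag (in samples) for grid point n.  Each grid point
 * costs one table access and one addition per pair, matching the
 * M(M-1)N/2-additions figure from the paper. */
int srp_scan(const float *const *xcorrs, int nPairs, int xcorrLen,
             const int *tdoa, int nPoints, float *bestEnergy) {
    int bestPoint = 0;
    float best = -1.0e30f;
    for (int n = 0; n < nPoints; n++) {
        float e = 0.0f;
        for (int p = 0; p < nPairs; p++) {
            int lag = xcorrLen / 2 + tdoa[p * nPoints + n];
            if (lag >= 0 && lag < xcorrLen)
                e += xcorrs[p][lag];
        }
        if (e > best) { best = e; bestPoint = n; }
    }
    if (bestEnergy) *bestEnergy = best;
    return bestPoint; /* index of the loudest direction = potential source */
}
```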

huotuichang1 commented 6 years ago

And another question: where is the whitening process?

FrancoisGrondin commented 6 years ago

Yes I think you got the localization correct.

You are right: noise from the environment may be detected as well. For now, your best bet is to look at the separated file and try to determine whether it is a noise source or a speech source. This issue is not entirely solved yet, though I'm working on a solution...

The whitening is performed when you compute the phasor, which is a complex value with a norm of 1.

huotuichang1 commented 6 years ago

Thanks.

Now I'm reading mod_sst. I see that we compute the likelihood in several ways; could I change the weights of these terms to make the process better? Also, in the paper "Localization of Simultaneous Moving Sound Sources for Mobile Robot Using a Frequency-Domain Steered Beamformer Approach", where is the probabilistic post-processing in the code? Is it in `mixture2mixture.c`?

huotuichang1 commented 6 years ago

It seems that this is where ODAS differs from ManyEars... I can't find it in the code. And maybe I know why we can only locate four sources: an important reason is that nCombinations becomes too big when nPots increases; it grows like (n+2)^n, and the localization becomes inaccurate.
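
For what it's worth, the growth is easy to check numerically: if each of n potential sources can be associated with one of n existing tracks, with a new source, or with a false detection, there are (n+2)^n assignments (this is my reading of the combinatorics, not a statement about the exact ODAS bookkeeping):

```c
/* (nTracks + 2)^nPots candidate assignments: each potential source is
 * matched to one of nTracks tracks, a "new source" slot, or a "false
 * detection" slot. */
unsigned long long n_assignments(int nTracks, int nPots) {
    unsigned long long n = 1;
    for (int i = 0; i < nPots; i++)
        n *= (unsigned long long)(nTracks + 2);
    return n;
}
```

With 4 tracks and 4 potential sources this is already 6^4 = 1296 combinations, and with 8 and 8 it would be 10^8, which illustrates why the number of simultaneously tracked sources has to stay small.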

zuowanbushiwo commented 6 years ago

@huotuichang1 Are you Chinese? Could you share your contact information? Either email or QQ works. This project is very useful to me, but it is difficult for me; there are many places I don't understand and I would like to ask you about them. Thank you!