Closed huotuichang1 closed 6 years ago
Hi Huo,
Can you please let me know which article you are referring to (title + year), and point to the equations by their numbers in the paper?
https://github.com/huotuichang1/pocket/blob/master/Enhanced%20Robot%20Audition%20Based%20on%20Microphone%20Array%20Source%20Separation%20with%20Post-Filter.pdf That's the article. That's what I don't understand.
(15) and (16)
Oh, in ODAS I only found the array source separation process in "steer2demixing.c". As for the suppression rule in the presence of speech, I can't find it in ODAS. For the separation, I understand the two constraints.
For equations 15 and 16, detailed explanations and derivations are given in this paper from Ephraim & Malah: https://ieeexplore.ieee.org/document/1164550/
As for postfiltering, the actual equations you refer to are implemented by env2env_gainspeech_process, called in mod_sss_process_mspf. The gain itself is estimated by a function implemented here: https://github.com/introlab/odas/blob/master/src/utils/transcendental.c
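For readers following along, here is a minimal numerical sketch of a gain of this kind. It assumes that equations (15) and (16) have the form G = (sqrt(v) / gamma) * [Gamma(1 + alpha/2) * M(-alpha/2; 1; -v)]^(1/alpha) with v = gamma * xi / (1 + xi), where xi is the a priori SNR, gamma the a posteriori SNR, M is Kummer's confluent hypergeometric function, and alpha = 1/2 corresponds to the loudness domain. That form should be checked against the paper. The function names `kummer_m` and `speech_gain` are hypothetical, and this is not how transcendental.c is written.

```python
import math

def kummer_m(a, b, x, terms=500):
    """Kummer's function M(a; b; x) for real x, summed from its power series.

    For x < 0 the Kummer transformation M(a; b; x) = exp(x) * M(b - a; b; -x)
    is applied first, so the series is always summed with a non-negative argument.
    """
    if x < 0:
        return math.exp(x) * kummer_m(b - a, b, -x, terms)
    total = 1.0
    term = 1.0
    for n in range(terms):
        # Ratio of consecutive series terms: (a + n) / (b + n) * x / (n + 1).
        term *= (a + n) / (b + n) * x / (n + 1)
        total += term
        if abs(term) < 1e-16 * abs(total):
            break
    return total

def speech_gain(xi, gamma, alpha=0.5):
    """Gain under the assumed form of equations (15) and (16).

    alpha = 0.5 is assumed to be the loudness-domain setting;
    alpha = 1 gives the classic Ephraim & Malah amplitude estimator.
    """
    v = gamma * xi / (1.0 + xi)
    moment = math.gamma(1.0 + alpha / 2.0) * kummer_m(-alpha / 2.0, 1.0, -v)
    return math.sqrt(v) / gamma * moment ** (1.0 / alpha)

# Example: a priori SNR of 1 (0 dB) and a posteriori SNR of 2.
loudness_gain = speech_gain(1.0, 2.0)
power_gain = speech_gain(1.0, 2.0, alpha=2.0)
```

A useful sanity check is alpha = 2, where M(-1; 1; -v) = 1 + v and the gain reduces to sqrt(v * (1 + v)) / gamma. The series is only meant for moderate v; for very large v the intermediate sum overflows, and a real implementation would likely use an asymptotic form or a lookup table.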
Thanks. I understand the rule in the presence of speech in the separation process now. In prf, I find that a lot of time is spent in SSL. What can I do to reduce that cost? As for the "Initializing objects...." step, if I set the scan levels to 4 or more, the delay becomes obvious. I read in the paper that, for an array of M microphones and an N-element grid, the algorithm requires M(M − 1)N table memory accesses and M(M − 1)N/2 additions. In the odas code, the initialization in space.c spends a lot of time calculating distances. How can I reduce the delay? Do we initialize all the vectors and build the grid in mod_ssl_construct, and do the processing in mod_ssl_process? What should we do to make the process faster?
1) For prf, I don't see any "trivial" way to reduce the processing time for now.
2) If you don't want a delay when restarting the system while the grid is generated, you could generate it once, save it to disk and reload it. You'll need to code this by hand though, as this is not implemented yet.
3) For the distance calculation: again, for now I think the only solution is to precompute it and store it for the next time.
4) Yes. You could reduce the grid resolution if you want to make this faster. Initialization should not be such a big concern though, since once the system is initialized it can run in real time.
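The save-and-reload idea could look something like the sketch below. It is only an illustration of the caching pattern: the function names, the pickle file layout, the constants and the distance-based TDOA formula are all assumptions made for the example, not ODAS code (ODAS is written in C and, as noted above, has no such feature yet).

```python
import math
import os
import pickle
import tempfile

SPEED_OF_SOUND = 343.0  # m/s, assumed value
SAMPLE_RATE = 16000     # Hz, assumed value

def build_tdoa_table(mics, points):
    """Expensive step: one TDOA (in samples) per grid point and microphone pair."""
    table = []
    for p in points:
        dists = [math.dist(p, m) for m in mics]
        row = [
            (dists[i] - dists[j]) * SAMPLE_RATE / SPEED_OF_SOUND
            for i in range(len(mics))
            for j in range(i + 1, len(mics))
        ]
        table.append(row)
    return table

def load_or_build(path, mics, points):
    """Reload the table from disk when present, otherwise build and save it.

    Returns (table, cache_hit).
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            cached = pickle.load(f)
        # Only reuse the cache if it was built for the same geometry.
        if cached["mics"] == mics and cached["points"] == points:
            return cached["table"], True
    table = build_tdoa_table(mics, points)
    with open(path, "wb") as f:
        pickle.dump({"mics": mics, "points": points, "table": table}, f)
    return table, False

# Hypothetical 3-microphone array and a tiny 3-point grid.
mics = [(0.05, 0.0, 0.0), (-0.05, 0.0, 0.0), (0.0, 0.05, 0.0)]
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
cache_path = os.path.join(tempfile.mkdtemp(), "tdoa_table.pkl")

first, first_hit = load_or_build(cache_path, mics, points)    # builds and saves
second, second_hit = load_or_build(cache_path, mics, points)  # reloads from disk
```

Storing the geometry next to the table lets the loader detect a stale cache when the microphone positions or the grid resolution change.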
Maybe I understand it now. In my opinion: first, we call scanning_init_scans to build the grid and calculate all the delays and TDOAs. In scanning_init_scans we also use space_points_fine to check the points and hit_train to choose the areas that we will track, then do linking_maps. Second, we initialize lots of parameters like phasors, products, xcorrs, aimgs and so on. Both the first and second steps belong to initialization: mod_ssl_construct. Third, we calculate the phasors and so on from the audio in mod_ssl_process and choose the aim point for SST (the potential sources come from that?). That's what we do in SSL. Is that right? Then in SST, we choose the tracks based on probability. For that, I have a new question: noise from the environment will interfere with the output. Sometimes it gets tracked, too. What should we do to prevent that?
And another question: where is the whitening process?
Yes I think you got the localization correct.
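To make the construct/process split concrete, here is a toy sketch of the per-frame search: the TDOA table is built once, and each frame only does one table lookup and one addition per grid point and microphone pair, which is where the M(M − 1)N/2 additions quoted earlier come from. The function name, the integer TDOA table and the data layout are assumptions for illustration and do not mirror the actual ODAS structures.

```python
def steered_energy(xcorrs, tdoa_table, max_lag):
    """Sum the cross-correlation values selected by each grid point's TDOAs.

    xcorrs[pair][lag + max_lag] holds the cross-correlation of one microphone
    pair at an integer lag in [-max_lag, max_lag].
    tdoa_table[point][pair] holds the precomputed integer TDOA for that pair.
    """
    energies = []
    for row in tdoa_table:
        energy = 0.0
        for pair, tdoa in enumerate(row):
            energy += xcorrs[pair][tdoa + max_lag]  # one lookup, one addition
        energies.append(energy)
    return energies

# "Construct" phase: 3 grid points, 3 microphone pairs (hypothetical values).
max_lag = 4
tdoa_table = [[-2, 1, 3], [0, 0, 0], [2, -1, -3]]

# "Process" phase: cross-correlations with peaks matching grid point 2.
xcorrs = [[0.0] * (2 * max_lag + 1) for _ in range(3)]
xcorrs[0][2 + max_lag] = 1.0
xcorrs[1][-1 + max_lag] = 1.0
xcorrs[2][-3 + max_lag] = 1.0

energies = steered_energy(xcorrs, tdoa_table, max_lag)
best_point = max(range(len(energies)), key=energies.__getitem__)
```

In this toy case only grid point 2 lines up with all three peaks, so it collects the full energy and would be passed on as the potential source.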
You are right: noise from the environment may be detected as well. For now, your best bet is to look at the separated file and try to determine whether it is a noise source or a speech source. This issue is not entirely solved yet, though I'm working on a solution...
The whitening is performed when you compute the phasor, which is a complex value with a norm of 1.
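As a small illustration of that point, the sketch below normalizes each frequency bin to unit norm and multiplies the phasors of two channels, which is the usual phase-transform style of whitening. The function names and the handling of empty bins are assumptions for the example, not the ODAS implementation.

```python
def phasor(spectrum, eps=1e-20):
    """Divide each bin by its magnitude so only the phase is kept (norm of 1)."""
    out = []
    for x in spectrum:
        mag = abs(x)
        # An empty bin has no defined phase; map it to zero (an assumption).
        out.append(x / mag if mag > eps else 0j)
    return out

def whitened_cross_spectrum(spec_i, spec_j):
    """Product of one channel's phasor with the conjugate of the other's."""
    return [pi * pj.conjugate() for pi, pj in zip(phasor(spec_i), phasor(spec_j))]

# Two hypothetical 3-bin spectra.
spec_a = [3 + 4j, 0j, -2j]
spec_b = [1 + 0j, 5 + 5j, 0.5j]
cross = whitened_cross_spectrum(spec_a, spec_b)
```

Because every non-empty bin ends up with magnitude 1, the cross-correlation obtained from `cross` depends only on phase differences, not on how loud each frequency is.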
thanks.
Now I'm reading mod_sst. I found that we compute the likelihood in several ways. Can I change the weights of these to make the process better?
And in the paper
It seems that this is where odas differs from manyears... I can't find it in the code. And maybe I know why we can only locate four sources: an important reason is that nCombinations becomes too big when nPots increases. It grows like (n+2)^n. And the localization would not be accurate.
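To get a feel for that growth, here is a small counting sketch. It assumes that each potential source can be explained as a false detection, a new source, or one of the n tracked sources, which is what would give (n + 2)^n combinations for n potentials and n tracked sources. That hypothesis set is an assumption chosen to match the figure above, and the names are hypothetical rather than taken from mod_sst.

```python
import itertools

def count_assignments(n_potentials, n_tracked):
    """Count joint assignments by brute-force enumeration."""
    hypotheses = ["false_detection", "new_source"]
    hypotheses += [f"tracked_{k}" for k in range(n_tracked)]
    return sum(1 for _ in itertools.product(hypotheses, repeat=n_potentials))

# Number of combinations when potentials and tracked sources both equal n.
growth = {n: count_assignments(n, n) for n in range(1, 7)}
```

Under this assumption the counts for n = 1 to 6 are 3, 16, 125, 1296, 16807 and 262144, so going from four to six sources multiplies the number of combinations to evaluate per frame by roughly 200.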
@huotuichang1 Are you Chinese? Can you tell me your contact information? Both email and QQ are available. This project is very useful to me, but it is difficult for me. I don't understand many places and I want to ask you. Thank you!
Thanks to you all.
In the article, I have some questions about the suppression rule in the presence of speech. Why can the loudness-domain amplitude estimator be expressed by that formula? And how can the formula be turned into a product of a gamma function and Kummer's function (the first solution of Kummer's equation)?