uts-magic-lab / hark_sound_localization

Do sound localization in ROS using a variety of sensors (Kinect, PSeye, microcone...) via HARK

Sound source localization along with speech recognition #1

Open srinivasanviki opened 6 years ago

srinivasanviki commented 6 years ago

As part of the project we are trying to implement Sound Localization along with Speech Recognition in ROS using an Xbox Kinect. We have run into a problem: we need to find the position of the person who spoke (handled by the Sound Localization module) along with what the person said (Speech Recognition).

Can you please advise us on the implementation? We are not able to figure out whether HARK can provide the localization data along with the speech recognition (who said what, from which direction) in a single go.

awesomebytes commented 6 years ago

I haven't tried the speech recognition code of HARK, so I can't really tell you.

If you use some other speech recognition engine, you should be able to guess who spoke with a bit of code that keeps track of the last sound localization results.
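A minimal sketch of that idea in rospy, assuming the localization is published on a `HarkSource` topic with the `hark_msgs/HarkSource` message type (both names are assumptions, check your launch file):

```python
#!/usr/bin/env python
# Minimal sketch: cache the most recent localization result.
# Topic name and message type are assumptions, adjust to your setup.
import rospy
from hark_msgs.msg import HarkSource  # assumed HARK ROS message type

last_localization = None  # (rospy.Time, HarkSource)

def localization_cb(msg):
    global last_localization
    last_localization = (rospy.Time.now(), msg)

if __name__ == '__main__':
    rospy.init_node('last_localization_tracker')
    rospy.Subscriber('HarkSource', HarkSource, localization_cb)
    rospy.spin()
```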

Good luck!

srinivasanviki commented 6 years ago

But your localization result is a constant stream; how do we keep track of the last localization results from the ROS topic? Please suggest.

awesomebytes commented 6 years ago

Keep a buffer of the last few seconds of localization results. When you get a speech recognition result, estimate the duration of the speech from the recognized text plus the delay of getting the result, and average the loudest localized sounds in that timeframe.

For example, keep all the messages of the last 10s from the localization.

Estimate how long it takes from when you finish speaking to when the recognition engine gives a result. For example, 100 ms.

When you get a callback from the recognition engine, for example it recognized "hello world", estimate the duration of those 3 syllables (this paper https://www.google.com.au/url?sa=t&source=web&rct=j&url=http://www.asel.udel.edu/icslp/cdrom/vol4/301/a301.pdf&ved=0ahUKEwiyiezqv7HWAhWLQpQKHVHNBZkQFggdMAA&usg=AFQjCNHjHYikZy-oDmBYdZ04jNqRwjAYVg says the average duration of a syllable is ~150 ms). That gives you 3 * 150 = 450 ms.

Now go to your buffer, step back 100 ms from the end, and from there take all the messages up to 550 ms back. Average the localization, probably keeping only the messages with a louder volume.

That's how I would try it, from a kinda hacky perspective.

Other than that, learn how to use the HARK speech recognizer too.
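A sketch of this buffering-and-windowing approach in rospy. The topic name, the `hark_msgs/HarkSource` field layout, the syllable counting, and the timing constants are all assumptions you would tune for your setup:

```python
#!/usr/bin/env python
# Sketch of the "buffer + window" idea described above.
# Topic name, message fields and timing constants are assumptions.
import collections
import re

import rospy
from hark_msgs.msg import HarkSource  # assumed HARK ROS message type

BUFFER_SECONDS = 10.0      # keep the last 10 s of localization results
RECOGNIZER_DELAY = 0.10    # assumed delay between end of speech and result
SYLLABLE_DURATION = 0.15   # ~150 ms per syllable (see the linked paper)


class WhoSaidWhat(object):
    def __init__(self):
        # each entry: (stamp [s], azimuth [deg], power)
        self.buffer = collections.deque()
        rospy.Subscriber('HarkSource', HarkSource, self.localization_cb)

    def localization_cb(self, msg):
        now = rospy.get_time()
        for src in msg.src:  # assumed field layout of hark_msgs/HarkSource
            self.buffer.append((now, src.azimuth, src.power))
        # drop everything older than the buffer window
        while self.buffer and now - self.buffer[0][0] > BUFFER_SECONDS:
            self.buffer.popleft()

    def speech_cb(self, text):
        """Call this from your speech recognition result callback."""
        now = rospy.get_time()
        # crude syllable count: one syllable per vowel group
        syllables = max(1, len(re.findall(r'[aeiouy]+', text.lower())))
        duration = syllables * SYLLABLE_DURATION
        t_end = now - RECOGNIZER_DELAY
        t_start = t_end - duration
        window = [e for e in self.buffer if t_start <= e[0] <= t_end]
        if not window:
            rospy.logwarn('No localization results for "%s"', text)
            return None
        # average the azimuths, weighting louder sources more
        total_power = sum(p for _, _, p in window)
        if total_power > 0:
            azimuth = sum(a * p for _, a, p in window) / total_power
        else:
            azimuth = sum(a for _, a, _ in window) / len(window)
        rospy.loginfo('"%s" probably came from %.1f deg', text, azimuth)
        return azimuth


if __name__ == '__main__':
    rospy.init_node('who_said_what')
    node = WhoSaidWhat()
    # wire node.speech_cb into whatever speech recognizer you use
    rospy.spin()
```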

srinivasanviki commented 6 years ago

Thanks for the suggestion. I have a problem with localization too: when I do roslaunch pr2_kinect I get a continuous stream of localization results on the HarkSource topic even when I am not speaking.
I am also getting incorrect results, such as -4 or -9 degrees of azimuth, but it never reaches a positive angle even when I am on the positive x axis.