microsoft / Azure_Kinect_ROS_Driver

A ROS sensor driver for the Azure Kinect Developer Kit.
MIT License

[Feature] Support the Microphone Array #120

Open bryantaoli opened 4 years ago

bryantaoli commented 4 years ago

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

ooeygui commented 4 years ago

The focus of the ROS node was navigation and manipulation.

If you are interested in Microphone projections, let's have a conversation.

bryantaoli commented 4 years ago

Yes, I am very interested in microphone projections. While navigating, the microphone array can detect the object making a sound; if we calibrate the position of the sound source, we can use the array signal to localize it and then navigate toward it. This is why I bought the K4A, but after installing the ROS driver I found that it does not support the microphone array, even though the K4A SDK does. So this only requires work on the ROS driver software, and I hope Microsoft can update it. Thanks for replying.

ooeygui commented 4 years ago

Can you share whether this is for a personal project, a research project, or a commercial deployment?

bryantaoli commented 4 years ago

Yes I can. It is a research project. We want to use the k4a to locate and navigate.

ooeygui commented 4 years ago

Windows or Linux?

bryantaoli commented 4 years ago

Linux, more precisely ROS Melodic

roelofvandijk commented 4 years ago

@ooeygui, Thanks for taking up this conversation. I would like to strongly second this feature request!

If the Azure Kinect is to be used as a fairly complete 'robot head', access to the microphone array would be valuable (Ubuntu 18.04, ROS Melodic, telerobotics/telepresence research project).

@bryantaoli: I would be very interested if you want to continue the conversation on how to best capture audio.

I have confirmed these so far:

  • The SDK viewer can read out video and audio at the same time
  • Audacity can record all 7 channels using pulseaudio; the microphone array is registered at the system level
  • Audacity can record all 7 channels while the Azure Kinect is connected via ROS
  • Correct (human) spatialization using the recorded 7 channels is possible

If not officially supported, an audio capture driver (or second ROS node) via pulseaudio might be feasible, and it can probably run next to the Azure Kinect ROS node.
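A second node of that kind would first have to find the array among the system's capture devices. Below is a minimal sketch of that lookup; the device name "Azure Kinect Microphone Array" is an assumption (check what your system actually reports), and the matching logic itself needs no hardware:

```python
# Sketch: locate the Azure Kinect microphone array in a capture-device list.
# The name hint and 7-channel count are assumptions; with the
# python-sounddevice library installed, the real list comes from
# sounddevice.query_devices().

def find_kinect_index(devices, name_hint="Azure Kinect", channels=7):
    """Return the index of the first capture device matching the hint."""
    for idx, dev in enumerate(devices):
        if (name_hint.lower() in dev["name"].lower()
                and dev["max_input_channels"] >= channels):
            return idx
    return None

if __name__ == "__main__":
    # Hypothetical device list standing in for sounddevice.query_devices():
    fake = [
        {"name": "HDA Intel PCH", "max_input_channels": 2},
        {"name": "Azure Kinect Microphone Array", "max_input_channels": 7},
    ]
    print(find_kinect_index(fake))  # -> 1
```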

bryantaoli commented 4 years ago

@ooeygui, Thanks for taking up this conversation. I would like to strongly second this feature request!

If the Azure Kinect is to be used as a fairly complete 'robot head', access to the microphone array would be valuable (Ubuntu 18.04, ROS Melodic, telerobotics/telepresence research project).

@bryantaoli: I would be very interested if you want to continue the conversation on how to best capture audio.

I have confirmed these so far:

  • The SDK viewer can read out video and audio at the same time
  • Audacity can record all 7 channels using pulseaudio, the microphone array is registered on a system level
  • Audacity can record all 7 channels while the Azure Kinect is connected via ROS
  • Correct (human) spatialization using the recorded 7 channels is possible

If not officially supported, an audio capture driver (or second ROS node) via pulseaudio might be feasible, and it can probably run next to the Azure Kinect ROS node.

Yes, I also think it would not be difficult to get the audio in the Azure Kinect ROS node, but I would like official support.

ooeygui commented 4 years ago

Just to confirm the ask: you'd like the Azure Kinect to output audio samples like the audio_common package? (https://github.com/ros-drivers/audio_common/blob/master/audio_capture/src/audio_capture.cpp)

bryantaoli commented 4 years ago

Just to confirm the ask: you'd like the Azure Kinect to output audio samples like the audio_common package? (https://github.com/ros-drivers/audio_common/blob/master/audio_capture/src/audio_capture.cpp)

Yes, I'd like the Azure Kinect to output audio samples like the audio_common package, which is a third-party audio package that implements audio drivers and the related ROS message mechanisms.
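To make the ask concrete, here is a sketch of what an audio_capture-style node would do with the array's samples: pack interleaved int16 frames into fixed-size byte blobs, one per audio_common_msgs/AudioData message. The chunk size, channel count, and topic name are assumptions, not anything the driver currently provides; the ROS publishing itself is shown only in a comment.

```python
# Sketch: packing interleaved int16 samples into byte blobs the way an
# audio_capture-style ROS node would publish them. Chunk size, channel
# count, and topic name are assumptions, not part of this driver.
import struct

def chunk_samples(samples, frames_per_msg=1024, channels=7, sample_width=2):
    """Yield one byte blob per audio_common_msgs/AudioData message."""
    bytes_per_msg = frames_per_msg * channels * sample_width
    raw = struct.pack("<%dh" % len(samples), *samples)  # little-endian int16
    for off in range(0, len(raw), bytes_per_msg):
        yield raw[off:off + bytes_per_msg]

# In a ROS node this would feed a publisher, roughly:
#   pub = rospy.Publisher("audio/audio", AudioData, queue_size=10)
#   for blob in chunk_samples(capture_buffer):
#       pub.publish(AudioData(data=blob))
```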

star0w commented 4 years ago

Just to confirm the ask: you'd like the Azure Kinect to output audio samples like the audio_common package? (https://github.com/ros-drivers/audio_common/blob/master/audio_capture/src/audio_capture.cpp)

I ran into the same problem: the Kinect ROS driver cannot provide access to the microphone array, which is necessary for my research on Linux.

star0w commented 4 years ago

@ooeygui, Thanks for taking up this conversation. I would like to strongly second this feature request!

If the Azure Kinect is to be used as a fairly complete 'robot head', access to the microphone array would be valuable (Ubuntu 18.04, ROS Melodic, telerobotics/telepresence research project).

@bryantaoli: I would be very interested if you want to continue the conversation on how to best capture audio.

I have confirmed these so far:

  • The SDK viewer can read out video and audio at the same time
  • Audacity can record all 7 channels using pulseaudio; the microphone array is registered at the system level
  • Audacity can record all 7 channels while the Azure Kinect is connected via ROS
  • Correct (human) spatialization using the recorded 7 channels is possible

If not officially supported, an audio capture driver (or second ROS node) via pulseaudio might be feasible, and it can probably run next to the Azure Kinect ROS node.

Hi, I'm interested in how to use an audio capture driver (or second ROS node) via pulseaudio, since the Kinect doesn't have a ROS node for the mic. Can you share some kind of demo if possible?

roelofvandijk commented 4 years ago

Hello @star0w, here is a brief recording example: try the Python library sounddevice. Use sounddevice.query_devices() to get the Kinect index and sounddevice.query_devices(kinect_index) for information about the Kinect (e.g. sample rate). Then use sounddevice.rec(int(5 * 48000), samplerate=48000, channels=7, device=kinect_index, blocking=True) to get a 5-second recording as a numpy array (note that rec takes a frame count, not a duration in seconds).

If you want to stream the audio, have a look at the audio_common package.
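Collected into a runnable sketch (the 48 kHz rate, 7 channels, and device index are assumptions to verify against sounddevice.query_devices() for your device):

```python
# Sketch of the recipe above using the python-sounddevice library.
# The device index must come from sounddevice.query_devices(); the
# 48 kHz / 7-channel values are assumptions to confirm per device.

def frames_for(seconds, samplerate):
    """sounddevice.rec() expects a frame count, not a duration in seconds."""
    return int(round(seconds * samplerate))

def record_kinect(kinect_index, seconds=5.0, samplerate=48000, channels=7):
    import sounddevice as sd  # pip install sounddevice
    return sd.rec(frames_for(seconds, samplerate), samplerate=samplerate,
                  channels=channels, device=kinect_index, blocking=True)
```

record_kinect returns a (frames, channels) numpy array, one column per microphone.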

ooeygui commented 4 years ago

Thank you all for the input on this. As our team owns ROS on Windows and many other ROS solutions, we will take this feedback, fold it into our workstream, and prioritize it appropriately. Based on that backlog, the earliest we would be able to start working on it is May 2020.

linhan94 commented 3 years ago

Thank you all for the input on this. As our team owns ROS on Windows and many other ROS solutions, we will take this feedback, fold it into our workstream, and prioritize it appropriately. Based on that backlog, the earliest we would be able to start working on it is May 2020.

Hello there, any progress? :)

ooeygui commented 3 years ago

Hi @linhan94, No progress has been made on exposing the microphone directly from this ROS node. I have no ETA to share, but would happily accept a PR.

I'm told by the Azure Kinect audio team that the microphone array is accessible as a multichannel audio device using ALSA directly. In that case, it should be accessible using the audio_common package as @roelofvandijk mentioned, but I have not had an opportunity to verify or document this.
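For the ALSA route, something like the following could drive arecord from Python. The device string ("hw:Azure"), the S32_LE sample format, and the 48 kHz rate are all assumptions; list real devices with `arecord -l` and check the device's supported formats first.

```python
# Sketch: recording the 7-channel array through ALSA with arecord.
# "hw:Azure" and S32_LE are assumptions -- verify with `arecord -l`
# and the device's reported capabilities.
import subprocess

def arecord_cmd(device="hw:Azure", channels=7, rate=48000,
                duration_s=5, out="kinect.wav"):
    """Build the arecord command line for a short multichannel capture."""
    return ["arecord", "-D", device, "-f", "S32_LE",
            "-c", str(channels), "-r", str(rate),
            "-d", str(duration_s), out]

def record(**kwargs):
    # Requires the device to be present; raises CalledProcessError otherwise.
    subprocess.run(arecord_cmd(**kwargs), check=True)
```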

ymohamed08 commented 3 years ago

Hello @star0w, here is a brief recording example: try the Python library sounddevice. Use sounddevice.query_devices() to get the Kinect index and sounddevice.query_devices(kinect_index) for information about the Kinect (e.g. sample rate). Then use sounddevice.rec(int(5 * 48000), samplerate=48000, channels=7, device=kinect_index, blocking=True) to get a 5-second recording as a numpy array (note that rec takes a frame count, not a duration in seconds).

If you want to stream the audio, have a look at the audio_common package.

Is it possible to stream each channel separately for spatialization using the recorded 7 channels? Is there another way to do spatialization?

thank you.

roelofvandijk commented 3 years ago

Hello @youssef266, as far as I remember, sounddevice yields a numpy array containing the 7 separate channels, which you could feed into a spatialization algorithm. For live spatialization, you would have to access the audio using a system-level audio API. You could try using ODAS with the Azure Kinect as documented here or try NAudio.

Also see this issue: https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/536
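To make "feed the channels into a spatialization algorithm" concrete, here is a minimal numpy sketch of one building block: a GCC-PHAT time-difference-of-arrival estimate between two of the recorded channels. This is only the core pairwise estimate, not a full localization pipeline (ODAS implements that, including tracking), and the 48 kHz rate is an assumption.

```python
# Sketch: GCC-PHAT time-difference-of-arrival between two channels of a
# multichannel recording -- one building block of spatialization.
import numpy as np

def gcc_phat(sig, ref, fs):
    """Return the estimated delay (seconds) of sig relative to ref."""
    n = len(sig) + len(ref)               # zero-pad to avoid circular wrap
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

With the 7-channel array and the known microphone geometry, pairwise delays like this can be combined into a direction-of-arrival estimate; ODAS does this (plus tracking) out of the box.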

busybeaver42 commented 6 months ago

Have a look at https://github.com/busybeaver42/kv3. It comes with an example for ODAS and the Azure Kinect, contains the right ODAS cfg file for the Azure Kinect microphone array, and the example shows how to use it in parallel with the Azure Kinect frame stream and OpenCV rendering.