thesofproject / sof

Sound Open Firmware

[FEATURE] Use beamforming algorithm in the sof-hda-dsp topologies #7246

Open perexg opened 1 year ago

perexg commented 1 year ago

We have the TDFB algorithm already implemented. Would it be possible to enable this processing in the sof-hda-dsp firmware / topology pipelines? There is already hardware on the market with 4 microphones, so we can improve the microphone input on this hardware.

The necessary parameters may be set through the standard "mixer" controls (if there's not a better solution - e.g. ACPI).

plbossart commented 1 year ago

Indeed, the quality of the DMIC capture is far from great in Linux; we have recurring reports that the quality is subpar compared to Windows. We could include dynamic range processing to make the sound a bit louder, and certainly providing a 4-channel input is not great for Linux applications. I think PulseAudio/PipeWire mostly downmix to 2 channels without taking into account mic position and sensitivity. Having better beamforming would be useful indeed.

IIRC the position of the microphones is included in the NHLT tables, so this is something that could be reported to userspace. I don't recall if the sensitivity was reported. The main issue is that this algorithm is, to the best of my knowledge, not included in the signed firmware. The tuning of those parameters is also rather form-factor specific, and the number of people who know how to tune those things can be counted on one hand.

Adding @kv2019i @singalsu @lgirdwood for more comments.

perexg commented 1 year ago

Thank you for this quick comment.

> the quality of the DMIC capture is far from great in Linux

One note: I would keep the "raw" streaming mode for eventual software processing. But for normal use, it makes sense to improve things in the hardware DSP.

> IIRC the position of the microphones is included in the NHLT tables

Could you point me to the specification / documentation for this, please? I would like to consider extending the user space API to export this information for additional software DSP processing. My understanding is that the 3D position and microphone orientation (angles) should be provided to the (beamforming) DSP algorithms. Currently, the PCM API only describes sound placement (like "rear left"), which may not be sufficient.

> I think PulseAudio/PipeWire mostly downmix to 2channels

Adding @wtay (PipeWire main maintainer) for further comments, but I think that many applications expect only stereo input, so this is probably true.

plbossart commented 1 year ago

Yes, we've started adding multiple capture paths for raw and processed capture. I don't think this was enabled on existing hardware. @lyakh did we add multi-capture in stable-2.2, or is this limited to MTL/topology2?

The NHLT public spec is here: https://01.org/sites/default/files/595976_intel_sst_nhlt.pdf

Section 2.5 reports the mic coordinates, but reading that part again I am not sure how it works if there are mics on the lid and keyboard parts, and we don't have the hinge angle....

perexg commented 1 year ago

> The NHLT public spec is here: https://01.org/sites/default/files/595976_intel_sst_nhlt.pdf
>
> Section 2.5 reports the mic coordinates, but reading that part again I am not sure how it works if there are mics on the lid and keyboard parts, and we don't have the hinge angle....

It seems that the structure on page 17 explains that:

typedef struct _VENDOR_MIC_CONFIG
{
    BYTE Type;
    BYTE Panel;
    WORD SpeakerPositionDistance;   // mm
    WORD HorizontalOffset;          // mm
    WORD VerticalOffset;            // mm
    BYTE FrequencyLowBand;          // 5*Hz
    BYTE FrequencyHighBand;         // 500*Hz
    SHORT DirectionAngle;           // -180 - + 180
    SHORT ElevationAngle;           // -180 - + 180
    SHORT WorkVerticalAngleBegin;   // -180 - + 180 with 2 deg step
    SHORT WorkVerticalAngleEnd;     // -180 - + 180 with 2 deg step
    SHORT WorkHorizontalAngleBegin; // -180 - + 180 with 2 deg step
    SHORT WorkHorizontalAngleEnd;   // -180 - + 180 with 2 deg step
} VENDOR_MIC_CONFIG;

The origin 3D point (0, 0, 0) is the display centre (page 20) - for the front panel.
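
As a rough illustration of how user space might read one such entry, here is a minimal C sketch that decodes a raw record into native types. It assumes the fields are packed in the order shown above (22 bytes in total), stored little-endian, and that the `5*Hz` / `500*Hz` comments mean the byte value is multiplied by 5 or 500 to get Hz. None of this is verified against the spec here, and `struct mic_config` and `mic_config_parse` are hypothetical names, not an existing kernel or SOF API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed packed size: 2 BYTE + 3 WORD + 2 BYTE + 6 SHORT = 22 bytes. */
#define MIC_CONFIG_SIZE 22

/* Decoded view of one VENDOR_MIC_CONFIG record (hypothetical type). */
struct mic_config {
	uint8_t type;
	uint8_t panel;
	uint16_t speaker_position_distance_mm;
	uint16_t horizontal_offset_mm;
	uint16_t vertical_offset_mm;
	uint32_t freq_low_hz;   /* FrequencyLowBand * 5 (assumed unit) */
	uint32_t freq_high_hz;  /* FrequencyHighBand * 500 (assumed unit) */
	int16_t direction_angle;
	int16_t elevation_angle;
	int16_t work_vertical_begin;   /* raw field value, not rescaled */
	int16_t work_vertical_end;
	int16_t work_horizontal_begin;
	int16_t work_horizontal_end;
};

/* Read an unsigned 16-bit little-endian value. */
static uint16_t rd_u16le(const uint8_t *p)
{
	return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Read a signed 16-bit little-endian value without relying on
 * implementation-defined narrowing conversions. */
static int16_t rd_s16le(const uint8_t *p)
{
	uint16_t u = rd_u16le(p);

	return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int mic_config_parse(const uint8_t *buf, size_t len,
			    struct mic_config *out)
{
	assert(buf && out);
	if (len < MIC_CONFIG_SIZE)
		return -1;

	out->type = buf[0];
	out->panel = buf[1];
	out->speaker_position_distance_mm = rd_u16le(buf + 2);
	out->horizontal_offset_mm = rd_u16le(buf + 4);
	out->vertical_offset_mm = rd_u16le(buf + 6);
	out->freq_low_hz = (uint32_t)buf[8] * 5u;
	out->freq_high_hz = (uint32_t)buf[9] * 500u;
	out->direction_angle = rd_s16le(buf + 10);
	out->elevation_angle = rd_s16le(buf + 12);
	out->work_vertical_begin = rd_s16le(buf + 14);
	out->work_vertical_end = rd_s16le(buf + 16);
	out->work_horizontal_begin = rd_s16le(buf + 18);
	out->work_horizontal_end = rd_s16le(buf + 20);
	return 0;
}
```

Parsing from a byte buffer rather than casting to a packed struct avoids depending on compiler-specific packing attributes and host endianness.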

plbossart commented 1 year ago

Right, but if you look at Figure 20, not all microphones are located on the display, so the microphone at the bottom left is not at a constant position wrt the 3 others. Anyways, this is probably a 2nd-order detail; there are lots of items to check before worrying about this....

perexg commented 1 year ago

True, but I doubt that laptops have a lid-opening angle sensor. This angle is usually around 110 degrees, though, so it may be treated as a constant for the initial implementation.

plbossart commented 1 year ago

Newer laptops do have an angle reported, and there's even the term 'Posture' used to describe how the device is used (normal, tent, tablet, etc.), but as far as I know this angle is not exposed by ISH so far in Linux.

lyakh commented 1 year ago

> @lyakh did we add multi-capture in stable-2.2, or is this limited to MTL/topology2?

@plbossart I don't see topology1 multi-capture having been added to any of the stable branches. We added mux/demux fixes in #6762 but I don't see any such topologies there now

singalsu commented 1 year ago

The fixed beamformer needs very accurate mic locations. I doubt an angle sensor in the lid is accurate enough to get e.g. 1 mm level precision for the array (x,y,z) positions of all microphones. Such a setup would work better with an adaptive beamformer. But that also needs to know the (az, el) direction(s) for distortion-less capture.
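
To give a feel for why millimetre-level positions matter, here is a minimal C sketch that computes per-microphone alignment delays for a plain delay-and-sum beamformer. This is a simpler stand-in, not the filter-and-sum design TDFB uses and not SOF code. The names `struct vec3` and `steering_delays`, the 343 m/s speed of sound, and the 48 kHz rate in the usage note are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SPEED_OF_SOUND_M_S 343.0 /* assumed, roughly 20 C air */

struct vec3 {
	double x, y, z; /* metres for positions, unitless for directions */
};

/*
 * Compute the delay (in samples) to apply to each microphone so that a
 * plane wave arriving from direction u adds up coherently.
 * u must be a unit vector pointing from the array toward the source.
 * Microphones closer to the source hear the wavefront earlier, so they
 * get the larger delay; the farthest microphone gets zero.
 */
static void steering_delays(const struct vec3 *mics, size_t n,
			    const struct vec3 *u, double fs_hz,
			    double *delay_samples)
{
	double min_proj = 0.0;
	size_t i;

	assert(mics && u && delay_samples && n > 0);

	/* Project each position onto the look direction. */
	for (i = 0; i < n; i++) {
		double proj = mics[i].x * u->x + mics[i].y * u->y +
			      mics[i].z * u->z;

		delay_samples[i] = proj;
		if (i == 0 || proj < min_proj)
			min_proj = proj;
	}

	/* Convert path differences to non-negative delays in samples. */
	for (i = 0; i < n; i++)
		delay_samples[i] = (delay_samples[i] - min_proj) /
				   SPEED_OF_SOUND_M_S * fs_hz;
}
```

Under these assumptions, a 1 mm position error along the look direction shifts the required delay by about 0.14 samples at 48 kHz (0.001 / 343 * 48000), which is already a noticeable phase error at the upper end of the speech band.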

The complex mechanics of lid and body also add to the difficulty. But beamforming with only the channels from the fixed part of the array, treated as a line array, should work. I think the fixed microphones can be determined from the NHLT info. Some user space service could upload suitable settings to FW. The settings can be pre-computed into preset blobs like today, or the user space tools could calculate them on the fly (ref. the Matlab tool for TDFB).

singalsu commented 1 year ago

Another thing that could be done is to add DRC to the microphone capture pipeline; then making it louder would be safe. The current +20 dB boost with the IIR high-pass is about the limit of what can be done. Both DRC and TDFB still need to be converted to IPC4, so they are not available in current git main.
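
To illustrate why a fixed boost runs out of headroom, here is a minimal C sketch that applies a +20 dB gain (a factor of 10 in amplitude, since 20*log10(10) = 20) to 16-bit samples with saturation and counts how many samples clip. It is only an illustration with a hypothetical function name, `boost_saturate_s16`, and not the SOF IIR or DRC implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOOST_LINEAR 10 /* +20 dB expressed as a linear amplitude gain */

/*
 * Apply the fixed boost in place and saturate to the int16 range.
 * Returns the number of samples that had to be clipped.
 */
static size_t boost_saturate_s16(int16_t *buf, size_t n)
{
	size_t clipped = 0;
	size_t i;

	assert(buf || n == 0);

	for (i = 0; i < n; i++) {
		int32_t v = (int32_t)buf[i] * BOOST_LINEAR;

		if (v > INT16_MAX) {
			v = INT16_MAX;
			clipped++;
		} else if (v < INT16_MIN) {
			v = INT16_MIN;
			clipped++;
		}
		buf[i] = (int16_t)v;
	}
	return clipped;
}
```

With this gain, any input sample above roughly a tenth of full scale (|x| > 3276) already clips. A DRC stage would instead reduce the gain for loud passages, which is what makes a higher average level safe.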