google / oboe

Oboe is a C++ library that makes it easy to build high-performance audio apps on Android.
Apache License 2.0
3.73k stars 573 forks

Get native device sample rate / buffer-size / format #95

Closed p-i- closed 6 years ago

p-i- commented 6 years ago

I'm attempting to get the raw-est possible data from the microphone.

Looking through the documentation I can't see any technique to query the hardware.

For reference, there is a table of native sample rate & native buffer-size for various android devices here: https://source.android.com/devices/audio/latency_measurements

Would it be possible for Oboe to expose an API for retrieving these values?

Getting the native sample rate is particularly important; I am embedding data in audio, and I need to avoid any resampling.

Also https://github.com/google/oboe/blob/master/include/oboe/AudioStream.h#L273 setNativeFormat is a very confusing (and undocumented) name. Surely the native format is something you can only get, not set.

PS I found https://github.com/google/oboe/blob/master/include/oboe/AudioStreamBuilder.h#L217 which half-answers the question about native buffer-length (it only specifies OUTPUT).

philburk commented 6 years ago

For AAudio, you can just not specify a sample rate and you will get the optimal rate. But for OpenSL ES you need to get the optimal rate from the AudioManager.

AudioManager audioManager = (AudioManager) this.getSystemService(Context.AUDIO_SERVICE);
String rate = audioManager.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE);
String size = audioManager.getProperty(AudioManager.PROPERTY_OUTPUT_FRAMES_PER_BUFFER);
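Note that those properties come back as Strings and can be null on some devices, so they need parsing with sensible fallbacks before use. A minimal sketch of that parsing step — the helper class and the default values are illustrative, not part of any Android API:

```java
// Hypothetical helper: parse an AudioManager property string with a fallback,
// since getProperty() may return null (or unparseable text) on some devices.
public class AudioPropertyParser {
    public static int parseOrDefault(String property, int fallback) {
        if (property == null) return fallback;
        try {
            return Integer.parseInt(property.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Typical values a device might report via getProperty(...):
        int rate   = parseOrDefault("48000", 44100); // parses to 48000
        int frames = parseOrDefault(null, 256);      // null -> fallback 256
        System.out.println(rate + " " + frames);
    }
}
```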

AudioStream.setNativeFormat() is protected and undocumented because it is only used internally.

> I found https://github.com/google/oboe/blob/master/include/oboe/AudioStreamBuilder.h#L217 which half-answers the question about native buffer-length (it only specifies OUTPUT).

Generally the INPUT buffer size will be the same.

> Getting the native sample rate is particularly important; I am embedding data in audio, and I need to avoid any resampling.

Are you sending data as raw binary? That can only work through USB. This sort of direct pass-through is not really supported.

We cannot change the API for OpenSL ES. But we can consider new APIs for AAudio for Q or later releases. Please let us know specifically what would be helpful to you and we will consider it. Do you need a bit perfect mode for USB?

p-i- commented 6 years ago

My project involves sending/receiving information above 18 kHz.

Any preprocessing can only reduce information content in a signal, and real-time sample-rate conversion is especially well known for mangling high frequency signals.

Hence I'm trying to get access to the raw audio stream.

But I'm facing a real struggle getting the native input sample-rate (and buffer-size & format).

If I do:

        int bufferSize = AudioRecord.getMinBufferSize(
                196000,
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT
                );

        AudioRecord audioRecord = new AudioRecord(
                MediaRecorder.AudioSource.DEFAULT,
                196000, // SAMPLE_RATE_UNSPECIFIED ??
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT,
                bufferSize
                );

... and it appears as if I have 196kHz mic-input! Pretty sure this is totally wrong.

The most plausible solution I can find is:

        String pRate        = m.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE);                  // S8: 48000
        String pFrames      = m.getProperty(AudioManager.PROPERTY_OUTPUT_FRAMES_PER_BUFFER);            // S8: 192

It's horrible because these constants contain the word OUTPUT but I want MIC i.e. INPUT. Is it guaranteed to be the same? I wish I could find it officially written somewhere... otherwise I don't know if my code will break on some random device. And also it seems weird to have to go to the SDK for this. But it seems to return reasonable values (Tested on my Samsung S8).

Is this actually even guaranteed to return the native sample-rate of the microphone? Is there even such a thing? Is it set at some hardware level?

And as for removing any other kind of preprocessing, I can experiment with constructing AudioRecord with MediaRecorder.AudioSource.DEFAULT vs .UNPROCESSED vs .VOICE_RECOGNITION.

^ ... So I'm going with .VOICE_RECOGNITION. Word on the Internet seems to be that this gives raw unprocessed data.

There's another interesting AudioRecord constructor constant, SAMPLE_RATE_UNSPECIFIED, which I believe should fish out the native sample rate, though I'm not sure. But there's a chicken-and-egg problem: I need to pass the sample rate to get the buffer size, and I need the buffer size together with this constant to create the AudioRecord object.

I've noticed that I can test the values it actually gives me:

        // https://developer.android.com/reference/android/media/MediaRecorder.AudioSource
        String sourceNames [] = {
                "DEFAULT",              // 0
                "MIC",                  // 1
                "VOICE_UPLINK",         // 2
                "VOICE_DOWNLINK",       // 3
                "VOICE_CALL",           // 4
                "CAMCORDER",            // 5
                "VOICE_RECOGNITION",    // 6
                "VOICE_COMMUNICATION",  // 7
                "REMOTE_SUBMIX",        // 8
                "UNPROCESSED"           // 9
                };

        Log.e( LOG_TAG, "(getAudioRecord) getAudioSource() = " + sourceNames[ audioRecord.getAudioSource() ]        ); // S8: VOICE_RECOGNITION (what I asked for!)
        Log.e( LOG_TAG, "(getAudioRecord) getSampleRate()  = " +              audioRecord.getSampleRate()           ); // S8: 48000
        Log.e( LOG_TAG, "(getAudioRecord) getBufferSizeInFrames()  = " +      audioRecord.getBufferSizeInFrames()   ); // S8: 2880
    // !!! toDo: do getAudioFormat() also

I note that the buffer reported (2880) differs vastly from the buffer recommended (192).

So maybe I could initialise my AudioRecord object with SAMPLE_RATE_UNSPECIFIED, ENCODING_DEFAULT and bufSize=0, and hope it gives me system defaults.

Maybe this is my solution? I hesitate to ship an undocumented solution.

So, my apologies for floundering in public. But maybe it will help to serve to illustrate the painful state-of-the-art Oboe wishes to address.

If anyone can shed some light on this jumble, I am much obliged.

Complete code:

    @Nullable
    private AudioRecord getAudioRecord( Activity activity )
    {
        AudioManager m = (AudioManager) activity.getSystemService(Context.AUDIO_SERVICE);

        String pRate        = m.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE);                  // S8: 48000
        String pFrames      = m.getProperty(AudioManager.PROPERTY_OUTPUT_FRAMES_PER_BUFFER);            // S8: 192

        Log.e(LOG_TAG, "(getAudioRecord) PROPERTY_OUTPUT_SAMPLE_RATE:"              + pRate);
        Log.e(LOG_TAG, "(getAudioRecord) PROPERTY_OUTPUT_FRAMES_PER_BUFFER:"        + pFrames);

        int nativeSampleRate = Integer.parseInt(pRate);
        int minBufLen        = Integer.parseInt(pFrames);

        // Samsung S8 lets me pass in 192000Hz, yielding bufferSize of 15360 (!)
        // For 48000Hz it gives 3840
        int bufferSize = AudioRecord.getMinBufferSize(
                nativeSampleRate,
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT
                );

        if (bufferSize == AudioRecord.ERROR_BAD_VALUE) {
            Log.e(LOG_TAG, "(getAudioRecord) getMinBufferSize returned AudioRecord.ERROR_BAD_VALUE");
            return null;
        }

        Log.e(LOG_TAG, "(getAudioRecord) AudioRecord.getMinBufferSize() = " + bufferSize);

        String pRawSupport   = m.getProperty(AudioManager.PROPERTY_SUPPORT_AUDIO_SOURCE_UNPROCESSED);   // S8: false
        String pUltraMic     = m.getProperty(AudioManager.PROPERTY_SUPPORT_MIC_NEAR_ULTRASOUND);        // S8: true
        String pUltraSpkr    = m.getProperty(AudioManager.PROPERTY_SUPPORT_SPEAKER_NEAR_ULTRASOUND);    // S8: true

        Log.e(LOG_TAG, "(getAudioRecord) PROPERTY_SUPPORT_AUDIO_SOURCE_UNPROCESSED:"+ pRawSupport);
        Log.e(LOG_TAG, "(getAudioRecord) PROPERTY_SUPPORT_MIC_NEAR_ULTRASOUND:"     + pUltraMic);
        Log.e(LOG_TAG, "(getAudioRecord) PROPERTY_SUPPORT_SPEAKER_NEAR_ULTRASOUND:" + pUltraSpkr);

        // Samsung S8 reports false
        // NB: compare Strings with equals(), not == (which compares references)
        boolean rawSupport = !"false".equals(pRawSupport);

        // https://stackoverflow.com/questions/14377481/how-avoid-automatic-gain-control-with-audiorecord
        // I can't find an authoritative reference, but word on the Internet seems to be that
        //   'VOICE_RECOGNITION has the least preprocessing'
        int RAW = rawSupport ? MediaRecorder.AudioSource.UNPROCESSED : MediaRecorder.AudioSource.VOICE_RECOGNITION; // S8: VOICE_RECOGNITION

        AudioRecord audioRecord = new AudioRecord(
                RAW,                                // was: MediaRecorder.AudioSource.DEFAULT,
                nativeSampleRate,                   // SAMPLE_RATE_UNSPECIFIED also works
                AudioFormat.CHANNEL_IN_MONO,        //   .. but then we don't know buffer-size
                AudioFormat.ENCODING_PCM_16BIT,
                bufferSize
                );

        if (audioRecord.getState() != AudioRecord.STATE_INITIALIZED) {
            Log.e(LOG_TAG, "(getAudioRecord) audioRecord.getState() != AudioRecord.STATE_INITIALIZED");
            return null;
        }

        // https://developer.android.com/reference/android/media/MediaRecorder.AudioSource
        String sourceNames [] = {
                "DEFAULT",              // 0
                "MIC",                  // 1
                "VOICE_UPLINK",         // 2
                "VOICE_DOWNLINK",       // 3
                "VOICE_CALL",           // 4
                "CAMCORDER",            // 5
                "VOICE_RECOGNITION",    // 6
                "VOICE_COMMUNICATION",  // 7
                "REMOTE_SUBMIX",        // 8
                "UNPROCESSED"           // 9
                };

        Log.e( LOG_TAG, "(getAudioRecord) AudioRecord() SUCCESSFUL!" );

        Log.e( LOG_TAG, "(getAudioRecord) getAudioSource() = " + sourceNames[ audioRecord.getAudioSource() ]        ); // S8: VOICE_RECOGNITION
        Log.e( LOG_TAG, "(getAudioRecord) getSampleRate()  = " +              audioRecord.getSampleRate()           ); // S8: 48000
        Log.e( LOG_TAG, "(getAudioRecord) getBufferSizeInFrames()  = " +      audioRecord.getBufferSizeInFrames()   ); // S8: 2880

        return audioRecord;
    }

dturner commented 6 years ago

Have you tried using AudioManager.getDevices(GET_DEVICES_INPUTS) to obtain the list of attached audio devices?

You can then use AudioDeviceInfo.getSampleRates() to obtain the list of supported sample rates, although this is only available from API 23.

I'm pretty sure that Android always configures the audio hardware (e.g. the CODEC IC) to use the highest sample rate so you'd just need to use the highest value from the sample rates list.

If the sample rates array is empty it (very unhelpfully) means that you can supply any sample rate and it'll be resampled to the native sample rate.
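The selection rule described above can be sketched independently of the Android APIs. Assuming an `int[]` of the shape that `AudioDeviceInfo.getSampleRates()` returns, pick the highest entry as the (assumed) hardware rate, and treat an empty array as "any rate accepted" by falling back to a caller-chosen default — a sketch, not a definitive implementation:

```java
// Sketch of the rate-selection rule: given the array AudioDeviceInfo.getSampleRates()
// would return, the highest entry is assumed to be the hardware rate; an empty
// array means any rate is accepted (the system will resample), so fall back.
public class NativeRatePicker {
    public static int pickNativeRate(int[] supportedRates, int fallback) {
        if (supportedRates == null || supportedRates.length == 0) {
            return fallback; // device resamples; caller chooses a rate
        }
        int best = supportedRates[0];
        for (int r : supportedRates) {
            if (r > best) best = r;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pickNativeRate(new int[] {44100, 48000}, 44100)); // 48000
        System.out.println(pickNativeRate(new int[] {}, 44100));             // 44100
    }
}
```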

Is this any help? I admit this is a mess of APIs for a seemingly simple task and would welcome suggestions to make it better.

p-i- commented 6 years ago

Alas we have to support API 19.

Ultimately my concern is this:

Whether or not manufacturers like it (Apple don't like it), ultrasound is becoming a player.

To transmit data over ultrasound we need access to raw, unfiltered, uncompressed, un-resampled microphone data. Resamplers tend to mangle high frequencies.

If you are able to expose such an interface, audio developers all over the world will do the happy dance.

dturner commented 6 years ago

I was hoping there would be a better answer than this (and maybe someone will come up with something) but...

I believe the only way to achieve what you're looking for is to brute-force it: call AudioRecord.getMinBufferSize() with sample rates of 48kHz and 44.1kHz and use the sample rate which gives you the lowest buffer size.

Why 48kHz and 44.1kHz? As far as I'm aware these are the most commonly used sample rates for audio codecs in mobile devices.

You may also want to do the same calls with both:

channelConfig = CHANNEL_IN_MONO and CHANNEL_IN_STEREO
audioFormat = ENCODING_PCM_16BIT and ENCODING_PCM_FLOAT

Totalling 8 possible combinations.
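The brute-force probe above can be sketched with the buffer-size query abstracted behind an interface, so the selection logic is visible (and testable off-device). On a device the real call would be `AudioRecord.getMinBufferSize(rate, channelConfig, audioFormat)`; the mock values below are made up for illustration:

```java
// Sketch of the brute-force probe described above. The real probe would call
// AudioRecord.getMinBufferSize(...) per combination; here the call is mocked.
public class RateProber {
    interface MinBufferSizeFn {
        // Mirrors AudioRecord.getMinBufferSize: returns bytes, or <= 0 on error.
        int get(int sampleRate, int channelConfig, int audioFormat);
    }

    // Returns the candidate rate with the smallest min-buffer size in frames;
    // the native rate is assumed to need no resampling, hence the least buffering.
    public static int probeNativeRate(int[] candidateRates, int channelConfig,
                                      int audioFormat, MinBufferSizeFn fn) {
        int bestRate = candidateRates[0];
        int bestFrames = Integer.MAX_VALUE;
        for (int rate : candidateRates) {
            int bytes = fn.get(rate, channelConfig, audioFormat);
            if (bytes <= 0) continue;       // ERROR / ERROR_BAD_VALUE: skip
            int frames = bytes / 2;         // assuming 16-bit mono: 2 bytes per frame
            if (frames < bestFrames) {
                bestFrames = frames;
                bestRate = rate;
            }
        }
        return bestRate;
    }

    public static void main(String[] args) {
        // Mock: pretend the device is native at 48000 (smaller buffer in frames).
        MinBufferSizeFn mock = (rate, ch, fmt) -> (rate == 48000) ? 3840 : 7104;
        System.out.println(probeNativeRate(new int[] {48000, 44100}, 16, 2, mock)); // 48000
    }
}
```

Comparing in frames rather than bytes matters once ENCODING_PCM_FLOAT and stereo enter the picture, since those inflate the byte count without implying more latency.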

p-i- commented 6 years ago

Thanks for this! This is the first time I have seen this suggestion. Do you mean lowest in terms of least samples or least bytes? What is the rationale behind this?

I've noticed also that:

public int read (short[] audioData, 
                int offsetInShorts, 
                int sizeInShorts)

(introduced in API 3) behaves in a very unexpected manner. It returns sizeInShorts each time! It just seems to block until the request has completed.

This surely defeats the point of such an interface. This design (reminiscent of UDP/socket coding) suggests the underlying implementation should read the samples that are waiting to be read, not stall the thread until the buffer has been filled.

I haven't experimented with the other read methods available for API 3. Is this a behaviour you guys are aware of? It certainly makes low latency audio rather a challenge for API < 23.

dturner commented 6 years ago

> Do you mean lowest in terms of least samples or least bytes?

Least samples.

I'm afraid I'm not familiar with the AudioRecord.read API. My advice would be to use Oboe (C++) for reading an input audio stream. It works down to API 16 and doesn't block when you call read with a timeout value of zero (it will just return the number of frames which can be read in a single operation).

p-i- commented 6 years ago

Might it be feasible to add getNativeMicSampleRate, getNativeMicBufferFrames, getNativeEncoding, getNativeNumMicChannels to Oboe?

Maybe a getNativeCharacteristics function that supplies all of this data for both input and output...

It would be nice to be able to encapsulate this mess once and for all.

As it seems to require Java SDK calls, I'm guessing it may not be pretty.

dturner commented 6 years ago

It sure might, thanks for the feedback.

Bit more info: Making calls to Java APIs requires the JNI environment to be passed to Oboe and there is currently no mechanism for doing so. This is why the default sample rate is hardcoded to 48kHz when using API < 26. It's left to the caller to supply the optimal sample rate.

We have discussed adding this though. I will file a feature request.

dturner commented 6 years ago

Added: https://github.com/google/oboe/issues/116

p-i- commented 6 years ago

Great!

I'm imagining something like the following:

enum DATA_TYPE {
    SINT16, FLOAT32, FLOAT64
};

struct HardwareStreamProps {
    bool isInput;          // true for input, false for output
    std::string name;      // e.g. "Inbuilt Mic"
    int sampleRate;        // is this ever noninteger??
    DATA_TYPE dataType;
    int nChannelsPerFrame;
    int nFramesPerBuffer;
};

// std::array needs a compile-time size, so a vector fits better here:
std::vector<HardwareStreamProps> enumerateHardwareStreamProps(void* jniEnv);

That leaves one mystery for realtime audio devs to ponder: is there synchronisation between input and output buffers? On iOS it is possible, in the render (speakers-hungry) callback, to inspect the mic's input buffer. This gives the ultimate low latency.

Personally I do not much care at the moment for the answer. Just logically it becomes the next point of enquiry from the perspective of building a down-to-the-wire real-time audio interface.

dturner commented 6 years ago

Synchronous I/O isn't currently possible using the Android APIs although this feature has been discussed internally

PatrykMis commented 1 year ago

@p-i- have you achieved detecting the device's native sample rate already? I'm looking for a method to obtain it via Android native APIs such as AudioRecord etc.