millicast / millicast-native-sdk

SDK to build native clients using the Millicast platform.

There is no sound with MCCustomAudioSource publishing in iOS. #22

Open akryzhanovskiy opened 6 months ago

akryzhanovskiy commented 6 months ago

Hi team, we are facing a problem when publishing with MCCustomAudioSource: there is no sound. If we use MCAudioSource obtained from MCMedia.getAudioSources(), it works. But we receive a callback with a CMSampleBuffer that we transform into an MCAudioFrame and push to the audio source ourselves, which is why we need MCCustomAudioSource and its method `- (void) onAudioFrame: (MCAudioFrame*) frame`.

Here is the relevant part of the code:

```swift
final class MillicastPublisherListener: NSObject, ObservableObject {

private let publishingQueue: DispatchQueue = .init(label: "MillicastPublishingQueue", qos: .userInitiated)

var publisher: MCPublisher?

let videoSource: MCCoreVideoSource = MCCoreVideoSourceBuilder().build()
let audioSource: MCCustomAudioSource = MCCustomAudioSourceBuilder().build()

let settings: Settings

private var videoTrack: MCVideoTrack?
var audioTrack: MCAudioTrack?

func setupPublisher(credentials: MCPublisherCredentials) {
    self.publisher = MCPublisher.init(delegate: self)
    self.publisher?.setCredentials(credentials, completionHandler: {_ in })
}

func onExternalVideoFrame(sampleBuffer: CMSampleBuffer) {
    publishingQueue.async { [weak self] in
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return 
        }
        self?.videoSource.onPixelBuffer(pixelBuffer)
    }
}

func onExternalAudioFrame(sampleBuffer: CMSampleBuffer) {
    publishingQueue.async { [weak self] in
        let frame = MCAudioFrame(audioSampleBuffer: sampleBuffer)
        self?.audioSource.onAudioFrame(frame)
    }
}

func publish() {
    publishingQueue.async { [weak self] in
        guard let self else  {
            return
        }

        let connectionOptions: MCConnectionOptions = .init()
        connectionOptions.autoReconnect = false

        self.publisher?.connect(with: connectionOptions) { error in
            if let error = error {
                print("Unable to connect: \(error.localizedDescription)")
            }
        }
    }
}

func unpublish() {
    publishingQueue.async { [weak self] in
        guard let self else {
            return
        }
        self.publisher?.unpublish { error in
            if let error = error {
                print("Unpublish error: \(error.localizedDescription)")
            }
        }
        self.videoSource.stopCapture()
        self.audioSource.stopCapture()
        self.publisher?.disconnect { error in
            if let error = error {
                print("Publisher disconnect error: \(error.localizedDescription)")
            }
        }
        self.publisher?.clearTracks {
        }
        self.audioTrack = nil
        self.videoTrack = nil
    }
}

}
```

```swift
extension MillicastPublisherListener: MCDelegate {

func onConnected() {

    if let audioTrack = audioSource.startCapture() as? MCAudioTrack {
        self.audioTrack = audioTrack
        publisher?.addTrack(with: audioTrack) {
            print("Audio track added to Publisher")
        }
    }

    if let videoTrack = videoSource.startCapture() as? MCVideoTrack {
        self.videoTrack = videoTrack
        publisher?.addTrack(with: videoTrack) {
            print("Video track added to Publisher")
        }
    }

    let audioCodecs = MCMedia.getSupportedAudioCodecs() ?? []
    let videoCodecs = MCMedia.getSupportedVideoCodecs() ?? []
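    // Note: the fallbacks below index directly into the supported-codec lists
    // (first audio codec, third video codec) and will crash if fewer codecs are available.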

    let publisherOptions: MCClientOptions = .init()
    publisherOptions.audioCodec = settings.millicastAudioCodec ?? audioCodecs[0]
    publisherOptions.videoCodec = settings.millicastVideoCodec ?? videoCodecs[2]
    publisherOptions.degradationPreferences = settings.millicastDegradationPreferences
    publisherOptions.priority = NSNumber(value: settings.millicastPublishingPriority) 
    publisherOptions.svcMode = settings.millicastScalabilityMode
    publisherOptions.bitrateSettings = settings.millicastBitrateSettings
    publisherOptions.sourceId = nil

    publisher?.publish(with: publisherOptions) { error in
        if let error = error {
            print("Failed publish with error: \(error.localizedDescription)")
        }
    }
}

}
```

When I add a breakpoint after the audioTrack is created by audioSource.startCapture(), I can see in the console that audioTrack._source = id(0x0), while videoTrack._source = (MCCoreVideoSource?).

[Screenshot 2024-03-22 at 12:37:12: debugger output showing the track sources]

Can you please take a look and advise what to do?

djova-dolby commented 5 months ago

Hi, sorry for the late reply. For some reason I don't get automatic emails when an issue is posted, so I'm just seeing this. This definitely looks like a bug: the source being 0x0 is pretty indicative that something is not right. We will take a look and investigate.

Yousif-CS commented 5 months ago

I'll have a look. Thanks for reporting @akryzhanovskiy

Yousif-CS commented 5 months ago

Hey @akryzhanovskiy, I was wondering: what sample rate are you using to send? We only support 44.1 and 48 kHz. Also, the source being null is definitely a bug, but it should only affect you if you are not keeping a strong reference to the source while the track is alive; it seems the source is alive throughout your whole example, so no issues there.
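
For reference, a quick way to confirm the rate actually carried by the buffers is to read it from the format description (a minimal sketch using only CoreMedia; the helper name is just illustrative):

```swift
import CoreMedia

// Returns the sample rate reported by the buffer's AudioStreamBasicDescription, if any.
func sampleRate(of sampleBuffer: CMSampleBuffer) -> Float64? {
    guard let format = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(format)?.pointee else {
        return nil
    }
    return asbd.mSampleRate
}
```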

akryzhanovskiy commented 5 months ago

@Yousif-CS thank you for your reply. The sample rate is 48 kHz. The main problem is that there is no sound from the device when we publish the stream.

In addition, here is the initializer for MCAudioFrame that we use; maybe it will tell you something.

```swift
import Accelerate
import CoreMedia

extension MCAudioFrame {

convenience init?(audioSampleBuffer: CMSampleBuffer) {
    guard let audioStreamBasicDescription = CMSampleBufferGetFormatDescription(audioSampleBuffer)?.audioStreamBasicDescription else {
        return nil
    }

    guard let blockBuffer: CMBlockBuffer = CMSampleBufferGetDataBuffer(audioSampleBuffer) else {
        return nil
    }

    self.init()
    sampleRate = Int32(audioStreamBasicDescription.mSampleRate)
    bitsPerSample = Int32(audioStreamBasicDescription.mBitsPerChannel)
    channelNumber = Int(audioStreamBasicDescription.mChannelsPerFrame)
    frameNumber = Int(audioStreamBasicDescription.mFramesPerPacket)

    let numSamples = CMSampleBufferGetNumSamples(audioSampleBuffer)

    var dataPointer: UnsafeMutablePointer<Int8>?
    var dataLength: Int = 0
    let status = CMBlockBufferGetDataPointer(
        blockBuffer,
        atOffset: 0,
        lengthAtOffsetOut: nil,
        totalLengthOut: &dataLength,
        dataPointerOut: &dataPointer
    )
    guard status == kCMBlockBufferNoErr, dataLength > 0,
          let dataPointer else {
        return nil
    }

    let dstBytes = UnsafeMutablePointer<Float>.allocate(capacity: numSamples)

    let srcBytes = UnsafeRawPointer(dataPointer).bindMemory(to: Int16.self, capacity: numSamples)

    vDSP_vflt16(srcBytes, 1, dstBytes, 1, vDSP_Length(numSamples))
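    // Note: vDSP_vflt16 converts the 16-bit integer samples to 32-bit floats here,
    // while bitsPerSample above is still set from the ASBD's 16 bits, and dstBytes
    // is never deallocated in this snippet.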

    //memcpy(dstBytes, dataPointer, dataLength)
    self.data = UnsafeRawPointer(dstBytes)
}

}
```

Yousif-CS commented 5 months ago

Hey, thanks @akryzhanovskiy. I just noticed you used an extension on MCAudioFrame that initializes it from a CMSampleBuffer. We already have that implemented as MCCMSampleBufferFrame, which takes in a CMSampleBuffer, so I believe there is no need to implement your own. In fact, I tested reading an mp4 file with AVAssetReader and feeding the produced sample buffers through that class into the MCCustomAudioSource with no issues. Could you try that?

```swift
customAudioSource.onAudioFrame(MCCMSampleBufferFrame(sampleBuffer: yourSampleBuffer))
```
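In the context of the snippet you posted, the change would look roughly like this (a sketch only, assuming MCCMSampleBufferFrame wraps the buffer directly as above; untested on my side):

```swift
func onExternalAudioFrame(sampleBuffer: CMSampleBuffer) {
    publishingQueue.async { [weak self] in
        // Wrap the CMSampleBuffer with the SDK-provided frame type instead of the custom MCAudioFrame initializer.
        self?.audioSource.onAudioFrame(MCCMSampleBufferFrame(sampleBuffer: sampleBuffer))
    }
}
```
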
akryzhanovskiy commented 5 months ago

Hi @Yousif-CS, I've checked your suggestion, but the result is the same: no sound. When we publish video and record it at the same time, the recorded video has sound, but on the side where we stream, there is no sound.

Yousif-CS commented 5 months ago

Hi @akryzhanovskiy, thanks for trying this suggestion out. Is it possible to post the full description of the CMSampleBuffer you are feeding, something like the output from lldb? Also, just wondering, where are you capturing this audio from? Maybe I can create a simple test app that captures from the same source and see if I can reproduce on my end. Thanks!

akryzhanovskiy commented 4 months ago

Hi @Yousif-CS. Here is an example of the CMSampleBuffer that we try to pass to the custom audio source:

```
CMSampleBuffer 0x1311273a0 retainCount: 9 allocator: 0x1fef93eb0
    invalid = NO
    dataReady = YES
    makeDataReadyCallback = 0x0
    makeDataReadyRefcon = 0x0
    formatDescription = <CMAudioFormatDescription 0x30112a260 [0x1fef93eb0]> {
        mediaType:'soun'
        mediaSubType:'lpcm'
        mediaSpecific: {
            ASBD: {
                mSampleRate: 48000.000000
                mFormatID: 'lpcm'
                mFormatFlags: 0xc
                mBytesPerPacket: 2
                mFramesPerPacket: 1
                mBytesPerFrame: 2
                mChannelsPerFrame: 1
                mBitsPerChannel: 16
            }
            cookie: {(null)}
            ACL: {Mono}
            FormatList Array: {
                Index: 0
                ChannelLayoutTag: 0x640001
                ASBD: {
                    mSampleRate: 48000.000000
                    mFormatID: 'lpcm'
                    mFormatFlags: 0xc
                    mBytesPerPacket: 2
                    mFramesPerPacket: 1
                    mBytesPerFrame: 2
                    mChannelsPerFrame: 1
                    mBitsPerChannel: 16
                }
            }
        }
        extensions: {(null)}
    }
    sbufToTrackReadiness = 0x0
    numSamples = 1024
    outputPTS = {16284436392/48000 = 339259.091, rounded}(based on outputPresentationTimeStamp)
    sampleTimingArray[1] = {
        {PTS = {16284436392/48000 = 339259.091, rounded}, DTS = {INVALID}, duration = {1/48000 = 0.000}},
    }
    sampleSizeArray[1] = {
        sampleSize = 2,
    }
    dataBuffer = {
        CMBlockBuffer 0x30128e130 totalDataLength: 2048 retainCount: 1 allocator: 0x1fef93eb0 subBlockCapacity: 2
        [0] 2048 bytes @ offset 128 Buffer Reference:
            CMBlockBuffer 0x30128e0a0 totalDataLength: 2308 retainCount: 1 allocator: 0x1fef93eb0 subBlockCapacity: 2
            [0] 2308 bytes @ offset 0 Memory Block 0x102cf29c0, 2308 bytes (allocator 0x301b484e0)
    }
```

We are capturing video and audio with SCSDKCameraKit (https://docs.snap.com/camera-kit/integrate-sdk/mobile/ios) and using Snap filters. Here is the configuration of the AVCaptureSession:

```swift
var audioCaptureOutput: AVCaptureAudioDataOutput = .init()
let videoCaptureOutput: AVCaptureVideoDataOutput = .init()

var videoCaptureDevice: AVCaptureDevice?

func configureSession() {
    captureSession.beginConfiguration()
    captureSession.inputs.forEach { captureSession.removeInput($0) }
    captureSession.outputs.forEach { captureSession.removeOutput($0) }
    settings.selectedAudioInput = AVAudioSession.sharedInstance().currentRoute.inputs.first?.portName

    AVAudioSession.sharedInstance().configureCategory()
    AVAudioSession.sharedInstance().configureIOBufferDuration()
    AVAudioSession.sharedInstance().printConfiguration()

    if let inputDevice = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInWideAngleCamera],
        mediaType: .video,
        position: position
    ).devices.first  {
        do {
            let captureInput = try AVCaptureDeviceInput(device: inputDevice)
            if captureSession.canAddInput(captureInput) {
                captureSession.addInput(captureInput)
                videoCaptureDevice = inputDevice
                position = inputDevice.position
                if inputDevice.position == .front {
                    self.horizontalFieldOfView = 50.0
                    try inputDevice.lockForConfiguration()
                    if inputDevice.isGeometricDistortionCorrectionSupported {
                        inputDevice.isGeometricDistortionCorrectionEnabled = true
                    }
                    inputDevice.unlockForConfiguration()
                } else if inputDevice.position == .back {
                    self.horizontalFieldOfView = 80.0
                    try inputDevice.lockForConfiguration()
                    if inputDevice.isGeometricDistortionCorrectionSupported {
                        inputDevice.isGeometricDistortionCorrectionEnabled = true
                    }
                    inputDevice.unlockForConfiguration()
                }
                settings.selectedCamera = .builtIn
            }
        } catch {
            print("[CameraKitInput] Unable to add captureVideoInput: \(error)")
        }
    }
    if let audioInputDevice = AVCaptureDevice.default(.builtInMicrophone, for: .audio, position: .unspecified) {
        do {
            let captureAudioInput = try AVCaptureDeviceInput(device: audioInputDevice)
            if captureSession.canAddInput(captureAudioInput) {
                captureSession.addInput(captureAudioInput)
            }
        } catch {
            print("[CameraKitInput] Unable to add captureAudioInput: \(error)")
        }
    }

    audioCaptureOutput = AVCaptureAudioDataOutput.init()
    if captureSession.canAddOutput(audioCaptureOutput) {
        captureSession.addOutput(audioCaptureOutput)
        audioCaptureOutput.setSampleBufferDelegate(self, queue: audioSampleBufferDelegateQueue)
    } else {
        print("[CameraKitInput] Unable to add audio capture output")
    }

    if captureSession.canAddOutput(videoCaptureOutput) {
        captureSession.addOutput(videoCaptureOutput)
        videoCaptureOutput.videoSettings = [
            String(describing: kCVPixelBufferPixelFormatTypeKey): NSNumber(value: kCVPixelFormatType_32BGRA),
            String(describing: kCVPixelBufferMetalCompatibilityKey): NSNumber(value: true)
        ]
        videoCaptureOutput.setSampleBufferDelegate(self, queue: videoSampleBufferDelegateQueue)
    } else {
        print("[CameraKitInput] Unable to add video capture output")
    }

    setCameraPosition(newPosition: position)

    captureSession.usesApplicationAudioSession = true
    captureSession.automaticallyConfiguresApplicationAudioSession = false
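    // With these two flags the app owns the shared AVAudioSession configuration;
    // the capture session will not reconfigure it on its own.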
    captureSession.commitConfiguration()
}

//MARK: - AVCaptureAudioDataOutputSampleBufferDelegate, AVCaptureVideoDataOutputSampleBufferDelegate

extension CameraKitInput: AVCaptureAudioDataOutputSampleBufferDelegate, AVCaptureVideoDataOutputSampleBufferDelegate {

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard isRunning else { return }

    if output == videoCaptureOutput {
        process(videoSampleBuffer: sampleBuffer)
    } else {
        process(audioSampleBuffer: sampleBuffer)
    }
}

}
```

djova-dolby commented 4 months ago

Hi, sorry for the lack of results here, I will try to take a look in the coming days. I generally work on the C++ components of the SDK, so I may have some noob Objective-C questions for you as I try to deduce the issue, but it is what it is. I will look at the ObjC/Swift SDK and trace the path these buffers should be taking to see if anything jumps out as a possible cause of the bug.

In the meantime, could you please do the following:

- Update to the 1.8.4 SDK, enable the SDK logging, and capture and share the logs from a failing publish session.

Also one question about this statement:

> When we publish video and record it at the same time, then recorded video has sound

Does this mean that you record locally on the iPhone, and when you play back the clip you hear the expected audio? Or are you using our backend service to record and then watching back the recording? I expect it is the former, because if it were the latter it would be very strange. Just wanted to confirm.

akryzhanovskiy commented 4 months ago

Hi @djova-dolby, I will try to set up the logs. As for recording: we record the video locally. We use AVAssetWriter to record the same video and audio that we pass to the stream. I've also updated the SDK to the latest version, but the result is the same.
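
Roughly, the local recording is along these lines (a simplified sketch rather than our exact production code; names like recordingURL are illustrative):

```swift
import AVFoundation

// Simplified sketch: the same audio sample buffers that go to the Millicast source
// are also appended to a local file via AVAssetWriter.
final class LocalAudioRecorder {
    private let writer: AVAssetWriter
    private let audioInput: AVAssetWriterInput
    private var didStartSession = false

    init(recordingURL: URL) throws {
        writer = try AVAssetWriter(outputURL: recordingURL, fileType: .mp4)
        audioInput = AVAssetWriterInput(mediaType: .audio, outputSettings: [
            AVFormatIDKey: kAudioFormatMPEG4AAC,
            AVSampleRateKey: 48_000,
            AVNumberOfChannelsKey: 1
        ])
        audioInput.expectsMediaDataInRealTime = true
        writer.add(audioInput)
        writer.startWriting()
    }

    // Fed from the same AVCaptureAudioDataOutput delegate that feeds the Millicast source.
    func append(_ sampleBuffer: CMSampleBuffer) {
        if !didStartSession {
            writer.startSession(atSourceTime: CMSampleBufferGetPresentationTimeStamp(sampleBuffer))
            didStartSession = true
        }
        if audioInput.isReadyForMoreMediaData {
            audioInput.append(sampleBuffer)
        }
    }
}
```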

djova-dolby commented 4 months ago

Yes of course, sorry for the confusion: 1.8.4 is just the SDK version where I enabled logging in the iOS SDK; it definitely won't fix the problem. The logs would essentially confirm for me that the failure is where I think it is and why. Looking through the SDK, the issue is most likely that even when we are doing audio injection, the SDK still instantiates the regular WebRTC iOS ADM (audio device module), which tries to initialize the AVAudioSession, while you are already doing that with Camera Kit.
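
If you want to experiment while we investigate, one thing you could try (purely my guess, not a confirmed SDK workaround) is to re-assert your AVAudioSession configuration after the publisher connects, in case the internal audio device module has reconfigured it:

```swift
import AVFoundation

// Untested idea: re-apply the app's audio session configuration after connecting,
// in case the SDK's internal audio device module changed it.
func reassertAudioSession() {
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playAndRecord, mode: .videoChat, options: [.allowBluetooth])
        try session.setActive(true)
    } catch {
        print("Failed to reconfigure AVAudioSession: \(error)")
    }
}
```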

djova-dolby commented 4 months ago

I am currently looking for the quickest, least-intrusive way to fix this for a 1.8.5 patch release. I don't have a full iOS setup to test with locally, which will add some time to the debugging, but it is what it is; I will update once we have some confirmation/result/progress. And yes, in the meantime please do share the logs once you capture them and I will have a look. Thanks!

akryzhanovskiy commented 4 months ago

Hi @djova-dolby. I've found another strange issue with sound. When we use MCAudioSource (not MCCustomAudioSource) and add an audio input to the AVCaptureSession, streaming works fine the first time. But when we stop the stream, start streaming again, and then switch the camera from front to back during streaming, the audio disappears.

This is how we add audio input:

```swift
if let audioInputDevice = AVCaptureDevice.default(audioDeviceType, for: .audio, position: .unspecified) {
    do {
        let captureAudioInput = try AVCaptureDeviceInput(device: audioInputDevice)
        if captureSession.canAddInput(captureAudioInput) {
            captureSession.addInput(captureAudioInput)
        }
    } catch {
        print("[CameraKitInput] Unable to add captureAudioInput: \(error)")
    }
}
```

And this is how we change the camera:

```swift
func setCameraPosition(newPosition: AVCaptureDevice.Position) {
    defer { captureSession.commitConfiguration() }
    captureSession.beginConfiguration()

    var inputDevice: AVCaptureDevice?

    switch newPosition {
    case .back, .front:
        if let device = AVCaptureDevice.DiscoverySession(
            deviceTypes: [.builtInWideAngleCamera],
            mediaType: .video,
            position: newPosition
        ).devices.first {
            inputDevice = device
        } else {
            inputDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: newPosition)
        }
        settings.selectedCamera = .builtIn
    @unknown default:
        inputDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front)
        settings.selectedCamera = .builtIn
    }

    if let inputDevice {
        do {
            let newVideoInput = try AVCaptureDeviceInput(device: inputDevice)
            if let currentInput = captureSession.inputs
                .first(where: { ($0 as? AVCaptureDeviceInput)?.device.hasMediaType(.video) == true }) {
                captureSession.removeInput(currentInput)
            }
            if captureSession.canAddInput(newVideoInput) {
                captureSession.addInput(newVideoInput)
                videoCaptureDevice = inputDevice
                if newPosition != .unspecified {
                    position = newPosition
                    horizontalFieldOfView = position == .front ? 50.0 : 80.0
                    try inputDevice.lockForConfiguration()
                    if inputDevice.isGeometricDistortionCorrectionSupported {
                        inputDevice.isGeometricDistortionCorrectionEnabled = true
                    }
                    inputDevice.unlockForConfiguration()
                }
            }
            if let connection = videoCaptureOutput.connection(with: .video) {
                print("[CameraKitInput] Orientation: \(connection.videoOrientation.title)")
                connection.videoOrientation = newPosition == .unspecified ? settings.externaCameraOrientation : .portrait
                connection.isVideoMirrored = newPosition == .front
            }
        } catch {
            print("[CameraKitInput] Unable to change camera: \(error)")
        }
    }
}

```

djova-dolby commented 4 months ago

Well, the audio/video capture implementations have not been revisited since we started, around 1.5.2, and have remained in a technical-debt state. I will start looking into this, but it is looking more and more like it won't be a patchable fix. We shall see. Can you open another ticket for your last comment? And thanks for reporting!

Yousif-CS commented 4 months ago

@djova-dolby @akryzhanovskiy All of these issues seem to indicate a clash between WebRTC internally modifying the AVCaptureSession or AVAudioSession and the application doing the same. I am not sure there is a simple fix (as a workaround) in 1.8.5, but it does suggest we need to redesign the audio/video device capture components in our SDK so that they interoperate with the application's own use of these sessions.