guest271314 / AudioWorkletStream

fetch() => ReadableStream => AudioWorklet
https://guest271314.github.io/AudioWorkletStream/

live streaming wav is inconsistent / stops #1

Open benoitmercusot opened 3 years ago

benoitmercusot commented 3 years ago

Hi, first of all I would like to thank you for this POC / script. Exactly what I was looking for. Amazing work. But I ran into an issue when I started to stream a "live stream", actually a raw audio stream from a Node server (using node-portaudio). It works well for a few seconds (maybe a minute), then the sound is muted / gone. Do you think it's buffer related? A network issue? For information, I'm trying to broadcast audio without latency from node-portaudio to the browser. I thought your script was a good way to start! Thanks for your time. Benoit

guest271314 commented 3 years ago

What is the input audio stream format and what is logged at the console?

benoitmercusot commented 3 years ago

Hey. I'm using this: https://www.npmjs.com/package/node-portaudio. It says raw audio but your wav codec works just fine. I can't log / find anything that shows me why it drops :/ I was just wondering if it's related to the header of my stream (no length...), so maybe that's the problem.

PS : working on the port-message branch.

guest271314 commented 3 years ago

If you have control over the input stream you can remove the WAV header and use s16le to avoid removal of the first 44 bytes. Is the input 1 channel or 2 channels? What is the sampling rate?

benoitmercusot commented 3 years ago

Here is my input for now (using my iPhone ear / mic for now). Works like a charm but stops.

const ai = new portAudio.AudioInput({
  channelCount: 2,
  sampleFormat: portAudio.SampleFormat16Bit,
  sampleRate: 44100,
  deviceId: -1 // Use -1 or omit the deviceId to select the default device
});

Tried other branches of your repo :) With the Wasm memory branch, playback seems to be limited by the duration the allowed memory can hold. Not sure I understand this "and use s16le to avoid removal of the first 44 bytes" but very nice of you to answer.

guest271314 commented 3 years ago

According to https://www.npmjs.com/package/node-portaudio

// Note that this does not strip the WAV header so a click will be heard at the beginning
const rs = fs.createReadStream('steam_48000.wav');

Can you upload the WAV file here so that we can test using the same code?

guest271314 commented 3 years ago

Not sure I understand this "and use s16le to avoid removal of the first 44 bytes" but very nice of you to answer.

See http://www.topherlee.com/software/pcm-tut-wavformat.html, https://github.com/guest271314/AudioWorkletStream/blob/message-port-post-message/audioWorklet.js#L12.
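As a rough illustration of the "first 44 bytes" remark: a canonical PCM WAV file starts with a 44-byte header beginning with the ASCII tag "RIFF", followed by the samples, which for 16-bit audio are interleaved signed little-endian integers (s16le). The sketch below is not code from this repository; `stripWavHeader` and `s16leToFloat32` are hypothetical helper names, and it assumes the header is exactly 44 bytes, which is not guaranteed for every WAV file.

```javascript
// Hypothetical helpers for illustration; not taken from the repository.
const WAV_HEADER_BYTES = 44; // assumed size of a canonical PCM WAV header

// True when the bytes begin with the ASCII tag "RIFF".
function hasRiffHeader(bytes) {
  return bytes.length >= 4 &&
    bytes[0] === 0x52 && bytes[1] === 0x49 &&
    bytes[2] === 0x46 && bytes[3] === 0x46;
}

// Drop the header if one is present; raw s16le input passes through untouched.
function stripWavHeader(bytes) {
  return hasRiffHeader(bytes) ? bytes.subarray(WAV_HEADER_BYTES) : bytes;
}

// Convert interleaved s16le bytes into one Float32Array per channel,
// scaled to the range [-1, 1).
function s16leToFloat32(bytes, channelCount) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const frames = Math.floor(bytes.byteLength / (2 * channelCount));
  const channels = Array.from(
    { length: channelCount },
    () => new Float32Array(frames)
  );
  for (let frame = 0; frame < frames; frame++) {
    for (let ch = 0; ch < channelCount; ch++) {
      const byteIndex = (frame * channelCount + ch) * 2;
      const sample = view.getInt16(byteIndex, true); // true = little-endian
      channels[ch][frame] = sample / 32768;
    }
  }
  return channels;
}
```

If the server sends raw s16le with no header at all, the stripping step becomes a no-op and only the conversion is needed.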

benoitmercusot commented 3 years ago

Well, it's actually not a file but a live stream from my mic. Actually, if you are interested and have time, I can PM you a link?

benoitmercusot commented 3 years ago

Not sure I understand this "and use s16le to avoid removal of the first 44 bytes" but very nice of you to answer.

See http://www.topherlee.com/software/pcm-tut-wavformat.html, https://github.com/guest271314/AudioWorkletStream/blob/message-port-post-message/audioWorklet.js#L12.

Interesting. Thanks. I actually removed this part // accumulate 344 * 512 * 1.5 of data (to achieve real time, maybe that's what's causing latency)

guest271314 commented 3 years ago

You should be able to record the microphone output per the NPM documentation. To capture microphone directly see also https://github.com/guest271314/SpeechSynthesisRecorder/issues/17#issuecomment-749875748, https://github.com/guest271314/setUserMediaAudioSource, https://github.com/guest271314/captureSystemAudio.

I actually removed this part // accumulate 344 * 512 * 1.5 of data (to achieve real time, maybe that's what's causing latency)

This is capable of streaming without waiting for accumulation of data https://github.com/guest271314/webtransport/blob/main/webTransportAudioWorkletWebAssemblyMemoryGrow.js.

benoitmercusot commented 3 years ago

Actually I'm not trying to record but to stream :) (server to browser, i.e. sound card input (Node server) to browser (client)). Your last link looks very nice. Do you think I can use it with an endless WAV/raw stream?

guest271314 commented 3 years ago

It is difficult to test and verify "endless" https://bugs.chromium.org/p/chromium/issues/detail?id=1161429. During testing I streamed 8 hours of audio yesterday.

Capturing the entire system output or a specific application's audio output is possible by creating a virtual microphone and setting its source to an application or user-defined stream. At the browser, capturing with navigator.mediaDevices.getUserMedia({audio: true}) avoids the need to read and write bytes individually; after configuration the task can be reduced to HTMLMediaElement.srcObject = mediaStream.

The WebAssembly.Memory.grow() version of the code in this repository has implementation restrictions at Chrome which differ for 32-bit and 64-bit systems, see https://github.com/wasmerio/wasmer-php/issues/121. According to the Chrome issue the limit is 4GB for 64-bit; I have not yet tested the maximum. A ring buffer could be written which overwrites the previously used indexes of the SharedArrayBuffer; that is a TODO.

benoitmercusot commented 3 years ago

Thank you very much for all the explanations. Yes, navigator.mediaDevices.getUserMedia({audio: true}), but it doesn't work for what I want since I want the server to be the source :/ I'll take a look at all your links. All this is still very complicated and confusing for me! 🤪 You look way ahead of everyone on the internet regarding these specific APIs!! Have a good evening. Benoit.

guest271314 commented 3 years ago

For an "infinite" or "endless" audio stream I would try using WebTransport for the ReadableStream to avoid restrictions on the ServiceWorker approach; in that case the server handles the quic-transport protocol, see https://github.com/guest271314/webtransport/. opus_stream_sw.zip

benoitmercusot commented 3 years ago

Thanks a ton. i'll look into that ! Benoit

guest271314 commented 3 years ago

Is this issue resolved?

benoitmercusot commented 3 years ago

héhé. i'm not that fast. i have to understand all the sources you gave me :)

benoitmercusot commented 3 years ago

Hi @guest271314, I was reading this whole thread. It looks like exactly what I was trying to achieve: https://github.com/wasmerio/wasmer-php/issues/121 (except using Node instead of PHP passthru). Did you make any progress on this, regarding memory grow / duration? Thank you, Benoit.

guest271314 commented 3 years ago

The Native Messaging, PHP passthru() version is essentially the precursor to the QuicTransport and WebTransport versions. Since you read that Issue you noted the Chrome bug which limits WebAssembly.Memory.grow() to 4GB on 64-bit systems. When I was testing that code I was on a 32-bit system. I have not yet tried to reach the maximum on 64-bit. It should be possible to use more than one Memory, SharedArrayBuffer, or Typed Array instance, and/or overwrite the indexes that have already been parsed, using a "circular buffer" approach. I suggest testing the maximum on your system.

Creating a virtual microphone device and using MediaStream eliminates the need to do that (count bytes). However, it is edifying to be able to achieve either approach.
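The "circular buffer" idea mentioned above can be sketched roughly as follows. This is not code from the repository: the `RingBuffer` class and its method names are hypothetical, and the sketch is single-threaded. Sharing the storage between the main thread and an AudioWorklet through a SharedArrayBuffer would additionally need synchronized counters (for example via Atomics), which is left out here.

```javascript
// Hypothetical fixed-size circular byte buffer; indexes that were already
// read are overwritten by later writes, so memory use stays constant.
class RingBuffer {
  constructor(capacity) {
    this.bytes = new Uint8Array(capacity);
    this.writeCount = 0; // total bytes written so far
    this.readCount = 0;  // total bytes read so far
  }

  // Bytes written but not yet read.
  get available() {
    return this.writeCount - this.readCount;
  }

  // Bytes that can be written without overwriting unread data.
  get free() {
    return this.bytes.length - this.available;
  }

  // Copy as much of `chunk` as fits; returns the number of bytes accepted.
  write(chunk) {
    const count = Math.min(chunk.length, this.free);
    for (let i = 0; i < count; i++) {
      this.bytes[(this.writeCount + i) % this.bytes.length] = chunk[i];
    }
    this.writeCount += count;
    return count;
  }

  // Fill `target` with unread bytes; returns the number of bytes copied.
  readInto(target) {
    const count = Math.min(target.length, this.available);
    for (let i = 0; i < count; i++) {
      target[i] = this.bytes[(this.readCount + i) % this.bytes.length];
    }
    this.readCount += count;
    return count;
  }
}
```

With this layout the stream duration is no longer bounded by the buffer size, only by how far the writer is allowed to run ahead of the reader.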

benoitmercusot commented 3 years ago

Hi @guest271314, I'm now having fun with your MessagePort.postMessage() branch. From what I understand, the time limit should be determined by the Uint8Array size of the AudioWorkletProcessor. I still have inconsistency in the playback (stops occur after a few seconds, sometimes a minute) but I suspect my WAV stream is too inconsistent (too big?). The appendBuffers log shows huge variation in the array length so there must be an issue here. Still digging! Benoit.
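To make the relation between buffer size and playable duration concrete, here is a small arithmetic sketch using the stream parameters given earlier in the thread (44100 Hz, 2 channels, 16-bit samples). The function names are hypothetical, and reading the quoted "accumulate" comment as the product 344 × 512 × 1.5 bytes is an assumption.

```javascript
// Hypothetical helpers for illustration; not taken from the repository.

// Bytes of PCM audio consumed per second of playback.
function bytesPerSecond(sampleRate, channelCount, bytesPerSample) {
  return sampleRate * channelCount * bytesPerSample;
}

// Seconds of audio that fit in a buffer of `byteLength` bytes.
function secondsOfAudio(byteLength, sampleRate, channelCount, bytesPerSample) {
  return byteLength / bytesPerSecond(sampleRate, channelCount, bytesPerSample);
}

// 44100 Hz * 2 channels * 2 bytes per sample = 176400 bytes per second.
const streamRate = bytesPerSecond(44100, 2, 2);
console.log(streamRate);

// 344 * 512 * 1.5 = 264192 bytes, which is roughly 1.5 seconds at that rate.
console.log(secondsOfAudio(344 * 512 * 1.5, 44100, 2, 2));
```

The same calculation gives an upper bound on how long a fixed-size Uint8Array can hold a stream before it is full.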

benoitmercusot commented 3 years ago

Think I found a hack (ugly?): the issue indeed occurs when index is lower than offset.

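// magic "if" hack 😅
// Only read when the read position (offset) is behind the write position (index),
// i.e. when unread bytes are available; otherwise skip this pass.
if (this.offset < this.index) {
  for (let i = 0; i < 512; i++, this.offset++) {
    // Stop at the end of the backing array
    if (this.offset === this.uint8.length) {
      console.log(this.uint8);
      break;
    }
    uint8[i] = this.uint8[this.offset];
  }
  // Reinterpret the copied bytes as 16-bit samples and decode them
  const uint16 = new Uint16Array(uint8.buffer);
  CODECS.get(this.codec)(uint16, channels);
}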

guest271314 commented 3 years ago

The offset is the bytes read, the index is the bytes written.
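The relationship between the two counters can be illustrated with a minimal sketch. This is not the repository's AudioWorkletProcessor: `ByteQueue` and its methods are hypothetical, and the guard is a stricter variant of the "if" hack above, since it waits until a whole block has been written rather than any single byte.

```javascript
// Hypothetical queue for illustration; not taken from the repository.
class ByteQueue {
  constructor(capacity) {
    this.uint8 = new Uint8Array(capacity);
    this.index = 0;  // bytes written so far
    this.offset = 0; // bytes read so far
  }

  // Writer side: copy a network chunk in and advance index.
  // Throws a RangeError if the chunk does not fit in the backing array.
  append(chunk) {
    this.uint8.set(chunk, this.index);
    this.index += chunk.length;
  }

  // Reader side: return the next `size` bytes, or null when the reader
  // has caught up with the writer (an underrun).
  next(size) {
    if (this.index - this.offset < size) {
      return null;
    }
    const block = this.uint8.slice(this.offset, this.offset + size);
    this.offset += size;
    return block;
  }
}
```

In an AudioWorkletProcessor, a null result would correspond to leaving the output buffers untouched (they start out as silence) and returning true from process() so the processor stays alive until more data arrives.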

well it's actually not a file but a live stream from my mic.

One solution

navigator.mediaDevices.getUserMedia({audio: true})
.then(stream => {
  // do stuff with stream: MediaStream
});

benoitmercusot commented 3 years ago

Hi! My "hack" works like a charm. Before that it actually stopped every time offset went beyond index (I suppose it makes sense). Now it never stops. MediaStream was not a solution because the final use is to stream from the input card of another device. For now everything works as expected! Thanks a lot for all your "WIPs"

benoitmercusot commented 3 years ago

You should be able to record the microphone output per the NPM documentation. To capture microphone directly see also guest271314/SpeechSynthesisRecorder#17 (comment), https://github.com/guest271314/setUserMediaAudioSource, https://github.com/guest271314/captureSystemAudio.

I actually removed this part // accumulate 344 * 512 * 1.5 of data (to achieve real time, maybe that's what's causing latency)

This is capable of streaming without waiting for accumulation of data https://github.com/guest271314/webtransport/blob/main/webTransportAudioWorkletWebAssemblyMemoryGrow.js.

Hi there! Do you have a minute to show me how to implement this (i.e. where does that text / bytes input come from)?

guest271314 commented 3 years ago

For the WebTransport version "text" input originates in the browser at function call

webTransportAudioWorkletMemoryGrow('hello world')

Sending to quic-transport URL

    let data = encoder.encode(text);
    await writer.write(data);
    console.log('writer close', await writer.close());

input_data here https://github.com/guest271314/webtransport/blob/main/quic_transport_server_tts.py#L138 is the same text in the process

data = subprocess.run(['./tts.sh', input_data], stdout=subprocess.PIPE)

"$1" is the input text passed in this case to espeak-ng https://github.com/guest271314/webtransport/blob/main/tts.sh#L2

espeak-ng -m --stdout "$1" # TODO pass, set options

the response is payload https://github.com/guest271314/webtransport/blob/main/quic_transport_server_tts.py#L140

self.connection.send_stream_data(response_id, payload, True)

the ReadableStream from the quic-transport server https://github.com/guest271314/webtransport/blob/main/webTransportAudioWorkletWebAssemblyMemoryGrow.js#L183

const { readable } = stream;

that we pipeTo() https://github.com/guest271314/webtransport/blob/main/webTransportAudioWorkletWebAssemblyMemoryGrow.js#L184 a WritableStream, in this case calling grow() on a WebAssembly.Memory (SharedArrayBuffer) instance if necessary (see https://github.com/guest271314/webtransport/blob/main/webTransportAudioWorkletWebAssemblyMemoryGrow.js#L8 for the minimum and maximum values corresponding to audio; feel free to verify the values used, as there was no manual for how to accurately derive those values. I learned Python and the bytes necessary per second by doing).
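The "grow if necessary" step can be sketched on its own, without the network part. This is a simplified illustration rather than the code in the linked file: `createState` and `writeChunk` are hypothetical names, the page counts are arbitrary, and in the real pipeline a function like `writeChunk` would presumably be called from the write() method of the WritableStream for each chunk read from the server.

```javascript
// Hypothetical sketch; names and page counts are assumptions.
const PAGE_SIZE = 65536; // bytes per WebAssembly memory page

// Shared memory must declare a maximum; its buffer is a SharedArrayBuffer.
function createState(initialPages, maximumPages) {
  const memory = new WebAssembly.Memory({
    initial: initialPages,
    maximum: maximumPages,
    shared: true,
  });
  return { memory, index: 0 }; // index = bytes written so far
}

// Append one chunk, growing the memory first when it would not fit.
// grow() throws a RangeError once the declared maximum is exceeded.
function writeChunk(state, chunk) {
  const needed = state.index + chunk.length;
  const capacity = state.memory.buffer.byteLength;
  if (needed > capacity) {
    const extraPages = Math.ceil((needed - capacity) / PAGE_SIZE);
    state.memory.grow(extraPages);
  }
  // Create the view after any grow(), so it covers the enlarged buffer.
  new Uint8Array(state.memory.buffer).set(chunk, state.index);
  state.index += chunk.length;
  return state.index;
}
```

Because the memory only ever grows, the declared maximum bounds the total stream length, which is the limitation discussed earlier in the thread.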

guest271314 commented 3 years ago

To install aioquic

python3 -m pip install aioquic

create the necessary certificates https://github.com/guest271314/webtransport/blob/main/quic_transport_server_tts.py#L40, launch Chrome or Chromium with the appropriate flags found in the same comment block.

Note: I commented out, and do not use, ALLOWED_ORIGINS https://github.com/guest271314/webtransport/blob/main/quic_transport_server_tts.py#L94 so that the code can be run at the console on any site.

benoitmercusot commented 3 years ago

awesome, thanks a lot.

guest271314 commented 3 years ago

Relevant to running the code at the console at any origin, there is still a restriction on doing so using AudioWorklet because it is designed to be loaded as an ECMAScript module, thus GitHub blocks loading. Once you get the code running and test at the console on this very page you will perhaps gather why I filed https://github.com/WebAudio/web-audio-api-v2/issues/109.