mmig / libflac.js

FLAC encoder and decoder in JavaScript

decoding? #1

Open tyler-g opened 8 years ago

tyler-g commented 8 years ago

@russaa thank you for this great emscripten port.

I've been working on a fork to add chunk decoding ability, and running into some issues. I was able to get FLAC__stream_decoder_init_stream to successfully init (return code 0), but the next step is where I'm struggling.

On the encoding side, when you call FLAC__stream_encoder_process_interleaved you pass it the data to encode, but the decoding side is a bit different: libflac uses this function to process a decode buffer:

FLAC__stream_decoder_process_single

Except you don't pass it data directly; instead, you give the decoder a pointer to a read callback. When that read callback is called, its parameters include buffer (a pointer to the location where the data should go) and bytes (a pointer to the maximum number of bytes that can be written to memory for that chunk). The libFLAC docs say the value behind this pointer must be updated to the actual number of bytes in the chunk (since in FLAC the chunk size varies).
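
A minimal sketch of that contract, with the Emscripten heap simulated by a plain Uint8Array so it runs without libflac.js. The names `heap`, `readCallback`, `BUFFER_PTR` and `BYTES_PTR` are hypothetical; in the real port the heap would be Module.HEAPU8 and both pointers would be supplied by libFLAC.

```javascript
// Simulated linear memory; in an Emscripten build this role is played by Module.HEAPU8.
const heap = new Uint8Array(64);
const view = new DataView(heap.buffer);

// Hypothetical addresses that libFLAC would pass to the read callback.
const BUFFER_PTR = 16; // where the encoded bytes must be written
const BYTES_PTR = 0;   // holds the max byte count on entry, the actual count on exit

const READ_CONTINUE = 0;      // FLAC__STREAM_DECODER_READ_STATUS_CONTINUE
const READ_END_OF_STREAM = 1; // FLAC__STREAM_DECODER_READ_STATUS_END_OF_STREAM

function readCallback(chunk, bufferPtr, bytesPtr) {
  // size_t is 32 bits wide on Emscripten's 32-bit targets, hence getUint32.
  const maxBytes = view.getUint32(bytesPtr, true);
  const n = Math.min(maxBytes, chunk.length);
  heap.set(chunk.subarray(0, n), bufferPtr); // copy the encoded bytes into the decoder's buffer
  view.setUint32(bytesPtr, n, true);         // report how many bytes were actually written
  return n === 0 ? READ_END_OF_STREAM : READ_CONTINUE;
}

// The decoder offers room for 8 bytes; the chunk only has 3.
view.setUint32(BYTES_PTR, 8, true);
const status = readCallback(new Uint8Array([0x66, 0x4c, 0x61]), BUFFER_PTR, BYTES_PTR);
const written = view.getUint32(BYTES_PTR, true);
console.log(status, written);
```

The point of the sketch is only the two-way use of the bytes pointer: it is read as a limit and then overwritten with the real length.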

Basically what I've done is create a new worker for decoding and added decode function references to the libflac.js. The stream decode init is working properly, but I cannot get the FLAC__stream_decoder_process_single to work properly, and I'm almost sure it's the way I'm handling the pointers returning in the read callback function.

I've also described the issue a bit in this stackoverflow : http://stackoverflow.com/questions/40311276/decoding-a-single-chunk-of-flac-data-in-javascript-using-libflac-js

Would be great to hear your thoughts on this, thanks!

russaa commented 8 years ago

I am happy that you found libflac.js useful.

I will help as much as I can, but to be honest, we only ever used the encoding functionality, and most of the wrapping code was done by my collaborator.

Would it be possible to share some more context? Without the code in question, I can only guess at what could have gone wrong.

If you suspect the pointer, did you have a look at how these are handled in emscripten? e.g. https://kripken.github.io/emscripten-site/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#access-memory-from-javascript

tyler-g commented 8 years ago

thanks @russaa let me see if I can clean up some of my code tomorrow and then fork the codebase

tyler-g commented 7 years ago

@russaa I put a fork up here: https://github.com/tyler-g/libflac.js

It's not working yet but I tried to show the basic idea – added some incomplete decoding docs to the README. Added some wrappers to decoding functions in the libflac.js file (note the minified version is still the unforked version for now).

Basically, the init of the decoding stream is working, but once it starts processing data, it is hitting the read_callback_fn function with I believe a pointer to where the chunk should be stored – however I haven't yet figured out how to store it from there.

russaa commented 7 years ago

OK @tyler-g I will have a look -- but I probably won't come around to it before next week.

Could you do me a favor: could you make your changes (in your fork at https://github.com/tyler-g/libflac.js) on a separate branch, i.e. not master? e.g. something like issue-1_decoding

That would make collaborating easier.

tyler-g commented 7 years ago

Sure @russaa , will do that before the end of the week. Thanks for looking

tyler-g commented 7 years ago

ok, I restored master to original state and put my changes on branch decode

russaa commented 7 years ago

OK, this is not a complete solution, but maybe it will point you in the right direction:

in libflac.js, the init_decoder_stream() function should probably look something like the following:

//...
init_decoder_stream: function(decoder, read_callback_fn, error_callback_fn, client_data){

  client_data = client_data|0;

  //TODO move these out of this function?
  // FLAC__STREAM_DECODER_READ_STATUS_CONTINUE     The read was OK and decoding can continue.
  // FLAC__STREAM_DECODER_READ_STATUS_END_OF_STREAM   The read was attempted while at the end of the stream. Note that the client must only return this value when the read callback was called when already at the end of the stream. Otherwise, if the read itself moves to the end of the stream, the client should still return the data and FLAC__STREAM_DECODER_READ_STATUS_CONTINUE, and then on the next read callback it should return FLAC__STREAM_DECODER_READ_STATUS_END_OF_STREAM with a byte count of 0.
  // FLAC__STREAM_DECODER_READ_STATUS_ABORT       An unrecoverable error occurred. The decoder will return from the process call.
  var FLAC__STREAM_DECODER_READ_STATUS_CONTINUE = 0;
  var FLAC__STREAM_DECODER_READ_STATUS_END_OF_STREAM = 1;
  var FLAC__STREAM_DECODER_READ_STATUS_ABORT = 2;

  //(const FLAC__StreamDecoder *decoder, FLAC__byte buffer[], size_t *bytes, void *client_data)
  var read_callback_fn_ptr = Runtime.addFunction(function(p_decoder, buffer, bytes, p_client_data){
    //FLAC__StreamDecoderReadCallback, see https://xiph.org/flac/api/group__flac__stream__decoder.html#ga7a5f593b9bc2d163884348b48c4285fd

    var len = Module.getValue(bytes, 'i32');//FIXME which type has bytes (size_t)? 'i16'?

    if(len === 0){
      return FLAC__STREAM_DECODER_READ_STATUS_ABORT;//FIXME need to use number or declare const-value!!
    }

    //callback must return object with: {buffer: ArrayBuffer, readDataLength: number, error: boolean}
    var readResult = read_callback_fn(len, p_client_data);
    //in case of END_OF_STREAM or an error, readResult.readDataLength must be returned with 0

    var readLen = readResult.readDataLength;
    Module.setValue(bytes, readLen, 'i32');//FIXME which type has bytes (size_t)? 'i16'?

    if(readResult.error){
      return FLAC__STREAM_DECODER_READ_STATUS_ABORT;//FIXME need to use number or declare const-value!!
    }

    if(readLen === 0){
      return FLAC__STREAM_DECODER_READ_STATUS_END_OF_STREAM;//FIXME need to use number or declare const-value!!
    }

    var readBuf = readResult.buffer;
    Module.HEAPU8.set(readBuf, buffer);//FIXME is this correct for transfering the read data to the buffer?

    return FLAC__STREAM_DECODER_READ_STATUS_CONTINUE;//FIXME need to use number or declare const-value!!
  });

  var error_callback_fn_ptr = Runtime.addFunction(function(p_decoder, err, p_client_data){

    //err:
    // FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC         An error in the stream caused the decoder to lose synchronization.
    // FLAC__STREAM_DECODER_ERROR_STATUS_BAD_HEADER       The decoder encountered a corrupted frame header.
    // FLAC__STREAM_DECODER_ERROR_STATUS_FRAME_CRC_MISMATCH   The frame's data did not match the CRC in the footer.
    // FLAC__STREAM_DECODER_ERROR_STATUS_UNPARSEABLE_STREAM   The decoder encountered reserved fields in use in the stream.
    var msg;
    switch(err){
    case 0:
      msg = 'FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC';
      break;
    case 1:
      msg = 'FLAC__STREAM_DECODER_ERROR_STATUS_BAD_HEADER';
      break;
    case 2:
      msg = 'FLAC__STREAM_DECODER_ERROR_STATUS_FRAME_CRC_MISMATCH';
      break;
    case 3:
      msg = 'FLAC__STREAM_DECODER_ERROR_STATUS_UNPARSEABLE_STREAM';
      break;
    default:
      msg = 'FLAC__STREAM_DECODER_ERROR__UNKNOWN';//this should never happen
    }

    //TODO convert err? add/remove string representation for err code?
    error_callback_fn(err, msg, p_client_data);
  });

  //(const FLAC__StreamDecoder *decoder, const FLAC__Frame *frame, const FLAC__int32 *const buffer[], void *client_data)
  var write_callback_fn_ptr = Runtime.addFunction(function(p_decoder, p_frame, p_buffer, p_client_data){
    //TODO create typed array, store frames/buffer into it, then feed it into the callback write_callback_fn
    write_callback_fn();//p_frame, p_buffer, p_client_data);

    //TODO return:
    // FLAC__STREAM_DECODER_WRITE_STATUS_CONTINUE   The write was OK and decoding can continue.
    // FLAC__STREAM_DECODER_WRITE_STATUS_ABORT     An unrecoverable error occurred. The decoder will return from the process call.
  });

  var init_status = Module.ccall('FLAC__stream_decoder_init_stream', 'number', ['number', 'number', 'number', 'number', 'number', 'number', 'number', 'number', 'number', 'number'], [decoder, read_callback_fn_ptr, 0, 0, 0, 0, write_callback_fn_ptr, 0, error_callback_fn_ptr, client_data]);

  return init_status;
},
//...

_(note that the write_callback_fn_ptr callback is not actually implemented here)_
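
For the unimplemented write callback, the signature in the comment above says `buffer` is an array of per-channel pointers, each pointing at 32-bit samples. Below is a rough sketch of how that could be read, on a simulated heap rather than the real Module memory. `readChannels`, the addresses, and passing `channels` and `blocksize` in directly are assumptions; libFLAC provides those two values in the frame header behind `p_frame`, which is not parsed here.

```javascript
// Simulated linear memory standing in for the Emscripten heap.
const memory = new ArrayBuffer(256);
const mem = new DataView(memory);

// Follows pBuffer -> per-channel pointer -> int32 samples.
function readChannels(pBuffer, channels, blocksize) {
  const out = [];
  for (let ch = 0; ch < channels; ch++) {
    // Each entry of the pointer array is a 4-byte address.
    const pChannel = mem.getUint32(pBuffer + 4 * ch, true);
    const samples = new Int32Array(blocksize);
    for (let i = 0; i < blocksize; i++) {
      samples[i] = mem.getInt32(pChannel + 4 * i, true);
    }
    out.push(samples);
  }
  return out;
}

// Lay out a fake stereo frame with 3 samples per channel.
const P_BUFFER = 8, LEFT = 64, RIGHT = 128;
mem.setUint32(P_BUFFER, LEFT, true);
mem.setUint32(P_BUFFER + 4, RIGHT, true);
[100, -200, 300].forEach((s, i) => mem.setInt32(LEFT + 4 * i, s, true));
[-1, 0, 1].forEach((s, i) => mem.setInt32(RIGHT + 4 * i, s, true));

const decoded = readChannels(P_BUFFER, 2, 3);
console.log(Array.from(decoded[0]), Array.from(decoded[1]));
```

A value that looks like a single large number when logged from the write callback is probably one of these addresses rather than audio data.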

... and with this, as an example, the callback read_callback_fn() would look something like this (note the changed signature for the callback):

//...
var isTested = false;
function read_callback_fn(bufferSize){

  if(isTested/* is at end of input stream, i.e. nothing to read any more */){
    return {buffer: null, readDataLength: 0, error: false};
  }

  isTested = true;

  var _buffer = new ArrayBuffer(bufferSize);
  var numberOfReadBytes;
  try{
    //read data from some source into _buffer
    new DataView(_buffer).setUint8(0, 101);//TEST set some value at start
    new DataView(_buffer).setUint8(bufferSize-3, 85);//TEST set some value near the end

    // ...and store number of read bytes into var numberOfReadBytes (i.e. length of read data with regard to an UINT8-view on the ArrayBuffer):
    numberOfReadBytes = bufferSize-2;//TEST set the read-data-length to the last written value, see above

  } catch(err){
    console.error(err);//DEBUG
    return {buffer: null, readDataLength: 0, error: true};
  }

  return {buffer: _buffer, readDataLength: numberOfReadBytes, error: false};
}
//...

... the isTested variable and the writing via DataView are included just for testing; they would need to be replaced by the actual data-reading.

Then, if you decode frame-by-frame, the invocation should probably be put in a loop, and you would need to check the decoder state in order to decide when decoding has finished, e.g. when the state is END_OF_STREAM:

//...
//NOTE `continue` is a reserved word in JavaScript, so the loop flag needs another name
var keepDecoding = true, state, flac_return;
while(keepDecoding){
  flac_return = Flac.decode_buffer_flac_as_pcm(flac_decoder);
  if (flac_return != true){
    console.log("Error: decode_buffer_flac_as_pcm returned false. " + flac_return);
    keepDecoding = false;
  } else {
    state = Flac.stream_decoder_get_state(flac_decoder);//TODO impl. & export this function
    if(state === Flac.STREAM_DECODER_END_OF_STREAM){//TODO declare & export the decoder state-constants
      keepDecoding = false;//should also stop, for some other states, e.g. aborted
    }
  }
}
//...

Let me know if this helps you, or not. Feel free to ask again, if you get stuck again.

And when you complete the decoding-functionality, it would be great, if you would make a pull request (if that's OK with you).

tyler-g commented 7 years ago

@russaa much thanks for this. I'm working through understanding it and making some changes on my local branch. There's one thing I don't get though: where am I sending the actual chunk of FLAC-encoded data to Flac.decode_buffer_flac_as_pcm ?

Maybe I'm not understanding it right. I would think if I am calling the decode, I should be passing it a chunk of data to decode – but it looks like it doesn't work like that.

Right now I'm getting stuck crashing the tab in the while loop, but that's because I still need to properly implement and export Flac.stream_decoder_get_state.

russaa commented 7 years ago

> Maybe I'm not understanding it right. I would think if I am calling the decode, I should be passing it a chunk of data to decode – but it looks like it doesn't work like that.

That is done by the read-callback: the FLAC decoder "requests" the data that it needs for decoding by passing in a buffer/array that should be filled with the data to be decoded -- see the read_callback_fn(bufferSize) example above (which fills a buffer with some dummy data).

Also, in this read_callback_fn(bufferSize) you need to take care that it "notifies" the decoder when the stream has ended -- in the example above that is done in the dummy code by isTested: this ensures that the first time the read-callback is invoked, it returns some data, and the second time it "notifies" the decoder that there is no more data to be read.

For testing purposes you could use some external boolean-variable that is set in the read_callback_fn callback when there is no more data, and which is then used in your while-loop in order to determine when it should stop (so that your tab/browser won't crash) ... or just include a "safety check" to stop the loop after 100 or so repeats.
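
A small sketch of such a safety check, using a stand-in object instead of the real decoder so that it runs on its own. `makeFakeDecoder`, `processSingle`, `atEnd` and `decodeAll` are hypothetical names, not part of libflac.js.

```javascript
// Stand-in for the real decoder: each call "decodes" one frame and
// returns false if called after its input is exhausted.
function makeFakeDecoder(frames) {
  let remaining = frames;
  return {
    processSingle() {
      if (remaining === 0) return false;
      remaining--;
      return true;
    },
    atEnd() { return remaining === 0; }
  };
}

// Runs the decode loop, but gives up after maxIterations passes.
function decodeAll(decoder, maxIterations) {
  let iterations = 0;
  let keepGoing = true;
  while (keepGoing) {
    if (++iterations > maxIterations) {
      return { iterations: iterations - 1, stoppedBy: 'guard' };
    }
    const ok = decoder.processSingle();
    if (!ok || decoder.atEnd()) keepGoing = false;
  }
  return { iterations, stoppedBy: 'decoder' };
}

const normal = decodeAll(makeFakeDecoder(5), 100);
// A decoder that never reports end-of-stream would otherwise hang the tab.
const runaway = decodeAll({ processSingle: () => true, atEnd: () => false }, 100);
console.log(normal, runaway);
```

Returning which condition ended the loop makes it easy to see during testing whether the decoder finished or the guard kicked in.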

Also: if you would use the FLAC__stream_decoder_process_until_end_of_stream instead of FLAC__stream_decoder_process_single (in your Flac.decode_buffer_flac_as_pcm implementation), you probably would not need the while-loop and Flac.stream_decoder_get_state.

tyler-g commented 7 years ago

ok I think I got that part. I implemented the Flac.stream_decoder_get_state in libflac, and this is correctly returning 4 (end of stream). I'm getting to the read_callback_fn for the first few bits of flac data that gets sent, but after that I'm not hitting it. Hm.

tyler-g commented 7 years ago

@russaa I don't know if this helps but I'm getting a FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC after the first few times the read_callback_fn is called. After that it never hits read_callback_fn, though the state is still returning 4 in the while loop

russaa commented 7 years ago

Without seeing the actual code, I could only guess what may be the problem. For understanding & finding problems, I think, it would be easier to look at the concrete code.

So I have pulled your decode branch and created another one from that for my changes (from my post above): the decode_experimental branch.

You could pull my branch and merge it into your decode-branch and then add your changes (up to where you have the problems you describe above) -- then I could look at the details.

tyler-g commented 7 years ago

Ok, I merged in your experimental branch and then made some changes on the decode branch.

I seem to be getting a little farther now.. now I can successfully hit the write callback.

First I got rid of the while loop in the read callback, because instead what I'm doing is hitting flac_return = Flac.decode_buffer_flac_as_pcm(flac_decoder) every time a chunk of FLAC data comes over the socket. Right before I do that, I set the current_chunk variable to the Uint8Array of the current chunk of FLAC data that came through. This is then used in the read callback.

In the read callback I got rid of the isTested stuff. So, basically at this point the flow is (assuming decoder init has already successfully happened):

  1. chunk of encoded FLAC data comes through socket
  2. set the current_chunk to that chunk, to be used later in read callback
  3. call flac_return = Flac.decode_buffer_flac_as_pcm(flac_decoder)
  4. read_callback_fn_ptr (in libflac.js) is called. This then calls read_callback_fn , passing the max # of bytes it can accept to be read/decoded.
  5. read_callback_fn gets the max # of bytes value. It then returns an object to read_callback_fn_ptr containing 1: the ArrayBuffer of current_chunk (the Uint8Array which was saved previously), 2: the actual number of bytes (byteLength ?) of this ArrayBuffer.
  6. read_callback_fn_ptr gets this info and sets the bytes pointer to the actual # of bytes, and sets the buffer pointer to the actual read buffer data. Here I had to change your Module.HEAPU8.set(readBuf, buffer); to: var dataHeap = new Uint8Array(Module.HEAPU8.buffer, buffer, readLen); dataHeap.set(new Uint8Array(readBuf)); Without this change, I wasn't hitting the write callback.
  7. Hitting the write callback, but instead of decoded data, I'm seeing just a number 5273872. I believe this is because of how I'm accessing the pointer inside the write callback in libflac.js (write_callback_fn_ptr).
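
A likely reason the change in step 6 matters: `TypedArray.prototype.set` copies the elements of an array-like source, and a bare ArrayBuffer exposes none (it has `byteLength` but no `length` or indexed entries), so passing it directly copies nothing and does not throw. A small demonstration on a stand-in heap (`heap` here is a plain Uint8Array, not Module.HEAPU8, and `bufferPtr` is a made-up address):

```javascript
const heap = new Uint8Array(16);
const readBuf = new Uint8Array([1, 2, 3, 4]).buffer; // an ArrayBuffer, as returned by the read callback
const bufferPtr = 8; // hypothetical address handed over by the decoder

// Passing the ArrayBuffer itself: treated as an array-like of length 0, nothing is copied.
heap.set(readBuf, bufferPtr);
const afterRawSet = Array.from(heap.subarray(bufferPtr, bufferPtr + 4));

// Wrapping it in a Uint8Array view first copies the bytes to the target offset.
heap.set(new Uint8Array(readBuf), bufferPtr);
const afterViewSet = Array.from(heap.subarray(bufferPtr, bufferPtr + 4));

console.log(afterRawSet, afterViewSet);
```

Under that reading, `Module.HEAPU8.set(new Uint8Array(readBuf), buffer)` would be an equivalent, slightly shorter form of the fix.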

Feel like we're getting close here though..

russaa commented 7 years ago

sounds good :-)

If you want me to take a look at the write-callback, you should publish your changes to GitHub (i.e. push the new branch to your repo)

tyler-g commented 7 years ago

hmm you don't see it? :O it's the decode branch on my fork

russaa commented 7 years ago

oh, sorry -- I somehow automatically assumed that you would create a new branch and did not even think to look in decode for the changes.

OK, I will have a look at the write-callback then.

tyler-g commented 7 years ago

No worries, you'll probably catch something i'm missing

russaa commented 7 years ago

just a heads up: it probably will take me a bit, until I get to work on it, since I am pretty swamped at the moment with other important work/projects.

But I should get around to it, some time next week

tyler-g commented 7 years ago

hey russaa, just checking up to see if you had a chance to look at this. Hope you had a nice thanksgiving.

russaa commented 7 years ago

hi @tyler-g, since I am in Germany, we do not really have thanksgiving here ;-) but thank you for the thought

I hope you did have a nice holiday

I wanted to write a little status update anyway, so you won't wonder, why I did not post any progress report yet, but you beat me to it: I did not forget libflac, but as I mentioned, I am currently involved in a work project and it turned out to be even more demanding and rather time-consuming than I initially thought. But the deadline for this project is next week, so I will look into our libflac problem then.

tyler-g commented 7 years ago

ok no problem, just let me know when you have a chance to look !

russaa commented 7 years ago

OK, I continued working on the write-callback (it is not fully working yet):

in directory example/ of the decode_experimental branch is a little testing page that takes a FLAC file as input and triggers a download for the decoded data.

Decoding does not fully work yet -- probably the decoded data is not handled correctly. But the decoding runs through a file that is dropped/loaded, and some data is decoded and saved (just not real PCM/WAV data yet).

Maybe you can already make something useful with this?

The files in example/:

encode-func.js: the code that initializes the decoder and returns/stores the decoded data (NOTE that I added the function `decode_stream_flac_as_pcm` to libflac.js and use it here; it invokes `FLAC__stream_decoder_process_until_end_of_stream` instead of `FLAC__stream_decoder_process_single` on the decoder).

libflac.js: the modified libflac library file (NOTE this is different from the one in `dist/`)

file-handler.js: handles loading of the FLAC file (and triggers decoding & downloading)

data-util.js: some basic utility functions for handling the typed data (would probably need to convert the raw Uint8 data to PCM & add WAV header)

tyler-g commented 7 years ago

awesome, thanks. I'm going to take a look through this this week. Will update

tyler-g commented 7 years ago

ok I looked over this a bit yesterday, and everything looks like you're getting the FLAC decoder to decode data. The write callback is working great.

I thought maybe the problem was you weren't converting the decoded buffers to PCM data... or are they already supposed to be PCM data?

I tried converting them to Float and then using a FloatTo16BitWav conversion; but the results seem to be the same: the output WAV data (open it in Audacity for example), is distorted and all at max levels ... be careful if listening to it with headphones !
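
One thing that may be worth checking here, stated as an assumption since nothing in this thread confirms the cause: a float-to-16-bit converter that clamps its input to [-1, 1] will push every un-normalized integer sample to full scale, which would sound like distortion at max level. The sketch below uses a hypothetical `floatTo16Bit` helper and made-up sample values to show the difference normalizing makes.

```javascript
// Typical float -> 16-bit conversion: clamp to [-1, 1], then scale.
function floatTo16Bit(sample) {
  const s = Math.max(-1, Math.min(1, sample));
  return Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
}

// Hypothetical decoded 16-bit samples held as plain integers.
const intSamples = [0, 1000, -1000, 20000];

// Fed in directly, everything except 0 clips to full scale:
// [0, 32767, -32768, 32767]
const unnormalized = intSamples.map(floatTo16Bit);

// Dividing by 32768 first keeps the values (almost exactly) intact.
const normalized = intSamples.map(s => floatTo16Bit(s / 0x8000));

console.log(unnormalized, normalized);
```

If the decoded data is already integer PCM, the float round trip can be skipped entirely and the samples written out as 16-bit integers.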

The conversion process seems to have no errors, so I'm not sure what we're doing wrong

tyler-g commented 7 years ago

@russaa hey just wanted to check in. Don't want to give up on this!

russaa commented 7 years ago

yeah, I would like to get it to work, too. I haven't looked at it since last, but I will probably have another go at it this week or next week.

I will post an update then.

russaa commented 7 years ago

OK, I have pushed an update:

it is almost working as it should .... but only with the help of a pretty dirty HACK: see function __fix_write_buffer() which is used in the write-callback-handler, i.e. in write_callback_fn_ptr() in libflac.js

For some reason, byte values 0 (i.e. the minimal value) and 255 (i.e. the maximal value) get "triplicated" (i.e. three values instead of one) or falsely inserted as a double value (i.e. two values, where there should be none). If the data is cleaned "manually" (i.e. in __fix_write_buffer()), it is almost as it should be -- but of course it would be better to find the reason why the data is so weird and fix that ... and the almost part should get fixed too: there are some smaller data chunks that are not as they should be. These wrong data seem to be too small to notice in the audio, though: at least I could not hear any noticeable glitches in the audio which I used for testing.
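
One possible explanation, offered as an assumption rather than something confirmed in this thread: libFLAC hands the write callback 32-bit integers (`FLAC__int32`), so a 16-bit sample arrives sign-extended to four bytes. Viewed byte by byte, every non-negative sample carries two extra 0 bytes and every negative one two extra 255 bytes, which could look like duplicated or triplicated 0s and 255s. The sketch shows the byte layout and a narrowing copy to 16-bit PCM; `narrowTo16Bit` is a hypothetical helper, not part of libflac.js.

```javascript
// Three 16-bit sample values stored as 32-bit ints, little-endian.
const samples = [1, -2, 255];
const raw = new DataView(new ArrayBuffer(samples.length * 4));
samples.forEach((s, i) => raw.setInt32(i * 4, s, true));

const rawBytes = Array.from(new Uint8Array(raw.buffer));
// 1   -> 1, 0, 0, 0
// -2  -> 254, 255, 255, 255
// 255 -> 255, 0, 0, 0

// Keep only the low 16 bits of each 32-bit sample.
function narrowTo16Bit(view32) {
  const count = view32.byteLength / 4;
  const out = new DataView(new ArrayBuffer(count * 2));
  for (let i = 0; i < count; i++) {
    out.setInt16(i * 2, view32.getInt32(i * 4, true), true);
  }
  return out;
}

const pcm = narrowTo16Bit(raw);
const pcmBytes = Array.from(new Uint8Array(pcm.buffer));
console.log(rawBytes, pcmBytes);
```

If this is what is happening, reading the buffer through a 32-bit view and narrowing per sample would replace the byte-level cleanup.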

tyler-g commented 7 years ago

sorry for not responding sooner. I've been a bit busy lately.. going to take a look at your work as soon as I can. Likely next week

russaa commented 7 years ago

no problem -- I've been busy too, since then.

I hope it is OK with you: I took our work on the decoding-functionality and integrated it into the main branch (& added examples/ sample code for encoding & decoding files).

Note that I changed some function names, but it should all be reflected in the sample code (see README.md and examples/ directory).

However, I did not find the reason for the weird decoded data that I mentioned (multiples of 255s and 0s, where there should be only one); but as far as I can determine, the workaround/hack for eliminating the "unwanted" data works fine ... it "just" introduces some additional workload (CPU & working memory) during decoding.

The other thing I noticed was that the data produced by the JavaScript decoder is not completely the same as when using e.g. the C-based decoder: there are some rather small sections that differ from each other. But I could not determine any obvious patterns or reasons for this (maybe it is related to the weird multiplied occurrence of 255- and 0-byte values in the decoded data).

Let me know, if you have any remarks or questions.

tyler-g commented 7 years ago

Hey @russaa , trying to integrate your changes into my project, but I can't figure out one thing. In your decode example you have a FLAC file, and you have the total byteLength of that data (as in https://github.com/mmig/libflac.js/blob/master/example/decode-func.js). However in my project I'm streaming chunks of flac data, so I'm just a little unsure of how to init the decode stream without having all the file data upfront.

russaa commented 7 years ago

Hey @tyler-g, the decode-func.js contains 2 variants for retrieving FLAC data that will be decoded:

see variable isDecodePartial: false requires the FLAC data to be completely present (invoking decode_stream_flac_as_pcm()), whereas with true the decoding function (decode_buffer_flac_as_pcm()) needs to be called continually for decoding one chunk/frame at a time.

So for a continuous input stream, the variant with isDecodePartial = true needs to be used.

As for the byteSize:
this is only used in the read_callback_fn for determining how much data can be returned at most.

If there is a continuous data source, the read-callback would need to be modified -- and probably become a bit more complicated. As far as I can see, there are 3 more things to take care of:

  1. The input stream needs to get buffered (e.g. store the data-chunks in an array of byte-arrays).
  2. In the read_callback_fn use the oldest buffered data-chunk (i.e. byte-array) for reading, and when reaching its end, discard this buffered data-chunk (i.e. remove from input buffer) and start with the next entry in the buffer-array. Note that read_callback_fn must not return 0 as readDataLength until the end of the input stream is reached, i.e. the continuous input stream is stopped/closed, because a value of 0 here signifies the end-of-stream to the decoder.
    However, it is no problem to return less than the requested (maximal) bufferSize. E.g. you could use the byteSize of the current buffered data-chunk until less than the requested bufferSize is returned, and then set the next buffered data-chunk as current data-chunk, so that with the next invocation of read_callback_fn the next buffered data-chunk will be used for reading -- something like:

    function read_callback_fn(bufferSize){
    
      ...
    
      if(numberOfReadBytes < bufferSize){
          //use next buffered data-chunk for decoding:
          if(bufferedInputData.length > 0){
            //get first entry from input-buffer (assuming new data-chunks are push()'ed)
            binData = bufferedInputData.shift();
            size = binData.buffer.byteLength;
            currentDataOffset = 0;
          } else {
            //TODO pause decoding until (buffered) data is available again
          }
      }
      return {buffer: _buffer, readDataLength: numberOfReadBytes, error: false};
    }
  3. If there is no next data chunk (buffered byte-array), pause decoding (i.e. calling decode_buffer_flac_as_pcm()) until there is data available again.
    E.g. by extracting the while-loop for decoding into a separate function and then use a pause-flag in the while-loop, and restart the loop when input data is available again.
    (actually, the whole decoding process probably needs to get refactored into separate functions for initializing, decoding, continue-decoding and finish-decoding)
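
The three points above could be sketched roughly as follows. `ChunkQueue` and its `'data'` / `'wait'` / `'end'` states are hypothetical and only illustrate the buffering idea; wiring the `'wait'` state into an actual pause of the decode loop is left open here, as it is in the list.

```javascript
// Hypothetical FIFO of incoming Uint8Array chunks feeding the read callback.
class ChunkQueue {
  constructor() {
    this.chunks = [];
    this.offset = 0;     // read position inside chunks[0]
    this.closed = false; // set once the input stream has ended
  }
  push(chunk) {
    // Ignore empty chunks so a read never returns 0 bytes by accident.
    if (chunk.length > 0) this.chunks.push(chunk);
  }
  close() { this.closed = true; }

  // Returns {data, state}: 'data', 'wait' (queue empty but stream open) or 'end'.
  read(maxBytes) {
    if (this.chunks.length === 0) {
      return { data: null, state: this.closed ? 'end' : 'wait' };
    }
    const current = this.chunks[0];
    const end = Math.min(this.offset + maxBytes, current.length);
    const data = current.subarray(this.offset, end);
    this.offset = end;
    if (this.offset === current.length) {
      // Oldest chunk is used up: discard it and start on the next one.
      this.chunks.shift();
      this.offset = 0;
    }
    return { data, state: 'data' };
  }
}

const queue = new ChunkQueue();
queue.push(new Uint8Array([1, 2, 3, 4, 5]));
queue.push(new Uint8Array([6, 7]));
const reads = [queue.read(4), queue.read(4), queue.read(4), queue.read(4)];
queue.close();
reads.push(queue.read(4));
console.log(reads.map(r => r.state + ':' + (r.data ? r.data.length : 0)));
```

Keeping 'wait' separate from 'end' is the important part: only 'end' should ever be reported to the decoder as a read length of 0.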

tyler-g commented 7 years ago

thanks @russaa , wrapping my head around it this week

tyler-g commented 7 years ago

I'm having a little trouble getting this in the context of my project. If the below code is too hard to read, I can paste it in a separate gist.

Here's the general idea of how my decoder file (below) works. You can ignore the WAV function stuff.

  1. Commands come in, such as init or decode
  2. If init, it calls Flac.init_decoder_stream to set the decoder
  3. If decode, it pushes the e.data.buf (which is a Uint8Array) into an Array.
  4. read_callback_fn reads the data that was pushed into the array. Here is where I think I have the function messed up. I think I have too much in there - but I was trying to start from the example you provided.

importScripts('libflac.js');

var flac_decoder,
BUFSIZE = 4096,
CHANNELS = 1,
SAMPLERATE = 44100,
COMPRESSION = 5,
BPS = 16,
VERIFY = true,
flac_ok = 1,
current_chunk,
num_chunks = 0,
meta_data,
currentDataOffset = 0,
decData = [],
bufferedInputData = [];

var TEST_MAX = 10000;//FIXME TEST: for safety check for testing -> avoid infinite loop by breaking at max. repeats
var TEST_COUNT = 0;//FIXME TEST

function read_callback_fn(bufferSize){
    console.log('decode read callback, buffer bytes max=', bufferSize);

    //safety check for testing: avoid infinite loop by breaking at max. repeats
    if(++TEST_COUNT > TEST_MAX){
        return {buffer: null, readDataLength: 0, error: false};
    }

    var size = bufferedInputData[bufferedInputData.length - 1].buffer.byteLength;

    var start = currentDataOffset;
    var end = currentDataOffset === size? -1 : Math.min(currentDataOffset + bufferSize, size);

    var _buffer;
    var numberOfReadBytes;
    if(end !== -1){

        _buffer = bufferedInputData[bufferedInputData.length - 1].subarray(currentDataOffset, end);
        numberOfReadBytes = end - currentDataOffset;

        currentDataOffset = end;
    } else {
        numberOfReadBytes = 0;
    }

    if (numberOfReadBytes < bufferSize) {
        //use next buffered data-chunk for decoding:
        if (bufferedInputData.length > 0) {
          //get first entry from input-buffer (assuming new data-chunks are push()'ed)
          binData = bufferedInputData.shift();
          size = binData.buffer.byteLength;
          currentDataOffset = 0;
        } else {
          //TODO pause decoding until (buffered) data is available again
        }
    }

    return {buffer: _buffer, readDataLength: numberOfReadBytes, error: false};
}

function write_callback_fn(decoder, buffer){
    // buffer is the decoded audio data, Uint8Array
    console.log('decode write callback', buffer);
    decData.push(buffer);
}

function metadata_callback_fn(data){
    console.info('meta data: ', data);
    meta_data = data;
}

function error_callback_fn(decoder, err, client_data){
    console.log('decode error callback', err);
    //Flac.FLAC__stream_decoder_finish(decoder);
}

self.onmessage = function(e) {

    switch (e.data.cmd) {

    case 'init':
        // using FLAC
        console.log('calling init_libflac_decoder');
        flac_decoder = Flac.init_libflac_decoder(SAMPLERATE, CHANNELS, BPS, COMPRESSION, 0);
        ////
        if (flac_decoder != 0){
            var status_decoder = Flac.init_decoder_stream(flac_decoder, read_callback_fn, write_callback_fn, error_callback_fn, metadata_callback_fn);
            flac_ok &= (status_decoder == 0);

            console.log("flac decode init     : " + flac_ok);//DEBUG
            console.log("status decoder: " + status_decoder);//DEBUG

            INIT = true;
        } else {
            console.error("Error initializing the decoder.");
        }

        break;

    case 'decode':

        console.log('case decode', e.data);

        // e.data.buf is the chunk we need to decode, and it must sit until FLAC calls the read callback, after which we must set the data into the supplied buffer and set the bytes pointer
        bufferedInputData.push(e.data.buf);
        //current_chunk = e.data.buf;

        flac_return = Flac.decode_buffer_flac_as_pcm(flac_decoder)

        break;

    case 'buffer':
        if (WAVFILE) {
            wavBuffers.push(e.data.buf);
            wavLength += e.data.buf.length;
        }
        else {
            flacBuffers.push(e.data.buf)
            flacLength += e.data.buf.byteLength
        }
        break;

    case 'finish':

        var data;
        if (WAVFILE){

            data = exportMonoWAV(wavBuffers, wavLength);
            console.log("wav finish");

        } else {

            flac_ok &= Flac.FLAC__stream_encoder_finish(flac_encoder);
            console.log("flac finish: " + flac_ok);//DEBUG
            data = exportFlacFile(flacBuffers, flacLength, mergeBuffersUint8);

        }

        clear();

        self.postMessage({cmd: 'end', buf: data});
        INIT = false;
        break;
    }
};

function exportFlacFile(recBuffers, recLength){

    //convert buffers into one single buffer
    var samples = mergeBuffersUint8(recBuffers, recLength);

//  var audioBlob = new Blob([samples], { type: type });
    var the_blob = new Blob([samples]);
    return the_blob;

}

function exportMonoWAV(buffers, length){
    //buffers: array with
    //  buffers[0] = header information (with missing length information)
    //  buffers[1] = Float32Array object (audio data)
    //  ...
    //  buffers[n] = Float32Array object (audio data)

    var dataLength = length * 2;
    var buffer = new ArrayBuffer(44 + dataLength);
    var view = new DataView(buffer);

    //copy WAV header data into the array buffer
    var header = buffers[0];
    var len = header.length;
    for(var i=0; i < len; ++i){
        view.setUint8(i, header[i]);
    }

    //add file length in header
    view.setUint32(4, 32 + dataLength, true);
    //add data chunk length in header
    view.setUint32(40, dataLength, true);

    //write audio data
    floatTo16BitPCM(view, 44, buffers);

    return new Blob([view]);
}

function writeUTFBytes(view, offset, string){ 
    var lng = string.length;
    for (var i = 0; i < lng; ++i){
        view.setUint8(offset + i, string.charCodeAt(i));
    }
}

function mergeBuffersUint8(channelBuffer, recordingLength){
    var result = new Uint8Array(recordingLength);
    var offset = 0;
    var lng = channelBuffer.length;
    for (var i = 0; i < lng; i++){
        var buffer = channelBuffer[i];
        result.set(buffer, offset);
        offset += buffer.length;
    }
    return result;
}

function floatTo16BitPCM(output, offset, inputBuffers){

    var input, jsize = inputBuffers.length, isize, i, s;

    //first entry is header information (already used in exportMonoWAV),
    //  rest is Float32Array-entries -> ignore header entry
    for (var j = 1; j < jsize; ++j){
        input = inputBuffers[j];
        isize = input.length;
        for (i = 0; i < isize; ++i, offset+=2){
            s = Math.max(-1, Math.min(1, input[i]));
            output.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
        }
    }
}

/*
 * clear recording buffers
 */
function clear(){
    flacBuffers.splice(0, flacBuffers.length);
    flacLength = 0;
    wavBuffers.splice(0, wavBuffers.length);
    wavLength = 0;
}
russaa commented 7 years ago

I guess the problem is, that only 1 data-chunk/frame gets decoded?

I think you really need a while-loop in the 'decode' case (while minding that currently no buffered data may be available for decoding), or better yet, extract the decoding to a separate function:

self.onmessage = function(e) {
...
  case 'decode':

        console.log('case decode', e.data);

        // e.data.buf is the chunk we need to decode, and it must sit until FLAC calls the read callback, after which we must set the data into the supplied buffer and set the bytes pointer
        bufferedInputData.push(e.data.buf);
        //current_chunk = e.data.buf;

        doDecode();//<- extracted decode function

      break;
...

(see down below for the doDecode() code)

My guess would be that read_callback_fn is fine, except that there should be some code pausing the decoding when no more buffered data is available (see the TODO comment/placeholder there).
I'm not sure, if pausing the decoding-process can be achieved by a simple boolean flag, but I would try that first, i.e. change

function read_callback_fn(bufferSize){
...
          currentDataOffset = 0;
        } else {
          //TODO pause decoding until (buffered) data is available again
        }
    }

    return {buffer: _buffer, readDataLength: numberOfReadBytes, error: false};
}

to

var decodingPaused = true;//<- add pause flag
function read_callback_fn(bufferSize){
...
          currentDataOffset = 0;
        } else {
          decodingPaused = true;//<- set pause flag if no more data is available
        }
    }

    return {buffer: _buffer, readDataLength: numberOfReadBytes, error: false};
}

... and then handle the decodingPaused flag in the doDecode function:

var dec_state = 0;//variable for storing current decoding state
function doDecode(){

  if(!decodingPaused){
    //decoding in progress -> do nothing
    return;
  }

  decodingPaused = false;

  //request to decode data chunks until end-of-stream is reached (or decoding is paused):
  while(!decodingPaused && dec_state <= 3 && flac_return != false){    
    flac_return &= Flac.decode_buffer_flac_as_pcm(flac_decoder);
    dec_state = Flac.FLAC__stream_decoder_get_state(flac_decoder);
  }
}
tyler-g commented 7 years ago

wow, Brilliant. I can't believe it's actually working. I have a lot of cleanup to do but will update the thread once I do that

tyler-g commented 7 years ago

Nevermind, I spoke too soon. I think something is off with this doDecode function.

If you return when !decodingPaused that means you'll never pass this while loop: while(!decodingPaused && dec_state <= 3 && flac_return != false){

russaa commented 7 years ago

hmm, I really do think it should work: decodingPaused is initialized with true, so at first invocation of doDecode it should pass the 'in-progress' check, then be set to false and thus be allowed to enter the while-loop.

... but you probably need to reset decodingPaused to true in the init event (as well as dec_state to 0)
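A minimal sketch of such a reset, assuming the variable names used in the snippets above; resetDecodeState is a hypothetical helper name, and it would be called from the init case before any decode event arrives:

```javascript
// bookkeeping used by doDecode() and the read-callback (names taken from the thread)
var decodingPaused = true;
var dec_state = 0;
var bufferedInputData = [];
var currentDataOffset = 0;

// hypothetical helper: call this from the 'init' case of the worker
function resetDecodeState(){
    decodingPaused = true;        // lets the next doDecode() call pass the 'in-progress' check
    dec_state = 0;                // forget the state of a previous decoding session
    bufferedInputData.length = 0; // drop any chunks left over from a previous session
    currentDataOffset = 0;
}
```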

Did you check with debug-outputs on the console, if the loop is reached?

tyler-g commented 7 years ago

I see what you mean. decodingPaused starts as true. On the first call of doDecode I verified it gets to the while loop. But never after that. I think something must be off in the read callback, because that's where decodingPaused gets set to true so that it can once again get to the while loop.

russaa commented 7 years ago

maybe I did not fully understand, how the code should work: I assumed, that the decode event is continually triggered with new audio-input data -- is this the case?

Because in my code examples, the while-loop would get resumed (after it was paused in the read-callback) the next time the decode event is triggered (i.e. after new data was stored in bufferedInputData).
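The intended control flow can be tried out without libflac.js. In this rough sketch fakeDecodeStep is a made-up stand-in for Flac.decode_buffer_flac_as_pcm (it only moves chunks from one array to another), and onDecodeMessage stands in for the 'decode' case of the worker:

```javascript
var bufferedInputData = [];
var decodedChunks = [];
var decodingPaused = true;

// made-up stand-in for the real decode call: "decodes" one buffered chunk
// and sets the pause flag as soon as the input buffer runs empty
function fakeDecodeStep(){
    if(bufferedInputData.length > 0){
        decodedChunks.push(bufferedInputData.shift());
    }
    if(bufferedInputData.length === 0){
        decodingPaused = true;
    }
    return true;
}

function doDecode(){
    if(!decodingPaused){
        return; // decoding in progress -> do nothing
    }
    decodingPaused = false;
    var ok = true;
    while(!decodingPaused && ok){
        ok = fakeDecodeStep();
    }
}

// stands in for the 'decode' case: buffer the new chunk, then (re)start the loop
function onDecodeMessage(chunk){
    bufferedInputData.push(chunk);
    doDecode();
}

onDecodeMessage('chunk 1');
onDecodeMessage('chunk 2');
console.log(decodedChunks.length, decodingPaused);
```

Each call to onDecodeMessage finds the flag set to true again, so the loop is re-entered for every new chunk. With the real decoder, this only works if the read-callback sets the flag before it runs out of data.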

Is the audio-input data delivered this way, or is it done differently?

tyler-g commented 7 years ago

You're right the decode event is constantly triggered whenever new FLAC chunks are available for decoding.

What I'm noticing is that once decodingPaused is set to false, it is never set to true again, and I'm not sure why. It looks like in the read callback, it's never getting to the decodingPaused = true line.

Here's what I have for the read callback:

function read_callback_fn(bufferSize){
    console.log('decode read callback, buffer bytes max=', bufferSize);

    //safety check for testing: avoid infinite loop by breaking at max. repeats
    if(++TEST_COUNT > TEST_MAX){
        return {buffer: null, readDataLength: 0, error: false};
    }

    if (!bufferedInputData.length) {
        return {buffer: null, readDataLength: 0, error: false};
    }

    var size = bufferedInputData[bufferedInputData.length - 1].buffer.byteLength;

    var start = currentDataOffset;
    var end = currentDataOffset === size? -1 : Math.min(currentDataOffset + bufferSize, size);

    var _buffer;
    var numberOfReadBytes;
    if(end !== -1){

        _buffer = bufferedInputData[bufferedInputData.length - 1].subarray(currentDataOffset, end);
        numberOfReadBytes = end - currentDataOffset;

        currentDataOffset = end;
    } else {
        numberOfReadBytes = 0;
    }

    if (numberOfReadBytes < bufferSize) {
        //use next buffered data-chunk for decoding:
        if (bufferedInputData.length > 0) {
          //get first entry from input-buffer (assuming new data-chunks are push()'ed)
          binData = bufferedInputData.shift();
          size = binData.buffer.byteLength;
          currentDataOffset = 0;
        } else {
          // TODO pause decoding until (buffered) data is available again
          console.log('setting decodePaused true');
          decodingPaused = true;//<- set pause flag if no more data is available
        }
    }

    return {buffer: _buffer, readDataLength: numberOfReadBytes, error: false};
}
russaa commented 7 years ago

that isn't necessarily a problem: the read-callback should set decodingPaused to true only if there isn't any more data that it could read (for decoding) ... if there always is enough data for decoding, then it would never set decodingPaused to true.

What exactly is the problem that you are experiencing: does the decoding stop at some point, although there is more audio data pushed in, that should get decoded?

If so: did you check the dec_state?
If it is 4, then the decoder thinks, the stream is already finished (i.e. end-of-stream was reached).
That would mean that the pausing did not really work, i.e. some other mechanism for pausing would be necessary.
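For debugging, it can help to log the state by name instead of by number. The list below follows the order of the FLAC__StreamDecoderState enum as I know it from libFLAC's stream_decoder.h; it is worth double-checking against the header of the libFLAC version used for the build. decoderStateName is a hypothetical helper and not part of libflac.js:

```javascript
// names in the order of libFLAC's FLAC__StreamDecoderState enum (index = numeric state)
var DECODER_STATE_NAMES = [
    'FLAC__STREAM_DECODER_SEARCH_FOR_METADATA',     // 0
    'FLAC__STREAM_DECODER_READ_METADATA',           // 1
    'FLAC__STREAM_DECODER_SEARCH_FOR_FRAME_SYNC',   // 2
    'FLAC__STREAM_DECODER_READ_FRAME',              // 3
    'FLAC__STREAM_DECODER_END_OF_STREAM',           // 4
    'FLAC__STREAM_DECODER_OGG_ERROR',               // 5
    'FLAC__STREAM_DECODER_SEEK_ERROR',              // 6
    'FLAC__STREAM_DECODER_ABORTED',                 // 7
    'FLAC__STREAM_DECODER_MEMORY_ALLOCATION_ERROR', // 8
    'FLAC__STREAM_DECODER_UNINITIALIZED'            // 9
];

// hypothetical helper: translate the numeric state into a readable name
function decoderStateName(state){
    return DECODER_STATE_NAMES[state] || ('unknown state ' + state);
}

console.log('dec_state:', decoderStateName(4));
```

This also shows why the loop condition dec_state <= 3 works: states 0 to 3 are the "still decoding" states, and everything from 4 upwards is end-of-stream or an error.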

tyler-g commented 7 years ago

The problem right now is that the read callback only gets fired twice, even though I confirmed chunk data is coming in continuously.

I did log dec_state and noticed it is 4

russaa commented 7 years ago

OK, so I guess in the last invocation of the read-callback, numberOfReadBytes is 0?

That would mean that the "should I pause" check in the read-callback comes 1 invocation too late. I.e. we need to check ahead of time that there is enough data, so that the next invocation of the read-callback will return some data (i.e. numberOfReadBytes > 0 in the next invocation). So the condition if (numberOfReadBytes < bufferSize) needs an additional (OR'ed) check of whether the current data-chunk still has some bytes left, or whether there is a next data-chunk in bufferedInputData that has some data; if neither is the case, pausing must be set.

tyler-g commented 7 years ago

So I logged numberOfReadBytes: in the first invocation of the read callback, it is 4. On the second invocation, it never reaches numberOfReadBytes because bufferedInputData is empty, and thus it returns out of the read callback early. I'm not sure if I'm doing that part correctly. I added this because otherwise I was getting errors about readDataLength being undefined.

    if (!bufferedInputData.length) {
        return {buffer: null, readDataLength: 0, error: false};
    }
russaa commented 7 years ago

well, yes, as I said:
the read-callback needs to look ahead for setting the pause.

That means that the code of the if-clause if (numberOfReadBytes < bufferSize) { (and the if-clause itself, as mentioned before) needs to be modified so that pausing is set, if the next invocation would return no data.
From the debug output you mentioned, it seems that the second invocation already returns no data, so the pausing would need to be set in the first invocation.

Also: I noticed that you wrote the read-callback so that it always takes the last entry of bufferedInputData. Since new data is pushed onto the array on the decode event, the last entry contains the latest data, but the read-callback must use the oldest data (i.e. the first entry in the array) and work its way up to the latest data.

Is bufferedInputData larger than 1 on the first invocation of the read-callback?
If so, the problems could be caused by this, since then the first read-callback invocation would read the wrong data-chunk which may not be, what the decoding function is expecting (usually there should be a stream header at the very beginning of the data).
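A quick way to check which entry is the start of the stream: a native FLAC stream begins with the 4-byte marker "fLaC". The sketch below uses made-up queue contents; startsWithFlacMarker is a hypothetical helper and not part of libflac.js:

```javascript
// "fLaC": the first 4 bytes of a native FLAC stream
var FLAC_MARKER = [0x66, 0x4C, 0x61, 0x43];

// hypothetical helper: true if the chunk begins with the FLAC stream marker
function startsWithFlacMarker(chunk){
    if(!chunk || chunk.length < FLAC_MARKER.length){
        return false;
    }
    for(var i = 0; i < FLAC_MARKER.length; ++i){
        if(chunk[i] !== FLAC_MARKER[i]){
            return false;
        }
    }
    return true;
}

// made-up queue contents for illustration: chunks are push()'ed as they arrive
var queue = [];
queue.push(new Uint8Array([0x66, 0x4C, 0x61, 0x43])); // oldest entry: stream marker
queue.push(new Uint8Array([0x00, 0x01, 0x02]));       // newest entry: later data

console.log('oldest entry is stream start:', startsWithFlacMarker(queue[0]));
console.log('newest entry is stream start:', startsWithFlacMarker(queue[queue.length - 1]));
```

Logging this for the chunk that the first read-callback invocation hands to the decoder would show whether it really is the beginning of the stream.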

tyler-g commented 7 years ago

thanks @russaa I see what you mean about the bufferedInputData array. I changed it to use bufferedInputData[0] which will be the oldest chunk (which then gets shifted away when the .shift() is called I believe).

I did confirm that bufferedInputData is larger than 1 on the first invocation of the read callback. It has 2 Uint8Array chunks in it at that point:

image

As far as your first point, how would I go about checking if the next invocation would return no data?

russaa commented 7 years ago

check if the current entry in bufferedInputData has data left to return, and if not, if there is another entry in bufferedInputData which has data (in that case you would also need to shift bufferedInputData already so that the next invocation accesses the next entry with the data).

But there is something strange going on, because the first invocation has 2 entries, and the second has none -- when the second invocation hits, there should be at least 1 entry left.
Is there some other place (besides the check in the read-callback) where entries in bufferedInputData get removed?
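A rough sketch of such a look-ahead, reusing the variable names and the return-object shape from the read-callback above. hasPendingData is a hypothetical helper, and the sketch assumes that the entries in bufferedInputData are Uint8Arrays that are consumed oldest-first. Whether setting the flag at this point is enough to keep the decoder from reaching end-of-stream still needs to be verified against the real decoder:

```javascript
var bufferedInputData = [];  // FIFO: new chunks are push()'ed, the oldest chunk is at index 0
var currentDataOffset = 0;   // read position within bufferedInputData[0]
var decodingPaused = true;

// hypothetical helper: drops fully consumed (or empty) chunks from the front
// and reports whether any unread bytes are left
function hasPendingData(){
    while(bufferedInputData.length > 0 && currentDataOffset >= bufferedInputData[0].length){
        bufferedInputData.shift();
        currentDataOffset = 0;
    }
    return bufferedInputData.length > 0;
}

function read_callback_fn(bufferSize){
    if(!hasPendingData()){
        // should not happen if the look-ahead below works, but stay safe
        decodingPaused = true;
        return {buffer: null, readDataLength: 0, error: false};
    }

    var chunk = bufferedInputData[0];
    var end = Math.min(currentDataOffset + bufferSize, chunk.length);
    var _buffer = chunk.subarray(currentDataOffset, end);
    var numberOfReadBytes = end - currentDataOffset;
    currentDataOffset = end;

    // look ahead: pause now if the NEXT invocation would have nothing to return
    if(!hasPendingData()){
        decodingPaused = true;
    }

    return {buffer: _buffer, readDataLength: numberOfReadBytes, error: false};
}
```

With the chunks [1, 2, 3] and [4, 5] buffered and a bufferSize of 2, three invocations return 2, 1 and 2 bytes, and the pause flag is set in the third invocation instead of in a fourth one that would return no data.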

russaa commented 7 years ago

... if you make a running example available I would take a look

tyler-g commented 7 years ago

thanks yes it's probably easier if I give you a live example to look at. It's a bit complex at the moment so I'm going to branch off and see if I can create a simple working demo

tyler-g commented 7 years ago

Ok I stripped it down to a simple working example and put it here: http://tyler-g.com/flac-test/

As soon as the page loads, an encoder and decoder worker are initialized.

I noticed that as soon as the encoder inits, it hits the encoder's write callback a few times with some data. I'm guessing this is the "header" part of the FLAC data. This then gets sent to the decoder (or should it not?) but as before is hitting the dec_state=4

If you hit the stream mic button on the page, it will start feeding audio data from your mic and send it to the encoder. The encoder then returns it as FLAC data and passes it to the decoder. None of these chunks seem to be making it to the decoder read callback, because it's stopping earlier, when it tries to decode those first few chunks from the encoder init.

Note: if you're using Chrome, it might complain about not being able to stream the mic on an insecure (non-HTTPS) page. It works in Firefox, or you could just download the files and run them locally.

Really appreciate your help on this!