Open tyler-g opened 8 years ago
yes, I think the main problem was the meta-data:
basically, decoding should not start until there is some actual audio data available.
There were some other, minor problems too (e.g. the "pause handling" in the read-callback).
I made a gist, based on your example (first revision are the original files, the 2nd the changed ones):
https://gist.github.com/russaa/e553077e4ecd9fbdb8894572ca190269
Note, that the current decoding-example returns raw data when finish
ing.
I include example code for exporting WAV data (see finish
handling): it uses functions from data-util.js
, i.e. that script would need to be imported in decode.js
for it to work.
Also, the HTML file does currently not load data-util.js
, so the data will not be downloaded.
@russaa I really owe you one for this. It is great to finally see it working after struggling with it for a long time. I didn't realize that the initial chunks from initing encoding would be messing up the decoder, but it does make sense when I think about it.
I added the data-util.js
and the downloading is working. The only issue I'm seeing now is that the exported decoded WAV data has these interittent "pops" in the audio. Looks like this in waveform:
vs the exported FLAC file:
Could that be related to the "hacky" thing you mentioned you did in libflac.js
earlier in this thread?
Could that be related to the "hacky" thing you mentioned you did in libflac.js earlier in this thread?
well, indirectly: the "hacky" thing was to remove (wrongfully) duplicated binary data values within the decoded data -- that works pretty well (in my tests I could not find any instance where it did something wrong; it's "hacky" in the sense that it's necessary at all).
But I also noticed some non-duplicated data that is somehow modified (i.e. not what it should be), although I could not find a clear pattern to the modification, so that it could be reversed/corrected (as with the duplicated data).
These modifications are only comparitively small parts in the decoded data -- I did not look at its waveform, but it may very well amount to the "poping" sound that you did notice.
I guess, both (the duplication & the data modification) may have the same cause, but I did not find it yet -- I will have look at it again, maybe at the end of the week or next week.
Great thanks for the update. I'll check in as well if I figure anything out
... your instinct was right:
the clicking sound was caused by the HACK -- because it evaluated data outside of the boundaries of the channel's audio data, it created artifacts at the end of a (channel's) writing block.
It is not fully fixed yet (see commit 27014e94c05d9bf96bef3f5e87170df3d4a20f6e), but it's more or less invisible/unhearable now.
That's great - I can confirm its working for me as well now after the above commit. The last issue i'm seeing now is a "pop" right near when the recording first starts. But thats in the encoded flac file as well (meaning its not an issue with decoding). I must be passing something to the encoder that I shouldn't be. I'll look into it. Thank you for your help. Couldn't have figured this out without it.
Hey @tyler-g, just a heads up:
I think I finally figured out, how the decoded data has to be fixed -- the decoded files I used for testing were exactly the same as the original WAV input files.
I will upload the changes next week, and also update the compiled library file (changing from FLAC v1.3.0 to v1.3.2) -- probably with a new asynchronously loading minified version.
@tyler-g, I would like to add a "streaming decode example" -- would be OK to base it on your encode&decode sample code?
btw. I pushed a new version for libflac.js:
some functions are renamed, and the signature for init_libflac_decoder
is changed (there is only one optional parameter now)
-> see comments for v0.3.0
@russaa that sounds great, and yes feel free to use the example
Hey @russaa I was trying to do some performance testing of the encoding process. I could have sworn I had done it in the past and found the encoding latency to be ~ 5ms.
However now I'm looking at it and getting ~100ms, I must be doing something.
Each time before Flac.encode_buffer_pcm_as_flac
is called (encode a chunk), I push the time into a queue (array):
timeList.push(performance.now())
Then in the encoder's write callback, write_callback_fn
I calculate:
var timeToChunkCb = performance.now() - timeList.shift()
Does anything seem off to you ?
hmm, I don't know:
did you have the developer tools open and/or debug output (on the console) that could slow it down?
If not:
do you remember with which version you tested before?
(the min.js
version should have more optimizations present than the normal/development version)
The newer versions are compiled differently (using the new compiler in emscripten), so there could be a difference due to that.
Did you use the same browser as before?
The new version should be better optimized for Firefox (asm.js).
But to be honest, speed wasn't really a big concern for me in the past, so I did not really benchmark the different versions. But if you are interested, we could look into it. I am away next week, but after that ...
You're right, I have noticed if console is open it can dramatically slow down execution time if there is lots of logging.
I'm using chrome and have the console open. I haven't used your 1.3.2 update yet (couldn't get it to integrate yet). I'm still on the 1.3.0 or whatever it was before. I haven't been able to get the minified version to work. I'm using webpack and vue.js for my project and something about the compiling isn't playing nice with the minified file. I'll try to dig in more..
w.r.t. the minified version 1.3.2:
there are some comments on using the minified version in the README:
basically, the minified version loads the .mem
file asynchronously, and no (native) library function can be used until that file was loaded -- for an example see speech-to-flac which has been updated to use the minified version.
Problems with transpilers/packaging may be due to the fact that the .mem
file is a binary file; and it should not be modified any further.
... and if you have some code for benchmarking it would be great, if you could share it.
Hey, yes I have some.. just want to get it to a readable format to share with this thread. I think it may be useful. I did find that if you manually set the blocksize of the encoder, the encoding delay drops significantly. I couldn't get emscripten make to work right, so I just added this function manually to the js lib, and it seems to work.
Also the verify option adds a significant bit of latency (as its decoding each chunk it encodes for verification).
Will post all my results probably early next week, as this week might be quite busy for me.
sounds interesting
I would be very interested in the results, but take your time -- starting next week I will be away again, so I won't be able to look at it the next few weeks in any case.
The blocksize-setter should probably be added to the exported interface then (and maybe as an additional/optional parameter in the init_libflac_encoder
function)
... for adding the function temporarily, did you try something like
//usage: Flac.FLAC__stream_encoder_set_blocksize(encoder: NUMBER, block_size: NUMBER)
Flac.FLAC__stream_encoder_set_blocksize = Flac._module.cwrap('FLAC__stream_encoder_set_blocksize', 'number', [ 'number', 'number']);
@russaa catching up after a few more years! How are you?
I'm revisiting my project again and was wondering if anything major had changed in your documentation in the README. Does that include flac decoding on a stream (unknown size)? I can't remember if my libflac file had some custom work we did on it or not.
I'm having an issue now where decoded flac audio seems to be lower pitched. I have no idea why but I was hoping to start fresh using the latest you have in this repo. I'm just however worried there might be some custom work we did that was never ported over..
Hi @tyler-g, nice to hear from you again!
As far as I can remember, there were no custom changes left, but I am not 100% sure about that.
The latest version has some small changes w.r.t. to the interface, but mainly is the same.
E.g. one notable change is renameing Flac.init_libflac_decoder -> Flac.create_libflac_decoder
(and it is "use strict"
compatible now).
The library is now compiled with a newer version of emscripten
which required some internal changes, but the API did not change due to that.
In addition, there is now a wasm
(-> binary Web Assambly) version of the library available, e.g. see demo pages "Library Variants" and the corresponding section in the README.
For decoding a data stream (i.e. "chunks of data") see snippet commented withVARIANT 1
for step [2] DECODE
in the README's decoding exmaple.
There is also some code in the decoding example for that: mainly, the read callback needs to be able to return data chunks when requested & report the correct, returned data size, as well as the end-of-data (by returning readDataLength
with value 0
).
fyi:
there is a short description for for integrating the min
or wasm
version with webpack
-- the instructions refer to integrating libflac.js
in a WebWorker, but in in principle can also be used as a guide to to integrate it in the main-thread, see mmig/speech-to-flac/issues/4#issuecomment-543855202
Oh that would great to integrate with webpack – I will try to get that working.
The thing is, using the dist library libflac4-1.3.2.js
, I compared it to the old one I had been using in my project, and noticed the following FLAC functions were missing:
decode_metadata_flac
decode_stream_flac_as_pcm
decode_buffer_flac_as_pcm
encode_buffer_pcm_as_flac
init_decoder_stream
init_encoder_stream
FLAC__stream_encoder_init_file
To get stream encoding/decoding to work I had to include those functions, and also export them in the library (too hacky!)
Are you seeing the same thing?
In short, after I add those the streaming encoding/decoding is working – with one caveat that the pitch of the decoded audio data is slightly lower - I have no idea why :\ I'm going to dig in more
... then it's been longer than I thought: most of these have been renamed to reflect the original function names in the FLAC sources
e.g. see f08fbc089f1c745500bcc0a720aba33006b6aa24
init_libflac_decoder -> create_libflac_decoder
init_libflac_encoder -> create_libflac_encoder
and e872f056ec00c5fc885f93fab151bdbf624b2b94
* API changes: renamed functions to reflect original Flac C function
names
* decode_buffer_flac_as_pcm -> FLAC__stream_decoder_process_single
* decode_stream_flac_as_pcm -> FLAC__stream_decoder_process_until_end_of_stream
* decode_metadata_flac -> FLAC__stream_decoder_process_until_end_of_metadata
* encode_buffer_pcm_as_flac -> FLAC__stream_encoder_process_interleaved
* renamed library files: included library version number (in addition to
Flac version number)
... about the pitch: maybe the sample rate parameter is set to a wrong value ... I am currently not sure if it would result in a different pitch or just cause an error
@russaa thanks - I didn't know about the name changes :) I updated everything in my project so it works with them now. As far as the pitch issue, maybe it's just in my head! I'll have to verify
I also noticed some discrepancy between what you have in the README for decoding and what I have.
Here are my worker files for encoding and decoding: encoder: https://gist.github.com/tyler-g/85271975eada5014f765d24e2f79c27a decoder: https://gist.github.com/tyler-g/e3aa39e97c9195945ecb89b1dc845831
Pitch issue is still there and I'm trying to figure out if its related to how I'm exporting the WAV. I double checked sample rate is all correct.
I'm trying to also export the FLAC data to compare it to the wav, but exporting flac file isn't working because it's not adding the flac metadata and thus isn't a valid flac file. I noticed the metadata function metadata_callback_fn
is never called and thus metadata is never set.
I can't find anything clearly wrong yet
thank you for the hint about the outdated README -- I will check it next week.
I checked the metadata_callback_fn
invocation in the decoder-example, and there it seems to work as it should -- I'll have a closer look next week.
As for the pitch: did you verify that the source audio has the same or different pitch? Maybe the source audio already has a "strange" pitch(?)
One of the tests for encoding & decoding functionality was, to compare a round-trip, i.e. encode from WAV to FLAC, and the decode that FLAC to WAV again.
And while I found some differences when enconding with the wasm
variants, the resulting WAV files were always exactly the same as the original WAV files (even if the intemediate FLAC files of the wasm
libraries had some binary differences, see the note in the library variants) description.
About the pitch - I'm definitely sure it's lower pitch because my voice sounds like I've been smoking for years :)
Again this is only testing it through streaming input (streaming mic input, encoding the stream, sending chunk over p2p connection, decoding stream on the other end)
The metadata callback is not firing for me - again I'm only testing streaming though, not encoding/decoding of an existing audio file
I think the pitch problem most probably stems from a wrong sampling rate during encoding:
do you use the sampleRate
from the AudioContext
or a hard-coded value?
E.g. if the real sampling rate (i.e. audioContext.sampleRate
) is 48 kHz, but you use a hard-coded value of 44.1 kHz, the recorded voice would have a lower pitch.
my hunch too - though it appeared the sample rates were correct – let me check again and confirm
... the missing call to metadata_callback_fn
is really strange:
I checked the example for all-at-once decoding, as well as for single-chunk decoding (i.e. "streaming"), and it both works as expected (the meta-data callback should get triggered, when the finish-function for the decoder is invoked after the meta-data block/header is read).
Did you verify, that actual data gets decoded? I.e. the resulting WAV file contains actual audio data?
Generally, the FLAC data should contain basic meta-data, even if not explicitly set after encoding to FLAC (i.e. gets automatically set during initialization), e.g. something like this
bitsPerSample: 16
channels: 2
max_blocksize: 4096
min_blocksize: 4096
sampleRate: 44100
(i.e. total_samples
, md5sum
and framesize information would be missing without explicitly adding it, after encoding to FLAC)
I have tested the example code (also the speech-to-flac demo, i.e. encoding a stream), and the resulting FLAC data/files do contain these mentioned basic meta-data, even without setting any meta-data explicitly.
Ok, doing a little investigation. I'm still not getting metadata_callback_fn
firing, but in the write_callback_fn(buffer, blockMetadata)
callback, I'm seeing blockMetadata
log as:
{
bitsPerSample: 16
blocksize: 4096
channels: 1
crc: 0
number: 0
sampleRate: 44100
}
(which is correct). I see that on every write callback on the decode side.
On the encode side I can confirm the encoder is initialing with that same samplerate, channels, and bps.
If you have time to take a look, maybe I can invite you to my project? It may be easier to see my setup that way. I really appreciate all your help thus far.
sure, I'll take a look, if you give me access
Do you have a gitlab username, or email I can use to invite you? The project is on gitlab
I have created an account on gitlab.com with user name russa
Thanks I just added you to the project - the setup steps are in the README
I had a look at the project:
1) the pitch problem was caused due to the fact that you tried to change the sample rate using an audio buffer, but it needs to be set in AudioContext constructor
-> see branch fix_pitch
2) the metadata problem was caused by an initialization problem:
note that I did not really find out when the connection is fully established, so I just added a long delay before sending the cached audio data via setTimeout()
; you probably have a better idea when exactly the connection to the decoder thread is fully established and the cached data can be sent
-> see branch fix_metadata
Also: after stopping the streaming, i.e. finished encoding/decoding, the encoder & decoder need to re-initialized (there is no code included for this, but it should be fairly simple)
Wow, so much thanks for finding that – I did not know about needing to pass the sample rate directly into the constructor like that.
I am going to clean up my code a bit and also figure out the reinitialization bit – so that I don't need to refresh the page to start again.
As far as #2, the meta data callback, do you think I should simply wait until the connection is open to initialize the FLAC encoder/decoder workers? That way the meta headers can be sent after connection is already open. It may be a little slower that way. However I do like your way of caching the headers as well – I'll need to play around with it a little and see what works best in this case.
So happy to see it finally working !
w.r.t. to re-initialization:
it's basically the same as what is done for the init
event, but without recreation of the encoder/decoder, i.e. no invocation of create_libflac_encoder()
/ ..._decoder()
_(or if you recreate those, make sure to destroy the ones your are not using anymore, i.e. call FLAC__stream_encoder_delete(encoder)
and FLAC__stream_decoder_delete(decoder)
)_
@russaa thank you for this great emscripten port.
I've been working on a fork to add chunk decoding ability, and running into some issues. I was able to get
FLAC__stream_decoder_init_stream
to successfully init (return code 0), but the next step is where I'm struggling.On the encoding side, when you call
FLAC__stream_encoder_process_interleaved
you pass it the data to encode , but the decoding side is a bit different: libflac uses this function to process a decode buffer:FLAC__stream_decoder_process_single
Except you don't pass it data directly, you instead pass it a pointer to a read callback. Once that read callback is called, one of the callback parameters is
buffer
(a pointer to the location where the data will go) andbyte
( a pointer to the max size of bytes that can be written to memory for that chunk. libFlac docs say this pointer must be updated to the actual number of bytes the chunk is (since in flac the chunk size varies).Basically what I've done is create a new worker for decoding and added decode function references to the libflac.js. The stream decode init is working properly, but I cannot get the
FLAC__stream_decoder_process_single
to work properly, and I'm almost sure it's the way I'm handling the pointers returning in the read callback function.I've also described the issue a bit in this stackoverflow : http://stackoverflow.com/questions/40311276/decoding-a-single-chunk-of-flac-data-in-javascript-using-libflac-js
Would be great to hear your thoughts on this, thanks!