linhdvu14 / vggvox-speaker-identification

Speaker identification with VGGVox network

about MFCC #5

Open TTTJJJWWW opened 5 years ago

TTTJJJWWW commented 5 years ago

@linhdvu14 Hi, thanks for your code. I understand you are using the model with weights from VGGVox, but where is the MFCC step? Or do you use different features?

linhdvu14 commented 5 years ago

Hi, VGGVox doesn't use MFCC, only FFT spectrum. The signal processing code is in sigproc.py.
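
For readers who want to see what "only FFT spectrum" can look like in practice, here is a minimal NumPy sketch of a framed magnitude spectrum. It is not the code from sigproc.py. The function name `fft_spectrogram` and all parameter values (16 kHz sample rate, 25 ms Hamming window, 10 ms step, 512-point FFT) are assumptions chosen for illustration, and any pre-emphasis or normalization the repository may apply is left out.

```python
import numpy as np


def fft_spectrogram(signal, sample_rate=16000, frame_len=0.025,
                    frame_step=0.010, n_fft=512):
    """Return a magnitude spectrogram of shape (n_fft // 2 + 1, n_frames).

    Hypothetical helper: parameter defaults are illustrative assumptions,
    not values taken from the repository.
    """
    signal = np.asarray(signal, dtype=np.float64)
    win = int(round(frame_len * sample_rate))   # samples per frame
    hop = int(round(frame_step * sample_rate))  # samples between frame starts
    if len(signal) < win:
        raise ValueError("signal is shorter than one frame")

    # Slice the signal into overlapping frames with an index matrix.
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)

    # Zero-pad each frame to n_fft points and keep only the magnitude.
    # No mel filterbank, log, or DCT follows, so this is not MFCC.
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return spec.T
```

The difference from MFCC is in what is missing: the mel filterbank, the logarithm, and the DCT stages are never applied, so the model would see the spectrum itself as a 2-D array.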

TTTJJJWWW commented 5 years ago

@linhdvu14 Hi, thank you for your reply. I have doubts about "VGGVox doesn't use MFCC", because the VGGVox source code contains an MFCC function (in the MFCC folder) and uses it: function [ SPEC ] = mfccspec( speech, fs, Tw, Ts, alpha, window, R, M, N, L ) % MFCC Mel frequency cepstral coefficient feature extraction. ...

linhdvu14 commented 5 years ago

Yes, but if you look at the code of mfccspec, the return value SPEC is only the FFT.

TTTJJJWWW commented 5 years ago

Oh, I see. So you mean the features of the wav file are fed into the model as an image (a grayscale image)? And the system essentially calculates the similarity (distance) between images?
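
To make "similarity (distance)" concrete, here is a small sketch of one common way such a comparison could work. It assumes the network maps each spectrogram to a fixed-length embedding vector and that speakers are compared by cosine distance. Neither assumption is confirmed in this thread, and the names `cosine_distance` and `identify` are hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (0 means same direction)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("embedding has zero norm")
    return 1.0 - float(np.dot(a, b) / denom)


def identify(test_emb, enrolled):
    """Return the enrolled speaker whose embedding is closest to test_emb.

    `enrolled` is a dict mapping speaker name -> embedding vector
    (an assumed data layout, for illustration only).
    """
    return min(enrolled, key=lambda name: cosine_distance(test_emb, enrolled[name]))
```

Under these assumptions the distance is computed between embeddings produced by the network, not between the raw spectrogram images themselves.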

TTTJJJWWW commented 5 years ago

@linhdvu14 Hi, does "weights.h5" store both the architecture and the weights, or just the weights? I want to convert it to a TensorFlow model (.pb). Can I just use the "keras_to_tensorflow" tool to do it? Looking forward to your reply.

linhdvu14 commented 5 years ago

It's just weights. You'd probably want to export both weights and architecture before trying keras_to_tensorflow. Or replicate the model architecture in tf and restore weights from a dict.