jefflai108 / Attentive-Filtering-Network

The University of Edinburgh and Johns Hopkins University's system for the ASVspoof 2017 Version 2.0 dataset.
MIT License

How to generate .scp files? #2

Closed. mgchbot closed this issue 5 years ago.

mgchbot commented 5 years ago

I can't find any information about .scp files. Can you provide more details about them, or explain how to prepare the data to train this model? Thank you in advance.

jefflai108 commented 5 years ago

The features (i.e., log-spectrograms) for the network are stored in scp and ark files, extracted with the Kaldi speech toolkit. The ark files are where the actual features are stored, and the scp files are indexes for accessing the data in the ark files. For more detail, see this page: http://kaldi-asr.org/doc/io.html

The feature processing procedure is: raw waveform --> Kaldi extracts the log-spectrogram and stores it in scp and ark files --> post-process the features (Attentive-Filtering-Network/src/data_reader/feat_slicing.py)
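For reference, here is a minimal Python sketch of how the scp/ark pair is read back, assuming the kaldi_io package (https://github.com/vesis84/kaldi-io-for-python); the path is a placeholder:

```python
# Minimal sketch, assuming the kaldi_io package (pip install kaldi_io).
# Each scp line maps an utterance id to a byte offset inside an ark file;
# read_mat_scp follows those pointers and yields (utt_id, matrix) pairs.
import kaldi_io

for utt_id, feats in kaldi_io.read_mat_scp('your_path_to_feat/feats.scp'):
    print(utt_id, feats.shape)  # feats is a (num_frames, num_bins) numpy array
```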

mgchbot commented 5 years ago

> The features (i.e., log-spectrograms) for the network are stored in scp and ark files, extracted with the Kaldi speech toolkit. The ark files are where the actual features are stored, and the scp files are indexes for accessing the data in the ark files. For more detail, see this page: http://kaldi-asr.org/doc/io.html
>
> The feature processing procedure is: raw waveform --> Kaldi extracts the log-spectrogram and stores it in scp and ark files --> post-process the features (Attentive-Filtering-Network/src/data_reader/feat_slicing.py)

Hello, I followed your steps and I can run your code now. Using your AttenResNet4 with 257*1091 features, I only get around 0.10 EER on the dev set and 0.28 EER on the eval set (both on the ASVspoof 2017 v2 dataset). Could you give me some advice on improving it? Thank you!

entn-at commented 5 years ago

Hi, I also ran AttenResNet4 with 257*1091 (no VAD) log-spectrogram features. It trained for 8 of 30 epochs (it hit max_patience). Like @mgchbot, my best dev-set EER was 11.37 (avg. loss 1.0024), but the eval-set EER was 28.31. Do I need to adjust other hyperparameters (e.g., selecting the best model by validation avg. loss instead of EER, or changing max_patience) to achieve results comparable to those in the paper? Thanks!

jefflai108 commented 5 years ago

Hi @entn-at @mgchbot,

Did you apply mean normalization to the log-spectrogram? It was a while ago, but I remember normalization yields very different results on this dataset. Can you share more details on how you prepare the feature map?

mgchbot commented 5 years ago

> Hi @entn-at @mgchbot,
>
> Did you apply mean normalization to the log-spectrogram? It was a while ago, but I remember normalization yields very different results on this dataset. Can you share more details on how you prepare the feature map?

Hi, thank you for your reply. I used the Kaldi toolkit to generate the log-spectrogram (https://github.com/kaldi-asr/kaldi/blob/master/src/feat/feature-spectrogram.cc), then used feat_slicing.py to slice the features. Can you provide some detail about the mean normalization?

jefflai108 commented 5 years ago

In Kaldi, the program apply-cmvn-sliding applies a sliding normalization window to any given feature; you can find more in Kaldi's example scripts. Given the original log-spectrogram feature scp file (your_path_to_feat/feats.scp):

```
feats="ark:apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:your_path_to_feat/feats.scp ark:- |"
copy-feats "$feats" ark,scp:your_path_to_feat/feats_cm.ark,your_path_to_feat/feats_cm.scp
```

your_path_to_feat/feats_cm.scp is the normalized log-spectrogram feature map file.

Afterward, apply feat_slicing.py to this normalized feature file.
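For intuition, here is a rough numpy analogue of the centered sliding-window mean subtraction. This is only an illustration (Kaldi's edge handling differs slightly), so use apply-cmvn-sliding for the actual pipeline:

```python
import numpy as np

def sliding_cmn(feats, window=300):
    # For each frame, subtract the mean over a centered window of
    # `window` frames, clipped at the utterance boundaries.
    half = window // 2
    out = np.empty_like(feats)
    for t in range(len(feats)):
        lo, hi = max(0, t - half), min(len(feats), t + half + 1)
        out[t] = feats[t] - feats[lo:hi].mean(axis=0)
    return out
```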

entn-at commented 5 years ago

Thank you so much for your detailed response! I apologize; I completely overlooked that in the paper. I re-read the relevant section and it clearly states "[...] and applied mean normalization using a 3-second sliding window." Training is still running at the moment, but after the first epoch the dev EER already went down to 9.7%, so applying sliding mean normalization seems to have taken care of the problem. Thanks again! EDIT: Eval EER was 14.4513%, which is a lot better!

jefflai108 commented 5 years ago

Hi @entn-at, from my experience, results on a small dataset like this vary a lot from run to run. Train the network a few times and select the best one, and you can get around 10% eval EER. What I did (described in the experiment section of the paper) is run the same network 8 times and average the results.
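As an illustration of the averaging step, here is a hypothetical sketch; it assumes each run writes a text file with one "utt_id score" pair per line (the file names are made up):

```python
from collections import defaultdict

def average_scores(score_files):
    # Sum the per-utterance scores across runs, then divide by the run count.
    totals = defaultdict(float)
    for path in score_files:
        with open(path) as f:
            for line in f:
                utt_id, score = line.split()
                totals[utt_id] += float(score)
    return {utt: s / len(score_files) for utt, s in totals.items()}

avg = average_scores(['scores_run%d.txt' % i for i in range(8)])
```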

entn-at commented 5 years ago

Thanks, @jefflai108! When I ran it again overnight, eval EER ended up at 8.7326%. I guess the small size of the development set (1710 samples) makes it hard to assess how well a model generalizes. Again, thanks for your help and for releasing your implementation!

immonomono commented 5 years ago

Hello,

I am not familiar with Kaldi; in fact, this is not my field, and I have never used it before. I tried to follow some examples, but I could not find a way. Can someone tell me how to generate scp files from the ASVspoof 2017 data?

jefflai108 commented 5 years ago

Hi @immonomono,

I use Kaldi for feature extraction. The features are stored in the "ark" files, and the "scp" files act like a hash table that points to the position of every utterance within the "ark" files. You do not have to use "ark" or "scp" files to prepare the features for training the NN; for example, you can use hdf5 for storing and accessing features. However, you will likely need Kaldi for generating/extracting the features, as it has been the standard in the speech community.

For your reference, here is the introduction to "ark" and "scp" in Kaldi: http://kaldi-asr.org/doc/data_prep.html

I found it may be easiest to learn by running one of the recipes, such as this one: https://github.com/kaldi-asr/kaldi/tree/master/egs/sre08/v1
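If you prefer hdf5 over ark/scp, a minimal conversion sketch could look like this, assuming the kaldi_io and h5py packages (the file names are placeholders):

```python
import h5py
import kaldi_io

# Copy every utterance's feature matrix from Kaldi scp/ark into one HDF5 file,
# using the utterance id as the dataset name.
with h5py.File('feats.h5', 'w') as f:
    for utt_id, feats in kaldi_io.read_mat_scp('feats.scp'):
        f.create_dataset(utt_id, data=feats)
```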

immonomono commented 5 years ago

Hi @jefflai108

Thank you for your kind advice. I will try to follow your guide.