ChangXiangshi opened 2 years ago
Note: Please use the latest k2, i.e., v1.11, to compute framewise alignments as it fixes a bug in https://github.com/k2-fsa/k2/pull/877
OK, I'll try.
@csukuangfj It seems to support BPE or char unit alignment only. How do I get alignments at different levels? In Kaldi I generate transition-id alignments, which can be mapped to pdf-ids, phones, and words.
There are no transition ids or pdf ids here; those exist only in HMM/GMM-based systems.
If your modelling units are wordpieces, you can get alignment for BPE tokens and words.
(Note: To get word alignment, you need to do some extra work. Since a BPE token starting with a _ indicates the beginning of a word, you can get word alignment information from the BPE tokens.)
If your modelling units are characters, you can get alignment for characters and words. (To get word alignment, you have to assign some attribute to the alignment graph to indicate which arc is the beginning of a word.)
If your modelling units are phones, you can get alignment for phones and words, similar to the one using characters as modelling units.
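The cases above can be sketched in a few lines. This is only an illustration, not icefall code: the blank id, the `▁` word-boundary marker, and the frame-level token-id list are assumptions here.

```python
# Sketch: recover word start frames from a framewise BPE-token alignment.
# Assumptions (illustrative, not tied to any icefall API): blank id is 0,
# and a token whose string starts with "▁" begins a new word.

def word_starts(frame_tokens, id2token, blank_id=0):
    """frame_tokens: one token id per frame (CTC-style alignment)."""
    starts = []
    prev = blank_id
    for frame, tok in enumerate(frame_tokens):
        # A non-blank token different from the previous frame's token
        # is the first frame of a new token occurrence.
        if tok != blank_id and tok != prev:
            if id2token[tok].startswith("▁"):
                starts.append(frame)
        prev = tok
    return starts

# Toy vocabulary and alignment for "▁HE LLO ▁WORLD"
id2token = {0: "<blk>", 1: "▁HE", 2: "LLO", 3: "▁WORLD"}
ali = [0, 1, 1, 0, 2, 0, 0, 3, 3, 0]
print(word_starts(ali, id2token))  # → [1, 7]
```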
Hi, just a question, even if I'm not sure I'm asking in the right place. It seems to me you're getting your best WER results (at least on LibriSpeech) with word-piece models rather than phone-based or char-based ones (char-based is basically graphemic phonetisation, right? i.e., the modelling unit is a letter of the word). But if your decoding lexicon contains words that are OOV with respect to the training transcripts on which the BPE model was trained, you might decompose those OOV words into word pieces that are not modelled by the BPE model, which is basically never the case with char- or phone-based models. Is the BPE model something recommended for large training corpora and vocabulary sizes? (I'm wondering because I have never trained word-piece models before.)
you might decompose those OOV words in word pieces that are not modeled by the BPE model
The lexicon built from the BPE model maps all OOV words to the token, i.e., word piece, <unk>.
All pieces of a BPE model, including <unk>, are used during training, I think.
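A minimal sketch of the OOV behaviour described above. The dictionary-based lexicon and the word list here are toy examples, not the actual icefall lexicon code:

```python
# Sketch: a BPE-style lexicon maps in-vocabulary words to their pieces,
# and every OOV word to the single piece "<unk>" (illustrative only).

lexicon = {
    "HELLO": ["▁HE", "LLO"],
    "WORLD": ["▁WORLD"],
}

def word_to_pieces(word):
    # OOV words fall through to <unk> instead of being decomposed
    # into pieces the model never saw as a word decomposition.
    return lexicon.get(word, ["<unk>"])

print(word_to_pieces("HELLO"))   # → ['▁HE', 'LLO']
print(word_to_pieces("FOOBAR"))  # → ['<unk>']
```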
is the bpe model something recommended with large training corpora and vocabulary sizes?
Sorry, I cannot say too much about this. As far as I know, ESPnet and SpeechBrain both use word pieces as modelling units. You don't need to be an expert to build a lexicon; it can be learned from data when using BPE models.
There are no transition id or pdf id here, which exist only in HMM/GMM based systems.
If your modelling units are wordpieces, you can get alignment for BPE tokens and words. (Note: To get word alignment, you need to do some extra work. Since a BPE token starting with a _ indicates the beginning of a word, you can get word alignment information from the BPE tokens.)
I can see that. Besides that, it is also necessary to eliminate duplicate tokens that are contiguous in time (i.e., with no blank in between, as in the CTC loss computation). I noticed this while using rescoring with the transformer decoder.
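The de-duplication described above is the standard CTC collapse: merge contiguous repeats, then drop blanks. A minimal sketch, assuming blank id 0 (an assumption, not taken from icefall):

```python
# Sketch of CTC-style collapsing: a token repeated with no blank in
# between counts once; a blank between repeats keeps both occurrences.

def ctc_collapse(frame_tokens, blank_id=0):
    out = []
    prev = None
    for tok in frame_tokens:
        if tok != prev:          # keep only the first of a contiguous run
            if tok != blank_id:  # drop blank frames
                out.append(tok)
        prev = tok
    return out

print(ctc_collapse([0, 7, 7, 0, 7, 3, 3, 0]))  # → [7, 7, 3]
```

Note that the second 7 survives because a blank separates the two runs, which is exactly the case the CTC loss distinguishes.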
Do you mean there are no blanks between repeated symbols in the following tokens?
When I call get_alignments(best_path) on the best_path (for each attention and LM scale) of the transformer decoder, I sometimes get repeated tokens with no blanks in between, and to retrieve the correct word I have to count the token only once.
How do you invoke get_alignments?
I don't know how to generate the training graph for aligning data. Are there examples?
How do you invoke get_alignments?
Just like it is shown in conformer_ctc/ali.py:

```python
ali_ids = get_alignments(best_path)
```

in decode_one_batch:

```python
if best_path_dict is not None:
    for lm_scale_str, best_path in best_path_dict.items():
        hyps = get_texts(best_path)
```
just like it is shown in conformer_ctc/ali.py
get_alignments requires two arguments. How do you call it? There is no get_texts() in conformer_ctc/ali.py.
I should have been more specific: get_texts and decode_one_batch are in conformer_ctc/decode.py. The part I quoted comes from the point in that function where rescore_with_attention_decoder is called; that's where I called get_alignments to extract the token (and word) alignments for best_path (for each attention scale and lm_scale in the list).
But you're right, I haven't updated icefall in the last couple of months (k2 and lhotse, yes, but not icefall); before, get_alignments took only best_path as its argument. I'll download the latest version, sorry.