microsoft / Pengi

An Audio Language model for Audio Tasks
https://arxiv.org/abs/2305.11834
MIT License

How to get the likelihood from the model? #15

Closed jasonppy closed 3 days ago

jasonppy commented 2 months ago

Hi authors,

I'm trying to get the likelihood from Pengi on (audio, question, answer) tuples, but haven't been able to do so. Is it possible to get some help on this?

I think this forward function probably computes the loss: https://github.com/microsoft/Pengi/blob/main/models/pengi.py#L174, where audio should be the output of preprocess_audio, texts_enc should be the output of running preprocess_text on the question, and texts_dec should be the output of running preprocess_text on the answer. However, I wasn't able to get the loss from the output. Even when I pass label = texts_dec['input_ids'] (https://github.com/microsoft/Pengi/blob/main/models/decoder.py#L219), I still get dimension errors when the cross_entropy loss is computed.
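For reference, here's roughly what I'm running. This is just a sketch; wrapper and audio_paths are placeholders for however the preprocessing helpers and inputs are set up on my end:

```python
import torch

# Sketch of my attempt. `wrapper` is a placeholder for whatever object
# exposes the preprocess_audio / preprocess_text helpers; `audio_paths`,
# `questions`, `answers` are my own inputs.
audio = wrapper.preprocess_audio(audio_paths)    # preprocessed audio tensor
texts_enc = wrapper.preprocess_text(questions)   # tokenized questions
texts_dec = wrapper.preprocess_text(answers)     # tokenized answers

outputs = model(audio, texts_enc, texts_dec)
# Passing label = texts_dec['input_ids'] into the decoder still raises a
# dimension error inside the cross_entropy computation.
```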

Your help is greatly appreciated.

Best, Puyuan

soham97 commented 2 months ago

Hi @jasonppy, the loss computation during training looks like this:

  1. outputs = model(audios, texts_enc, texts_dec) where model is the PENGI model, audios is a float32 tensor, and texts_enc and texts_dec are the tokenized text input and text output.
  2. logits = outputs.logits[:, total_prefix_length - 1: -1] Remove the outputs corresponding to the total prefix length. This equals the length of the audio projection plus the length of the input text.
  3. loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), texts_dec['input_ids'].flatten(), ignore_index=0) Compute the cross entropy per token and average. Replace 0 with whichever token index is used for padding.

For texts_dec in step 1, make sure to prepend ones (as many as the total prefix length) to the attention mask of the tokenized text. See the sketch below for how the pieces fit together.
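Putting the steps together, a minimal sketch. total_prefix_length and the padding index 0 are assumptions here; use the values from your config, and note that audios, texts_enc, and texts_dec are prepared as in your snippet:

```python
import torch
import torch.nn.functional as F

model.eval()
with torch.no_grad():
    # Prepend ones for the prefix (audio projection + input text) to the
    # decoder attention mask, as noted above.
    mask = texts_dec['attention_mask']
    prefix_mask = torch.ones(mask.shape[0], total_prefix_length,
                             dtype=mask.dtype, device=mask.device)
    texts_dec['attention_mask'] = torch.cat([prefix_mask, mask], dim=1)

    # Step 1: forward pass.
    outputs = model(audios, texts_enc, texts_dec)

    # Step 2: drop the logits that correspond to the prefix; what remains
    # predicts the answer tokens.
    logits = outputs.logits[:, total_prefix_length - 1: -1]

    # Step 3: per-token cross entropy, averaged; ignore_index=0 skips
    # padding (swap in your padding token index).
    loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                           texts_dec['input_ids'].flatten(),
                           ignore_index=0)

    # For the total log-likelihood of the answer given (audio, question),
    # sum instead of averaging and negate.
    nll = F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                          texts_dec['input_ids'].flatten(),
                          ignore_index=0, reduction='sum')
    log_likelihood = -nll
```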