microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

A problem about the PPL value after sliced model fine-tuning #141

Closed — qxpBlog closed this issue 4 months ago

qxpBlog commented 5 months ago

@nailimixaM @myshkov @mtodd @tpope @sverrejoh @radical Why does the PPL value increase after fine-tuning the sliced model? [screenshot attached]

liuxiaozhu01 commented 4 months ago

The same thing happened to me. In my setup, the calibration dataset is different from the fine-tuning dataset. The PPL value increased after fine-tuning the sliced model, but the zero-shot performance improved.

nailimixaM commented 4 months ago

As @liuxiaozhu01 says, it depends on your fine-tuning dataset. If you slice and fine-tune on the non-test splits of WikiText2 and evaluate PPL on the test split, you should see the PPL decrease.
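
For reference, below is a minimal sketch of how one might measure perplexity on the held-out WikiText-2 test split with a Hugging Face causal LM, independent of this repository's own evaluation scripts. The model name, context length, and stride are illustrative assumptions; you would load your sliced and fine-tuned checkpoint however the repo's tooling saves it.

```python
# Minimal PPL-on-test-split sketch (not this repo's evaluation code).
# Assumptions: model_name is a placeholder, max_length/stride are illustrative.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; point this at your sliced + fine-tuned checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

# Use only the *test* split here; slicing calibration and fine-tuning should
# use the train/validation splits so this PPL stays a held-out measurement.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = 2048  # tokens per evaluation chunk (assumption)
stride = 2048      # non-overlapping chunks; a smaller stride gives a tighter estimate

nlls, n_tokens = [], 0
for begin in range(0, encodings.input_ids.size(1), stride):
    end = min(begin + max_length, encodings.input_ids.size(1))
    input_ids = encodings.input_ids[:, begin:end].to(device)
    with torch.no_grad():
        # labels == input_ids makes the model return the mean cross-entropy loss
        out = model(input_ids, labels=input_ids)
    nlls.append(out.loss * (end - begin))  # re-weight by chunk length
    n_tokens += end - begin

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"WikiText-2 test perplexity: {ppl.item():.2f}")
```

If the calibration, fine-tuning, and evaluation datasets all come from the same distribution (as in the WikiText2 train/test setup above), test PPL and downstream metrics usually move together; when they differ, as in @liuxiaozhu01's setup, it is plausible for test PPL to rise even while zero-shot accuracy improves.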