qxpBlog closed this 4 months ago
The same thing happened to me. In my setup, the calibration dataset is different from the fine-tuning dataset. The PPL increased after fine-tuning the sliced model, but zero-shot performance improved.
As @liuxiaozhu01 says, it depends on your fine-tuning dataset. If you slice and fine-tune on the non-test splits of WikiText2 and evaluate the PPL on the test split, you should see the PPL decrease.
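To make the dataset dependence concrete, here is a minimal toy sketch. It is not the repository's evaluation code: it swaps the language model for a smoothed unigram model, and the corpora and the helper names `fit_unigram` and `perplexity` are made up for illustration. It only shows that PPL, computed as the exponential of the mean per-token negative log-likelihood, is lower when the model was fit on data that matches the evaluation split.

```python
import math
from collections import Counter

def fit_unigram(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def perplexity(probs, tokens):
    """PPL = exp(mean negative log-likelihood per token)."""
    nll = -sum(math.log(probs[t]) for t in tokens) / len(tokens)
    return math.exp(nll)

# Hypothetical toy corpora standing in for "matching" and "mismatched"
# fine-tuning data relative to the evaluation split.
in_domain_train = "the cat sat on the mat the cat ate".split()
other_domain_train = "stocks fell as markets closed stocks rose".split()
test = "the cat sat on the mat".split()

vocab = set(in_domain_train) | set(other_domain_train) | set(test)

ppl_in = perplexity(fit_unigram(in_domain_train, vocab), test)
ppl_other = perplexity(fit_unigram(other_domain_train, vocab), test)

print(f"fit on matching data:   PPL = {ppl_in:.2f}")
print(f"fit on mismatched data: PPL = {ppl_other:.2f}")
```

The same reasoning would apply to a real model: fine-tuning on a dataset that differs from the PPL evaluation set can raise that PPL even while other metrics, such as zero-shot accuracy, improve.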
@nailimixaM @myshkov @mtodd @tpope @sverrejoh @radical Why does the PPL value increase after fine-tuning the sliced model: