Closed omkar-12bits closed 4 months ago
After trying this on both Mistral and Mixtral, it's clear that padding doesn't work well with these models. I was just experimenting with prompts at inference time and saw that padding makes generation worse. If inference on batches doesn't work well, how does it perform during training? Shouldn't that also produce garbage?
Can you post the code you used, please?
I tried using `eos_token`, `unk_token`, and `bos_token` as the pad token, with both `left` and `right` padding sides, but whenever the number of padding tokens increases, the outputs are pure garbage.
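For decoder-only models like Mistral/Mixtral, padding usually has to go on the left and an attention mask must be passed, so the model's generation position stays adjacent to the last real token and the pad positions are masked out. Here is a minimal sketch of that idea; `pad_batch` is a hypothetical helper for illustration (in practice `tokenizer(..., padding=True)` with `tokenizer.padding_side = "left"` does this for you):

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad a batch of token-id lists to a common length and build the
    matching attention mask (1 = real token, 0 = padding).

    Hypothetical helper mirroring what a Hugging Face tokenizer does
    when padding_side is set; not part of any library API.
    """
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        if side == "left":
            # Left padding: real tokens end at the last position,
            # which is what decoder-only generation expects.
            input_ids.append(pad + seq)
            attention_mask.append([0] * len(pad) + [1] * len(seq))
        else:
            # Right padding: fine for training with loss masking,
            # but breaks naive generation for causal LMs.
            input_ids.append(seq + pad)
            attention_mask.append([1] * len(seq) + [0] * len(pad))
    return input_ids, attention_mask


ids, mask = pad_batch([[5, 6, 7], [8]], pad_id=0, side="left")
print(ids)   # [[5, 6, 7], [0, 0, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

If the attention mask isn't passed to `generate`, the model attends to the pad tokens as if they were real input, which is one common reason batched outputs degrade as padding grows. This also answers the training question: during training the attention mask (and label masking) hides the pads from the loss, so padding there is harmless.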