locuslab / wanda

A simple and effective LLM pruning approach.
https://arxiv.org/abs/2306.11695
MIT License

Cannot reproduce Llama2 results #52

Open taratt opened 6 months ago

taratt commented 6 months ago

Hello, I'm opening this issue because I'm still having problems reproducing the Llama 2-7B results (both the dense model and with wanda pruning). Below are my intermediate and final perplexity results for the dense model (context size 4096). It seems like the last few samples are somehow inflating the perplexity, but I don't know why. Any help would be appreciated.

```
nsamples 333
sample 50, Perplexity 5.0264153480529785
sample 100, Perplexity 5.311441421508789
sample 150, Perplexity 5.710564136505127
sample 200, Perplexity 5.612466335296631
sample 250, Perplexity 5.526543617248535
sample 300, Perplexity 6.8109965324401855
wikitext perplexity 7.72459077835083
```
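For context, my evaluation follows the standard fixed-window perplexity loop; a minimal sketch of what I'm running (this approximates the eval_ppl loop in lib/eval.py, and the running-perplexity print is my own addition):

```python
import torch

@torch.no_grad()
def eval_ppl(model, testenc, seqlen=4096, device="cuda"):
    # Split the tokenized test stream into non-overlapping seqlen windows.
    nsamples = testenc.input_ids.numel() // seqlen
    print(f"nsamples {nsamples}")
    nlls = []
    for i in range(nsamples):
        batch = testenc.input_ids[:, i * seqlen:(i + 1) * seqlen].to(device)
        logits = model(batch).logits
        # Shifted next-token cross-entropy over the window.
        shift_logits = logits[:, :-1, :].contiguous()
        shift_labels = batch[:, 1:].contiguous()
        loss = torch.nn.functional.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
        )
        nlls.append(loss.float() * seqlen)
        if (i + 1) % 50 == 0:
            # Running perplexity over the windows seen so far.
            ppl = torch.exp(torch.stack(nlls).sum() / ((i + 1) * seqlen))
            print(f"sample {i + 1}, Perplexity {ppl.item()}")
    return torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen)).item()
```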

Eric-mingjie commented 6 months ago

I recall that there shouldn't be 333 samples for wikitext; it's actually much fewer than that (in my case it is 83). Are you using the validation set?

taratt commented 6 months ago

I am using the same testenc that the get_wikitext2 function in data.py returns. If the model's sequence length is 4096, does that mean I'm somehow getting more samples?

Eric-mingjie commented 6 months ago

Correct, 333 does not look like the right number from what I am seeing on my end. And I was referring to the test split, sorry for the confusion.
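If it helps, you can sanity-check the window count directly; a short snippet (this mirrors how get_wikitext2 tokenizes the test split, but double-check against data.py):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load and tokenize the wikitext-2 test split the same way get_wikitext2 does.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", use_fast=False)
testdata = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
testenc = tokenizer("\n\n".join(testdata["text"]), return_tensors="pt")

seqlen = 4096
print(testenc.input_ids.numel() // seqlen)  # roughly 83 windows, not 333
```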

taratt commented 6 months ago

Thanks to your tip, I was able to figure out what the problem was: I was testing on wikitext-103 instead of wikitext-2. The version of datasets suggested in your install file silently loads wikitext-103 instead of wikitext-2, so I suggest updating it. Thanks again.
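For anyone else who hits this: after upgrading datasets, you can confirm which config actually got loaded before evaluating (a quick sanity check; the field names below are per recent datasets releases):

```python
from datasets import load_dataset

# Request the wikitext-2 config explicitly and verify what came back.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
print(test.info.config_name)  # expect "wikitext-2-raw-v1"
print(test.num_rows)          # quick size check on the test split
```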

Eric-mingjie commented 6 months ago

Great, thank you for the update.