turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License
2.74k stars 215 forks

fixed seed doesn't work on ooba's webui #201

Closed by BadisG 1 year ago

BadisG commented 1 year ago

Hello,

I don't know if I should post this here, but a fixed seed doesn't work with the exllama or exllama_hf loader (it works fine with auto_gptq). Is this a problem in ooba's repo or in yours? https://github.com/oobabooga/text-generation-webui/issues/3262

turboderp commented 1 year ago

ExLlama isn't completely deterministic because it relies on FP16 math with atomic addition rather than a fixed-order reduction, so results can differ between runs even with a fixed seed.
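The root cause is that floating-point addition is not associative: when atomic adds from GPU threads land in a different order on each run, FP16 rounding can produce different sums. A minimal sketch of the effect in plain NumPy (an illustration, not ExLlama's actual CUDA code):

```python
import numpy as np

def fp16_sum(values):
    # Accumulate strictly in float16, mimicking FP16 atomic accumulation
    # where intermediate results are rounded to half precision each step.
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + v)
    return acc

# Same three values, two accumulation orders:
a = fp16_sum([np.float16(2048), np.float16(1), np.float16(1)])
b = fp16_sum([np.float16(1), np.float16(1), np.float16(2048)])
# 2048 + 1 rounds back to 2048 in float16, so the 1s vanish in the
# first order but combine to 2 before the big value in the second.
print(a, b)  # 2048.0 2050.0
```

With thousands of threads racing on atomic adds, the summation order varies run to run, so identical inputs (and a fixed RNG seed) can still yield slightly different logits and hence different sampled tokens.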

BadisG commented 1 year ago

That's a shame. I don't know how I can see the effects of CFG if the outputs keep changing for a fixed seed: https://github.com/oobabooga/text-generation-webui/pull/3325#issuecomment-1653487632

turboderp commented 1 year ago

Closing this, since determinism isn't in the cards at the moment. Perhaps somewhere down the line, or in V2, but it's too large a rewrite and it's unclear what the benefits would be relative to the performance cost.