**Closed** · IMbackK closed 1 week ago
```
% python test_inference.py -m CodeLlama-13B-GPTQ/ -p "int main(" -nfa -l 2048 -lm
Finding flash_attn
NO flash_attn module
 -- Model: CodeLlama-13B-GPTQ/
 -- Options: ['length: 2048', 'no_flash_attn', 'low_mem']
Traceback (most recent call last):
  File "/home/philipp/machine-lerning/exllamav2/test_inference.py", line 95, in <module>
    model, tokenizer = model_init.init(args,
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/philipp/machine-lerning/exllamav2/exllamav2/model_init.py", line 101, in init
    if args.low_mem: config.set_low_mem()
                     ^^^^^^^^^^^^^^^^^^^^
  File "/home/philipp/machine-lerning/exllamav2/exllamav2/config.py", line 143, in set_low_mem
    self.max_output_len = min(self.max_output_len, 1024)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'int' and 'NoneType'
```
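For context: the traceback shows that `self.max_output_len` is `None` when `set_low_mem()` calls `min(self.max_output_len, 1024)`, and `min` cannot compare `None` with an `int`. A guard of the kind sketched below would avoid the crash. This is a hypothetical minimal reproduction, not the actual `exllamav2` class or its fix; the attribute names are taken from the traceback, everything else is assumed.

```python
class Config:
    """Hypothetical stand-in for exllamav2's config object (sketch, not the real class)."""

    def __init__(self):
        # max_output_len may be None, meaning "not set yet / derive later";
        # this is exactly the state that triggered the TypeError
        self.max_output_len = None

    def set_low_mem(self):
        # Only clamp when a value has actually been set, so min() never
        # has to compare None with an int
        if self.max_output_len is not None:
            self.max_output_len = min(self.max_output_len, 1024)


config = Config()
config.set_low_mem()          # no longer raises TypeError
print(config.max_output_len)  # still None, since nothing set it

config.max_output_len = 2048
config.set_low_mem()
print(config.max_output_len)  # clamped to 1024
```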
Thanks. Fixed in dev branch now.