lxe / simple-llm-finetuner

Simple UI for LLM Model Finetuning
MIT License

Inference doesn't work after training #10

Closed vadi2 closed 1 year ago

vadi2 commented 1 year ago

I trained my input text on an RTX 4080 (16 GB VRAM) with the default settings:

[screenshot: default training settings]

And that seems to work OK:

TrainOutput(global_step=116, training_loss=1.0854247685136467, metrics={'train_runtime': 258.9812, 'train_samples_per_second': 0.448, 'train_steps_per_second': 0.448, 'train_loss': 1.0854247685136467, 'epoch': 1.0})

However, inference doesn't work, and I don't have enough context to understand why yet:

  File "/home/vadi/Programs/simple-llama-finetuner/main.py", line 27, in maybe_load_models
    model = LlamaForCausalLM.from_pretrained(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2588, in from_pretrained
    raise ValueError(
ValueError: 
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
                        these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
                        `device_map` to `from_pretrained`. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.

Currently 12.5 / 16 GB of VRAM is in use, if that matters.
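
For reference, the offload route the error message points at would look roughly like this. Two assumptions I haven't verified against main.py: that the flag now lives on BitsAndBytesConfig under the name llm_int8_enable_fp32_cpu_offload, and that decapoda-research/llama-7b-hf is the base model this repo loads:

from transformers import LlamaForCausalLM, BitsAndBytesConfig

# Keep any modules that don't fit on the GPU in fp32 on the CPU
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # assumption: the base model main.py uses
    quantization_config=quant_config,
    # With the offload flag set, "auto" is allowed to spill layers to the CPU;
    # alternatively pass an explicit dict mapping module names to devices.
    device_map="auto",
)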

vadi2 commented 1 year ago

Restarting the Python process seems to have done the trick.
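
Presumably the in-process equivalent of that restart is dropping the old references and emptying the CUDA cache before reloading; untested, but something like:

import gc
import torch

model = None      # drop whatever reference main.py keeps to the trained model
tokenizer = None
gc.collect()              # let Python actually reclaim the objects
torch.cuda.empty_cache()  # hand the cached blocks back to the driver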

vadi2 commented 1 year ago

This is still an issue for me on commit 97d5aae6e486f1e68e151f21ce8f54be303356c9. I train my model, then go to run inference, and it doesn't work:

To create a public link, set `share=True` in `launch()`.
Loading base model...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Number of samples: 539
Training...
/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
{'loss': 1.2125, 'learning_rate': 0.00018888888888888888, 'epoch': 0.37}
{'loss': 1.0923, 'learning_rate': 7.777777777777777e-05, 'epoch': 0.74}
{'train_runtime': 209.2845, 'train_samples_per_second': 2.575, 'train_steps_per_second': 0.258, 'train_loss': 1.1094184628239385, 'epoch': 1.0}
Loading base model...
Loading peft model lora-apple-fig...
Loading tokenizer...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/gradio/helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "/home/vadi/Programs/simple-llama-finetuner/main.py", line 104, in generate_text
    output = model.generate(  # type: ignore
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/peft/peft_model.py", line 581, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/generation/utils.py", line 1451, in generate
    return self.sample(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2467, in sample
    outputs = self(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/peft/tuners/lora.py", line 522, in forward
    result = super().forward(x)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 317, in forward
    state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
  File "/home/vadi/Programs/miniconda3/envs/finetuner2/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1698, in transform
    prev_device = pre_call(A.device)
AttributeError: 'NoneType' object has no attribute 'device'

If I restart and run inference without training first, it works. Here's what the VRAM use looks like during all of this:

[screenshot: VRAM usage during training and inference]
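
For what it's worth, the AttributeError above is bitsandbytes finding state.CB already None on the 8-bit weights, which seems to happen when the same in-memory base model is reused for generation right after training. Rebuilding everything from scratch for inference is effectively what the restart does. A sketch of that pattern; the model name and the lora-apple-fig adapter path are taken from my logs, and whether this matches what main.py does exactly is an assumption:

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load a fresh 8-bit base model rather than reusing the just-trained instance
base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "lora-apple-fig")  # the adapter just trained
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model.eval()  # inference mode; no gradients needed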