ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Ludwig New Version Issues of Repeating output #3919

Closed: savi8sant8s closed this issue 8 months ago

savi8sant8s commented 8 months ago

Related issue: https://github.com/ludwig-ai/ludwig-docs/issues/337

Hi there,

I have been using Ludwig for fine-tuning LLMs and it was all going well until a few weeks ago, when this issue started coming up and I moved on to writing code from scratch.

I am having issues with predictions: I am following Ludwig's GitHub Colab link for Mistral-7B. Even when using the exact same configuration, my model's predictions give a repeated answer separated by 'y' or a space. Can anyone let me know what mistake I might be making here?

Config:

```python
import yaml
from ludwig.api import LudwigModel
import logging

# A configuration that automates the whole finetuning pipeline using Ludwig
qlora_fine_tuning_config = yaml.safe_load(
    """
model_type: llm
base_model: mistralai/Mistral-7B-v0.1
input_features:
  - name: Input
    type: text
output_features:
  - name: output
    type: text
prompt:
  template: >-
    Context: You are an expert who converts...... long context of around 400 words....
    Input: {Input}
    output:
generation:
  temperature: 0.1
  max_new_tokens: 2048
adapter:
  type: lora
quantization:
  bits: 4
preprocessing:
  global_max_sequence_length: 2048
  split:
    type: random
    probabilities:
"""
)
```
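(For completeness, a minimal sketch of how a config like this is typically passed to the imported LudwigModel; the dataset path below is a hypothetical placeholder, not something from the original report.)

```python
# Hypothetical usage sketch: train with the config defined above.
model = LudwigModel(config=qlora_fine_tuning_config, logging_level=logging.INFO)
results = model.train(dataset="train.csv")  # "train.csv" is a placeholder path
```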

Solution: For now, downgrading Ludwig to 0.8.6 resolves this issue and everything works fine.

I request the team to look into the new update and resolve this issue.

arnavgarg1 commented 8 months ago

Hi @savi8sant8s, thanks for reporting the issue, and sorry you're running into it.

Are you able to share which version of Ludwig you were using before downgrading to 0.8.6? We introduced some regressions in Ludwig 0.9.1 and 0.9.2 that were fixed in Ludwig 0.9.3, released in the last week, specifically related to fine-tuning outputs not looking as good as expected for a variety of models, including Llama, Mistral, Mixtral, and Phi.

If you can share your dataset, I'm happy to test it for you with the latest Ludwig version and see if I can reproduce the error and then look into a fix.

savi8sant8s commented 8 months ago

I was using version 0.9.1, @arnavgarg1. Below are my notebook and prompts. I was working on fine-tuning Llama-2-7b to create a text corrector in Portuguese. project.zip Thank you for reaching out.

arnavgarg1 commented 8 months ago

Hi @savi8sant8s! I was able to verify that Ludwig 0.9.3 fixes things. I also made a few changes to your notebook that I believe are important in ensuring good learning/output. Here's the notebook: https://colab.research.google.com/drive/1QwojspiXKVULZ1xsuoUSWDonVS1Ig8JM?usp=sharing

The main thing you'll notice is that I added a code block to profile your data and figure out the distribution of the number of tokens in each of your columns. From this, I learned that the maximum sequence length of your instruction, input and output was 202 tokens. If we also add in the number of tokens for the prompt, it's probably closer to 256 tokens. However, you had set global_max_sequence_length to 128 instead of 256, meaning that the model would only learn from examples in your dataset where the number of tokens in your prompt + instruction + input was < 128 tokens, which wasn't always the case.
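(A minimal sketch of that kind of profiling step, assuming the data lives in a pandas DataFrame with the Input and output columns from the config above; this is not the exact block from the linked notebook.)

```python
# Hypothetical profiling sketch: count tokens per column to pick a safe
# global_max_sequence_length.
import pandas as pd
from transformers import AutoTokenizer

# Tokenizer for the base model named in the config in this thread; swap in
# whichever model you are actually fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

df = pd.read_csv("train.csv")  # placeholder path for the training data
for column in ["Input", "output"]:
    token_counts = df[column].astype(str).map(lambda s: len(tokenizer.encode(s)))
    print(f"{column}:")
    print(token_counts.describe())  # max here tells you the longest example
```

With statistics like these, global_max_sequence_length can be set above the longest prompt + instruction + input, e.g. 256 here rather than 128.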

The other thing I added was a new trainer parameter, enable_gradient_checkpointing: true, which helps reduce memory usage for longer sequences.
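(A hedged sketch of where that flag sits in the config, following the same yaml.safe_load pattern used above; the flag itself is the one named in this thread, while the trainer type shown is an assumption, so check the docs for your Ludwig version.)

```python
# Hypothetical config fragment: enable gradient checkpointing in the trainer
# section to trade a little extra compute for lower memory usage.
trainer_fragment = yaml.safe_load(
    """
trainer:
  type: finetune  # assumed LLM fine-tuning trainer type
  enable_gradient_checkpointing: true
"""
)
```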

Let me know if the output prediction results in this notebook match your expectations - it seems like the model correctly fixed the capitalization and didn't exhibit the repetition that you were seeing before.

savi8sant8s commented 8 months ago

Worked perfectly, @arnavgarg1. Thank you so much. The issue is reported in ludwig-docs too; if you can resolve it there, its author will also know that the new update solved the problem. Thanks again for the help.

arnavgarg1 commented 8 months ago

@savi8sant8s I'm glad to hear that it worked perfectly!

Could you explain the issue in ludwig-docs that you are referring to? Based on what you said, my understanding is that there was no notice in the Ludwig docs explaining that this issue exists in Ludwig 0.9/0.9.1/0.9.2, that we were working on a fix, and that it is now fixed. Is that understanding right?

savi8sant8s commented 8 months ago

@arnavgarg1 In fact, an issue about this was mistakenly created in ludwig-docs: https://github.com/ludwig-ai/ludwig-docs/issues/337.

arnavgarg1 commented 8 months ago

@savi8sant8s Ah I see, will let them know I responded here!

If this issue is resolved, is it okay if I mark it as closed?