pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

Repeated output when generating with a model fine-tuned with LoRA #939

Closed gulizhoutao closed 2 months ago

gulizhoutao commented 5 months ago

I used LoRA to fine-tune the Llama 3 Instruct model. When I use the fine-tuned model for generation, the output repeats itself and never terminates. An example follows:

tune run generate --config custom_generation_lora_config.yaml
INFO:torchtune.utils.logging:Running InferenceRecipe with resolved config:

checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: ./Meta-Llama-3-8B-Instruct-ft
  checkpoint_files:

DEBUG:torchtune.utils.logging:Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
INFO:torchtune.utils.logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils.logging:Please summarize the sentence by retaining all concrete objects and their concrete information in this sentence,while deleting all abstract information such as atmosphere, scene and summary.

The image captures a vibrant scene from a bustling street in India. The street is teeming with life and activity, with various modes of transportation adding to the dynamic atmosphere. In the immediate foreground, a man dressed in a purple shirt is seen riding a black motorcycle. He is not alone; a passenger, clad in a black shirt, accompanies him on the journey. Their presence in the forefront of the image suggests they are moving at a brisk pace, navigating their way through the busy street. Just behind the motorcycle, a man in a white shirt is pedaling a blue bicycle rickshaw. The rickshaw, a common sight on Indian streets, adds a touch of local flavor to the scene. Despite being slightly obscured by the motorcycle, the rickshaw driver's determined expression is indicative of his effort to keep up with the fast-paced traffic. Further back, the street is filled with an array of other vehicles and pedestrians, each contributing to the overall hustle and bustle. The gray buildings lining the street are adorned with various signs and advertisements, reflecting the commercial nature of the area. Overall, this image paints a vivid picture of daily life on an Indian street, characterized by its lively atmosphere, diverse modes of transportation, and vibrant urban landscape.

Summary: A man in a purple shirt is riding a black motorcycle with a passenger in a black shirt. Behind them, a man in a white shirt is pedaling a blue bicycle rickshaw. The street is filled with other vehicles and pedestrians, and the buildings are adorned with signs and advertisements. This image captures a busy scene in India. Retained objects: man, purple shirt, black motorcycle, passenger, black shirt, white shirt, blue bicycle rickshaw, buildings, signs, advertisements. Abstract information deleted: atmosphere, scene, summary. Concrete information retained: concrete objects and their concrete information. Concrete information summary: A man in a purple shirt is riding a black motorcycle with a passenger in a black shirt. Behind them, a man in a white shirt is pedaling a blue bicycle rickshaw. The street is filled with other vehicles and pedestrians, and the buildings are adorned with signs and advertisements. This image captures a busy scene in India. Retained objects: man, purple shirt, black motorcycle, passenger, black shirt, white shirt, blue bicycle rickshaw, buildings, signs, advertisements. Abstract information deleted: atmosphere, scene, summary. Concrete information retained: concrete objects and their concrete information. Concrete information summary: A man in a purple shirt is riding a black motorcycle with a passenger in a black shirt. Behind them, a man in a white shirt is pedaling a blue bicycle rickshaw. The street is
INFO:torchtune.utils.logging:Time for inference: 10.56 sec total, 28.40 tokens/sec
INFO:torchtune.utils.logging:Bandwidth achieved: 519.09 GB/s
INFO:torchtune.utils.logging:Memory used: 18.52 GB

How can I solve this problem? Is there something wrong with the EOS token?
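For anyone triaging similar output, one quick way to confirm that the text is cycling rather than merely long is to count repeated sentences. The snippet below is a minimal sketch in plain Python, not part of torchtune: `repeated_sentences` is a hypothetical helper, the sentence split is deliberately naive, and `sample` is an abbreviated stand-in for the generated text above.

```python
import re
from collections import Counter


def repeated_sentences(text: str, min_count: int = 2) -> dict:
    """Return each sentence that occurs at least `min_count` times in `text`."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    counts = Counter(sentences)
    return {s: n for s, n in counts.items() if n >= min_count}


# Abbreviated stand-in for the repeated output shown in this issue.
sample = (
    "A man in a purple shirt is riding a black motorcycle. "
    "This image captures a busy scene in India. "
    "A man in a purple shirt is riding a black motorcycle. "
    "This image captures a busy scene in India. "
    "A man in a purple shirt is riding a black motorcycle."
)

for sentence, n in repeated_sentences(sample).items():
    print(n, sentence)
```

If most sentences appear two or more times, the model is likely stuck in a loop and generation is only ending because the token budget runs out.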

rohan-varma commented 5 months ago

Thanks for filing the issue! I did a quick check of our generation recipe for any obvious problems with stopping once an EOS token is generated. The logic changed somewhat in https://github.com/pytorch/torchtune/pull/871, which added support for stopping on certain non-EOS tokens, but it looks like we still respect EOS tokens as expected. Given that, my initial guess is that the fine-tuned model simply isn't generating an EOS token for some reason.
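To illustrate the distinction, here is a minimal sketch of a stop-token loop in plain Python. It is not torchtune's actual implementation: `generate`, `well_behaved`, `looping`, and the `EOS` id are all hypothetical names and values chosen for the example. It shows that a loop which correctly honors stop tokens still runs to the token limit if the model never emits one.

```python
from typing import Callable, List, Set

EOS = 0  # hypothetical EOS token id, chosen only for this example


def generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    stop_tokens: Set[int],
    max_new_tokens: int,
) -> List[int]:
    """Append tokens until a stop token is produced or the budget is spent."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)
        if tok in stop_tokens:
            break  # the loop respects stop tokens
    return tokens


def well_behaved(tokens: List[int]) -> int:
    # Stand-in model that emits EOS once the sequence holds 5 tokens.
    return EOS if len(tokens) >= 5 else len(tokens) + 10


def looping(tokens: List[int]) -> int:
    # Stand-in model that cycles through ids 1..3 and never emits EOS.
    return 1 + len(tokens) % 3


stopped = generate(well_behaved, [7, 8], {EOS}, max_new_tokens=20)
runaway = generate(looping, [7, 8], {EOS}, max_new_tokens=20)
print(len(stopped), len(runaway))
```

In the second call the stopping logic is identical, yet generation only ends at `max_new_tokens`, which matches the symptom of output that repeats and is cut off mid-sentence.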

cc @ebsmothers who might have more context on this.

ebsmothers commented 5 months ago

Hi @gulizhoutao, thanks for creating the issue. Could you share the command you used to fine-tune the model, along with your fine-tuning and generation configs? Then I can try to reproduce the behavior you're seeing and figure out the cause.

RdoubleA commented 2 months ago

Hi @gulizhoutao, were you able to resolve your issue? Closing this as stale, but please reopen if you're still running into problems.