huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers

Understanding loss in Training LLM #31125

Open mostafamdy opened 4 months ago

mostafamdy commented 4 months ago

Feature request

Hi,

I think I'm misunderstanding something about training LLMs. When we train the model, we calculate the loss by having the model predict the next word and then computing the difference between the predicted and true values.

What I want to know is: when the model predicts the next words, it generates each new word based on the previously generated ones. If a generated word is wrong, won't the subsequent predictions continue down the wrong path?
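
If I understand correctly, the loss computation looks roughly like the sketch below (random tensors stand in for a real model's logits; the shapes are made up purely for illustration):

```python
import torch
import torch.nn.functional as F

# Made-up shapes for illustration: a batch of token ids and the logits
# a causal LM would produce for them.
batch, seq_len, vocab = 2, 8, 100
input_ids = torch.randint(vocab, (batch, seq_len))
logits = torch.randn(batch, seq_len, vocab)  # stand-in for model(input_ids).logits

# The prediction at position t is scored against the true token at t+1,
# so logits and labels are shifted by one before the cross-entropy.
shift_logits = logits[:, :-1, :]  # predictions for positions 0..T-2
shift_labels = input_ids[:, 1:]   # true next tokens at positions 1..T-1
loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),
    shift_labels.reshape(-1),
)
print(loss.item())
```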

Motivation

[image: LLM training diagram]

Your contribution

.

dhaivat1729 commented 4 months ago

No, you are mixing up training and inference. Here is the explanation:

1. Training vs. Inference: At inference time the model does feed its own previous outputs back in as context, so an early mistake can indeed push later predictions down the wrong path. At training time, however, the model never sees its own predictions as input.

2. How Training Works: At every position the model is given the ground-truth tokens from the dataset as context (this is known as teacher forcing) and predicts the next token. The loss compares that prediction against the true next token, and the true token, not the predicted one, becomes the context for the following position (see the sketch right after this list).
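
A minimal sketch of the contrast, assuming a small causal LM from the Hub (gpt2 is used here purely as an example checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

# Training: the ground-truth tokens serve as both inputs and labels.
# The model shifts the labels internally and computes cross-entropy;
# its own predictions are never fed back in as context.
out = model(**inputs, labels=inputs["input_ids"])
print(out.loss)

# Inference: generated tokens are appended to the context step by step,
# so here an early mistake really can propagate to later predictions.
generated = model.generate(inputs["input_ids"], max_new_tokens=10)
print(tokenizer.decode(generated[0]))
```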

Example:

Imagine you're training a model on the phrase "The quick brown fox jumps over the lazy dog."

Training Process:

- The model sees the true prefix "The" and predicts a next token; the loss compares that prediction against the true token "quick".
- Regardless of what the model actually predicted, the context for the next step is the true prefix "The quick", and the loss compares the new prediction against "brown".
- This repeats across the whole sequence: the context is always the ground-truth prefix, never the model's own (possibly wrong) predictions, as spelled out in the sketch below.
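
Spelled out on that phrase (word-level splitting for readability; real models operate on subword tokens):

```python
sentence = "The quick brown fox jumps over the lazy dog."
tokens = sentence.split()

# At every step the context is the ground-truth prefix and the target is
# the true next word; the model's own guesses never enter the context.
for i in range(1, len(tokens)):
    context, target = tokens[:i], tokens[i]
    print(" ".join(context), "->", target)
```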

This process ensures that the model is always trained on correct sequences, helping it learn the right patterns without its own prediction errors feeding back into the training inputs.

mostafamdy commented 4 months ago

Thank you so much 😄❤️