Open zomux opened 4 years ago
Coding
Denoising Losses
Denoising is not predicting prefect tokens in the continuous space
Potential problem of continuous space: difference between two tokens are not really continuous
A discrete distribution might model the relationship much better
A summarization or loss.pdf