Closed · rangehow closed this issue 1 month ago
We can try, but I'm not 100% sure we can, since generally we assume that anything in there will be cast properly via amp for the loss calculation.
Thanks in advance; I'm not sure if I made this clear.
If I run my training script with `python train.py` instead of `accelerate launch train.py`, and set `bf16` in `TrainingArguments` to `True`, the training procedure is still mixed precision, but it won't convert all inputs to `compute_loss` to bf16. (Some inputs don't need to be sent to the model, so they don't need to be converted.)
It would be great if this case could be handled correctly :)
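A minimal sketch of the behavior being asked for: cast only the floating-point tensors that are actually fed to the model, and leave auxiliary entries (counts, indices, scores that never reach `forward()`) untouched. The helper name and batch keys below are made up for illustration, not part of the Trainer API:

```python
import torch

def cast_model_inputs(inputs, model_keys, dtype=torch.bfloat16):
    """Cast only floating-point tensors that actually go to the model;
    leave auxiliary entries (counts, indices, ...) in their original dtype."""
    out = {}
    for k, v in inputs.items():
        if k in model_keys and torch.is_tensor(v) and torch.is_floating_point(v):
            out[k] = v.to(dtype)
        else:
            out[k] = v
    return out

batch = {
    "pixel_values": torch.randn(2, 3),  # fed to the model -> cast to bf16
    "clm_cnt": torch.tensor([5, 7]),    # bookkeeping only -> stays int64
    "aux_score": torch.randn(2),        # not a model input -> stays float32
}
casted = cast_model_inputs(batch, model_keys={"pixel_values"})
```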
Yes, you were quite clear. I'll see what we can do, but again, it's a blanket usage of `autocast()`, so I'm not sure if we can (on purpose).
Though the answer is likely yes, since the model has its own autocast wrapper around the `forward()` method.
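The autocast wrapper around `forward()` can be sketched roughly like this (the class below is illustrative, not accelerate's actual implementation). Note that the autocast region changes the dtype of ops run inside it, not of the input tensors themselves:

```python
import torch
from torch import nn

class AutocastWrapped(nn.Module):
    """Illustrative wrapper: run the wrapped module's forward() inside an
    autocast region, similar in spirit to what a prepared model does."""
    def __init__(self, module, dtype=torch.bfloat16):
        super().__init__()
        self.module = module
        self.dtype = dtype

    def forward(self, x):
        with torch.autocast(device_type=x.device.type, dtype=self.dtype):
            return self.module(x)

model = AutocastWrapped(nn.Linear(4, 2))
x = torch.randn(3, 4)  # stays float32 at the call site
y = model(x)           # the matmul runs in bfloat16 inside the region
```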
Hi, is there a status on this? I am currently using a messy hack where I convert all the tensors to lists and pass the lists to the model, then reconstruct the tensors from the lists in the `forward()` call.
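That hack can be sketched roughly like this (the function names are made up): since Python lists are not tensors, a blanket dtype-casting pass over the batch skips them, and the tensor is rebuilt inside `forward()` with the dtype actually needed:

```python
import torch

def pack(batch):
    # Hide the tensor from any blanket tensor-casting pass by turning it
    # into a plain nested list.
    batch["edge_index"] = batch["edge_index"].tolist()
    return batch

def unpack_in_forward(batch, device="cpu"):
    # Reconstruct the tensor inside forward(), with the required dtype.
    return torch.tensor(batch["edge_index"], dtype=torch.long, device=device)

batch = pack({"edge_index": torch.tensor([[0, 1], [1, 2]])})
edge_index = unpack_in_forward(batch)
```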
It could be related. I’m working with diffusion models and PyG, and when I batch edge_indices, they are converted to mixed precision FP16. However, I need to keep edge_indices as int64 for indexing. For now, my workaround is to cast them back to int64 after precomputing the indices.
This is indeed a temporary solution; however, this conversion will result in precision loss, making it unsuitable for scenarios that are sensitive to precision.
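A quick illustration of that precision loss: float16 has an 11-bit significand, so integers above 2048 are no longer exactly representable, and an index that round-trips through fp16 can silently change value:

```python
import torch

# Round-tripping indices through float16 is unsafe for large values.
idx = torch.tensor([2048, 4097], dtype=torch.int64)
round_trip = idx.to(torch.float16).to(torch.int64)
# 2048 survives exactly; 4097 rounds to the nearest representable
# float16 value (4096), corrupting the index.
```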
As the title says, I made a minimal snippet like this:
If we set amp to 'bfloat16', every element in `inputs` will be converted from float32 to bfloat16, while the amp supported by the Hugging Face Trainer arguments does not do this. So I am not sure whether this is a feature of accelerate or just a bug. If it is a feature, how can we avoid it? (Other inputs, like `clm_cnt` in this case, never go into the model, so there is no need to convert them to bf16.)
System info (please complete the following information):