Revise Train Scripts [Latte and DiT]: avoid setting positional embedding to trainable parameters #439

mindspore-lab / mindone

one for all, Optimal generator with No Exception

Apache License 2.0

Closed wtomin closed 3 weeks ago

wtomin commented 2 months ago

For DiT and Latte models, the positional embedding tensor is initialized as a Parameter that does not require gradient updates.

However, in the training script, I accidentally set all parameters to trainable. This is undesirable for the positional embeddings, because they are initialized once and should stay fixed during training.
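A minimal sketch of the intended behavior, written in plain Python so it runs without any framework installed. The `Param` class is a hypothetical stand-in for a framework parameter that carries `name` and `requires_grad` attributes, and the substrings in `FROZEN_KEYS` are assumed from the names used in this thread; this is not the actual training script.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a framework parameter object that exposes
# `name` and `requires_grad`, used here only to keep the sketch runnable.
@dataclass
class Param:
    name: str
    requires_grad: bool = True


# Assumed name fragments of the fixed embeddings (taken from this thread).
FROZEN_KEYS = ("pos_embed", "temp_embed")


def set_trainable(params, frozen_keys=FROZEN_KEYS):
    """Enable gradients for every parameter except the fixed embeddings."""
    for p in params:
        p.requires_grad = not any(key in p.name for key in frozen_keys)
    # Return only the parameters that should be handed to the optimizer.
    return [p for p in params if p.requires_grad]


params = [
    Param("blocks.0.attn.qkv.weight"),
    Param("pos_embed", requires_grad=False),
    Param("temp_embed", requires_grad=False),
]
trainable = set_trainable(params)
print([p.name for p in trainable])
```

Filtering by name keeps the fixed embeddings out of the optimizer even if the rest of the network is switched to trainable in one pass.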

zhtmike commented 2 months ago

Maybe just change it to a Tensor and skip loading the pos embed parameter, to prevent accidentally making all parameters trainable.

wtomin commented 2 months ago

Now the pos_embed and temp_embed (in latte only) are Tensors instead of Parameters.
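Once the embeddings are plain Tensors, older checkpoints may still contain entries for them. A rough sketch of skipping those entries before loading is shown here; the checkpoint is modeled as an ordinary dict, and the key names and the `drop_fixed_embeddings` helper are hypothetical. With MindSpore, such a dict would typically come from `mindspore.load_checkpoint`, and the filtered result would be passed to `mindspore.load_param_into_net`.

```python
# Assumed name fragments of the embeddings that are no longer Parameters.
SKIP_KEYS = ("pos_embed", "temp_embed")


def drop_fixed_embeddings(param_dict, skip_keys=SKIP_KEYS):
    """Split a checkpoint dict into entries to load and names to skip."""
    kept = {
        name: value
        for name, value in param_dict.items()
        if not any(key in name for key in skip_keys)
    }
    skipped = sorted(set(param_dict) - set(kept))
    return kept, skipped


# Toy checkpoint with placeholder values; the key names are illustrative only.
checkpoint = {
    "pos_embed": [0.0, 1.0],
    "temp_embed": [0.5, 0.5],
    "final_layer.linear.weight": [0.1, 0.2],
}
to_load, skipped = drop_fixed_embeddings(checkpoint)
print("loading:", list(to_load))
print("skipping:", skipped)
```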