Closed wtomin closed 3 weeks ago
Maybe just change it to a Tensor and skip loading the pos-embed parameter, to prevent accidentally making all parameters trainable.
Now `pos_embed` and `temp_embed` (Latte only) are Tensors instead of Parameters.
In the DiT and Latte models, the positional embedding was initialized as a Parameter that does not require gradient updates. However, the training script accidentally set all parameters to trainable. That is undesirable for positional embeddings, which are initialized once and then kept fixed during training.
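The idea can be sketched as follows. This is a minimal PyTorch-style illustration (the PR's actual framework may differ, and `ToyModel` is a hypothetical module, not code from the repo): storing the embedding as a plain tensor/buffer keeps it out of `parameters()`, so a blanket "make everything trainable" loop cannot touch it.

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    # Hypothetical module illustrating the fix, not the repo's actual code.
    def __init__(self, num_patches=16, dim=8):
        super().__init__()
        # Registered as a buffer (a plain tensor): it is saved with the
        # model's state dict but is NOT returned by parameters(), so it
        # can never be accidentally switched to trainable.
        self.register_buffer("pos_embed", torch.zeros(1, num_patches, dim))
        self.proj = nn.Linear(dim, dim)

model = ToyModel()
# The blanket loop that caused the bug in the training script:
for p in model.parameters():
    p.requires_grad = True
# pos_embed is unaffected because it is not a Parameter.
assert "pos_embed" not in dict(model.named_parameters())
assert not model.pos_embed.requires_grad
```

An alternative is to keep `pos_embed` a Parameter with `requires_grad=False` and filter it out when building the optimizer, but a plain tensor is the more robust choice here because it is immune to later `requires_grad` toggles.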