Closed: conceptofmind closed this issue 1 year ago
@dmahan93 noticed that embeds are not fed to the logits projection. This may be the issue.
Currently, to_logits takes in x:
```python
# final norm
embeds = self.norm(x)
if return_only_embedding:
    return embeds

# to logits
logits = self.to_logits(x)
```
Should to_logits take in embeds instead?
```python
# final norm
embeds = self.norm(x)
if return_only_embedding:
    return embeds

# to logits
logits = self.to_logits(embeds)
```
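For context, here is a minimal, self-contained sketch of the forward tail with that change applied. The module is just a stand-in: the surrounding transformer layers are omitted, and `ForwardTail`, `dim`, and `num_tokens` are names made up for illustration.

```python
import torch
import torch.nn as nn

class ForwardTail(nn.Module):
    """Stand-in for the end of the model: final norm, then the logits projection."""
    def __init__(self, dim, num_tokens):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.to_logits = nn.Linear(dim, num_tokens, bias=False)

    def forward(self, x, return_only_embedding=False):
        # final norm
        embeds = self.norm(x)
        if return_only_embedding:
            return embeds
        # project the normalized embeddings, not the raw x
        return self.to_logits(embeds)

x = torch.randn(2, 8, 512)          # (batch, seq, dim)
logits = ForwardTail(512, 1000)(x)  # (batch, seq, num_tokens)
```

With `to_logits(x)` instead, the output of the final norm only matters on the embedding-only path.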
Thank you,
Enrico
@conceptofmind @dmahan93 oh yes, thanks for catching this! put in a quick fix
Hi @lucidrains,
I am almost ready to deploy the distributed training run. One thing I noticed is that norm.gamma is an unused parameter, which throws an error during distributed training.
Checking for unused parameters after a backward pass shows that norm.gamma does not receive a gradient.
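A minimal, self-contained sketch of that kind of check; the toy model below is purely illustrative, and the real run would iterate over the actual model's named_parameters after a training step:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    """Toy model whose second layer is never used in forward, so it gets no gradient."""
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)

    def forward(self, x):
        return self.used(x)

model = Toy()
model(torch.randn(2, 4)).sum().backward()

# any parameter whose .grad is still None after backward did not participate in the loss
print([name for name, p in model.named_parameters() if p.grad is None])
# -> ['unused.weight', 'unused.bias']
```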
This is resolved by setting find_unused_parameters=True when wrapping the model in DistributedDataParallel, at the cost of a double forward pass. I was wondering if you had any idea why this may be the case, or if there is a proper way to resolve the issue.
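A minimal sketch of the workaround, run as a single gloo process purely so it executes standalone; a real job would be launched with torchrun and wrap the actual model (nn.Linear here is a placeholder):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# single-process process group so the example runs; a real run uses torchrun and nccl
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = nn.Linear(4, 4)  # placeholder for the actual model

# find_unused_parameters=True lets DDP tolerate parameters that receive no gradient
# in a given backward pass, at the cost of extra per-iteration overhead
ddp_model = DDP(model, find_unused_parameters=True)

ddp_model(torch.randn(2, 4)).sum().backward()
dist.destroy_process_group()
```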
I greatly appreciate your input as always.
Thank you,
Enrico