wjwzju opened 8 months ago
- `hidden_dim = int(2 * hidden_dim / 3)`: can reduce the computation overhead without losing effectiveness.
- `hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)`: makes sure the new `hidden_dim` is a multiple of the number you want.
Why is the value of hidden_dim calculated this way?
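For reference, here is a minimal sketch that runs the two lines above on concrete numbers. It assumes the input `hidden_dim` is `4 * dim` (the usual width of a standard FFN) and uses LLaMA-7B-style values (`dim = 4096`, `multiple_of = 256`); any further scaling the real code may apply is left out. The helper names `llama_ffn_hidden_dim` and `ffn_params` are hypothetical. The sketch also compares weight counts. The 2/3 factor is commonly explained by SwiGLU's three weight matrices instead of two: shrinking the width to 2/3 keeps the parameter count close to that of a standard two-matrix FFN of width `4 * dim`. The round-up step then pads the width to a hardware-friendly multiple.

```python
def llama_ffn_hidden_dim(dim: int, multiple_of: int) -> int:
    """Hypothetical helper reproducing the hidden_dim computation from the issue."""
    hidden_dim = 4 * dim                  # assumption: input width is the standard 4 * dim
    hidden_dim = int(2 * hidden_dim / 3)  # shrink to 2/3 (three matrices instead of two)
    # Round up to the nearest multiple of multiple_of (ceil division, then scale back).
    hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
    return hidden_dim


def ffn_params(dim: int, hidden_dim: int, n_matrices: int) -> int:
    """Hypothetical helper: weight count (no biases) of n_matrices dim x hidden_dim projections."""
    return n_matrices * dim * hidden_dim


if __name__ == "__main__":
    dim, multiple_of = 4096, 256
    h = llama_ffn_hidden_dim(dim, multiple_of)
    print(h)                            # 11008 (10922 rounded up to a multiple of 256)
    print(ffn_params(dim, 4 * dim, 2))  # standard two-matrix FFN: 134217728 weights
    print(ffn_params(dim, h, 3))        # three-matrix SwiGLU FFN: 135266304 weights
```

With these values, `int(2 * 16384 / 3)` gives 10922, which rounds up to 11008 = 43 × 256. The three-matrix FFN ends up within about 1% of the two-matrix FFN's weight count. The small excess comes only from the rounding.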