Open Kyeong-Joong opened 4 months ago
In the rotation process, why don't you apply the rotations in float64 first and then cast the result back to float16, instead of applying them in float16 only?
Hi, we did not load the weights directly in float64 because that would require 4x as much GPU memory as float16.
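For reference, the 4x figure follows from the element sizes: float16 uses 2 bytes per value and float64 uses 8. Below is a minimal sketch of that arithmetic. The parameter count and the helper `weight_memory_gib` are hypothetical, chosen only for illustration, and the estimate covers the weights alone, not activations or other buffers.

```python
import struct

# Bytes per element for IEEE 754 half ("e") and double ("d") precision.
BYTES_FP16 = struct.calcsize("e")
BYTES_FP64 = struct.calcsize("d")


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 2**30


# Hypothetical model size, not taken from this thread.
num_params = 7_000_000_000

fp16_gib = weight_memory_gib(num_params, BYTES_FP16)
fp64_gib = weight_memory_gib(num_params, BYTES_FP64)

print(f"float16 weights: {fp16_gib:.1f} GiB")
print(f"float64 weights: {fp64_gib:.1f} GiB")
print(f"ratio: {fp64_gib / fp16_gib:.0f}x")
```

The ratio does not depend on the model size, so any parameter count gives the same 4x factor.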