I saw that another issue mentioned this and has since been closed, but I am still curious about the total training time, which seems too short for 4*3090. The model has about 200M parameters, which is not small; it is even larger than some base models (about 90~100M parameters). Can 300 epochs really be finished within 3.5 days? In my experience they cannot.
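To make the concern concrete, here is a quick sketch of the per-epoch time budget that 300 epochs in 3.5 days would imply. Only those two numbers come from this thread; the helper name `minutes_per_epoch` is made up for illustration, and the result is an average that ignores evaluation and checkpointing overhead.

```python
# Per-epoch time budget implied by finishing 300 epochs in 3.5 days.
# EPOCHS and TOTAL_DAYS are the figures quoted in this issue;
# minutes_per_epoch is a hypothetical helper, not part of any library.
EPOCHS = 300
TOTAL_DAYS = 3.5


def minutes_per_epoch(total_days: float, epochs: int) -> float:
    """Average wall-clock minutes available for each epoch."""
    return total_days * 24 * 60 / epochs


budget = minutes_per_epoch(TOTAL_DAYS, EPOCHS)
print(f"{budget:.1f} minutes per epoch")
```

So each epoch would need to finish in roughly 17 minutes on average, which is what looks too fast to me for a model of about 200M parameters on 4*3090.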