janelu9 opened 3 months ago
Setup: pp_size = 8. Stage 0 contains a vision encoder with 45 layers, and stages 1 to 7 contain 56 decoder layers. Training works with ZeRO stage 0, but fails with ZeRO stage 1 combined with bf16/fp16. Much more GPU memory would be saved if ZeRO stage 1 worked.
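To make the setup concrete, here is a minimal sketch of the stage layout and the config keys involved. It assumes the framework is DeepSpeed (the issue does not name it) and that the 56 decoder layers are a total split evenly across stages 1 to 7. `stage_layout`, `PP_SIZE`, `ENCODER_LAYERS` and `DECODER_LAYERS` are hypothetical names, and the batch-size values are placeholders, not taken from the issue.

```python
# Hypothetical sketch of the pipeline layout described in the issue.
# Assumption: 56 decoder layers in total, spread over stages 1..7.

PP_SIZE = 8
ENCODER_LAYERS = 45  # vision encoder, all on stage 0
DECODER_LAYERS = 56  # assumed total across stages 1..7


def stage_layout(pp_size: int, encoder_layers: int, decoder_layers: int) -> list[tuple[str, int]]:
    """Return (block type, layer count) for each pipeline stage."""
    decoder_stages = pp_size - 1
    base, extra = divmod(decoder_layers, decoder_stages)
    layout = [("encoder", encoder_layers)]
    for i in range(decoder_stages):
        # Any remainder layers go to the earliest decoder stages.
        layout.append(("decoder", base + (1 if i < extra else 0)))
    return layout


# DeepSpeed-style config for the failing combination (ZeRO stage 1 + bf16).
# Use "fp16": {"enabled": True} instead of "bf16" for the fp16 case.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder
    "gradient_accumulation_steps": 8,     # placeholder
    "zero_optimization": {"stage": 1},
    "bf16": {"enabled": True},
}

if __name__ == "__main__":
    for stage, (kind, n) in enumerate(stage_layout(PP_SIZE, ENCODER_LAYERS, DECODER_LAYERS)):
        print(f"stage {stage}: {n} {kind} layers")
```

Under the even-split assumption, stage 0 holds the 45 encoder layers and each of stages 1 to 7 holds 8 decoder layers, so stage 0 is the most heavily loaded stage.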
The encoder is sometimes not used at all, because not every question contains an image.
Are there any operations that are not allowed between stages when ZeRO stage 1 is enabled?