I have set both the input max length and the output max length to 128. Generation is very slow: it takes about 40 minutes to produce one sentence. I am using the Qwen-2.5 7B model. Is this speed normal? My GPU is an NVIDIA 3090 with 12GB of VRAM, and it is using only around 5GB.
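For context, here is a rough back-of-the-envelope check of whether the weights alone would fit in 12GB. It is only a sketch: it assumes a nominal 7e9 parameters, counts weights only (no activations or KV cache), and the helper names `weight_memory_gib` and `fits_in_vram` are made up for illustration.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumptions: nominal parameter count of 7e9; activations, KV cache,
# and framework overhead are ignored, so real usage would be higher.

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def fits_in_vram(num_params: float, dtype: str, vram_gib: float) -> bool:
    """Return True if the weights alone fit in the given amount of VRAM."""
    return weight_memory_gib(num_params, dtype) <= vram_gib


if __name__ == "__main__":
    num_params = 7e9  # nominal "7B"; an assumption, not an exact count
    vram_gib = 12.0   # the VRAM figure reported above
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gib(num_params, dtype)
        verdict = "fits" if fits_in_vram(num_params, dtype, vram_gib) else "does not fit"
        print(f"{dtype}: ~{size:.1f} GiB, {verdict} in {vram_gib:.0f} GiB")
```

If this estimate is in the right range, float16 weights would not fit in 12GB, so with only about 5GB in use, part of the model may be running on the CPU. That is a guess on my side, not a confirmed diagnosis.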