Open yangqing-yq opened 1 month ago
This is the result for A750. Can you help confirm whether these values are correct? In particular, is a TTFT of 4689 ms expected? The input image is 1920x1080.
"
llama_print_timings: load time = 6392.73 ms
llama_print_timings: sample time = 43.04 ms / 73 runs ( 0.59 ms per token, 1696.29 tokens per second)
llama_print_timings: prompt eval time = 4689.01 ms / 904 tokens ( 5.19 ms per token, 192.79 tokens per second)
llama_print_timings: eval time = 1709.74 ms / 72 runs ( 23.75 ms per token, 42.11 tokens per second)
llama_print_timings: total time = 8175.84 ms / 976 tokens
"
@qiuxin2012
@JinheTang @qiuxin2012
Hi @yangqing-yq, we tested it on our A750 machine and got results similar to yours, so your numbers should be correct.
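For anyone who wants to sanity-check the numbers, here is a minimal sketch that re-derives the per-token latency and throughput from the raw timings posted above. It treats the prompt eval time as the TTFT, which is how the question reads it. The `parse_timings` helper and its regex are hypothetical, written only for this log format, and are not part of llama.cpp or ipex-llm.

```python
import re

# Timing lines copied from the log in the original post.
LOG = """\
llama_print_timings: sample time = 43.04 ms / 73 runs
llama_print_timings: prompt eval time = 4689.01 ms / 904 tokens
llama_print_timings: eval time = 1709.74 ms / 72 runs
"""

# Hypothetical pattern for "<name> time = <ms> ms / <count> runs|tokens".
PATTERN = re.compile(
    r"llama_print_timings:\s+(?P<name>[a-z ]+?) time\s+=\s+"
    r"(?P<ms>[\d.]+) ms\s+/\s+(?P<count>\d+) (?:runs|tokens)"
)


def parse_timings(log):
    """Return per-stage latency and throughput derived from the raw values."""
    stats = {}
    for match in PATTERN.finditer(log):
        ms = float(match.group("ms"))
        count = int(match.group("count"))
        stats[match.group("name")] = {
            "ms": ms,
            "count": count,
            "ms_per_token": ms / count,
            "tokens_per_second": count / (ms / 1000.0),
        }
    return stats


stats = parse_timings(LOG)
for name, s in stats.items():
    print(f"{name}: {s['ms_per_token']:.2f} ms/token, "
          f"{s['tokens_per_second']:.2f} tokens/s")
```

The prompt eval line works out to about 5.19 ms per token over 904 tokens, which matches the log, so the 4689 ms figure is internally consistent with the reported throughput rather than a reporting glitch.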
Hi @yangqing-yq, upgrading to `ipex-llm[cpp]>=2.2.0b20240827` may solve this problem. Then you may run
Model page: openbmb/MiniCPM-V-2_6-gguf