intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

[Langchain-Chatchat]Add time consumption msg about first token and rest tokens #10628

Open · johnysh opened this issue 6 months ago

johnysh commented 6 months ago

The current model cannot report the time spent on the first token and on the rest of the tokens. Can we add this message?

Oscilloscope98 commented 6 months ago

Hi @johnysh,

Currently, we do not natively log the first token and rest token latency for Langchain-Chatchat. However, you can obtain this information with the help of the ipex-llm benchmark tool.

To use the benchmark tool in Langchain-Chatchat:

  1. Put benchmark_util.py into your conda env for Langchain-Chatchat (a copy sketch is shown after these steps):

    Taking Linux as an example, the script should be placed at a path like: /home/<user_name>/<anaconda3 or miniconda3>/envs/<your conda env name>/lib/python3.11/site-packages/ipex_llm/serving/fastchat/benchmark_util.py

  2. In /home/<user_name>/<anaconda3 or miniconda3>/envs/<your conda env name>/lib/python3.11/site-packages/ipex_llm/serving/fastchat/ipex_llm_worker.py, wrap the loaded model with BenchmarkWrapper (a conceptual sketch of what such a wrapper measures follows these steps):

    That is, change the code here to:

        self.model, self.tokenizer = load_model(
            model_path, device, self.load_in_low_bit, trust_remote_code
        )

        # Wrap the model so that first and rest token latencies are recorded
        from .benchmark_util import BenchmarkWrapper
        self.model = BenchmarkWrapper(self.model)
  3. In /home/<user_name>/<anaconda3 or miniconda3>/envs/<your conda env name>/lib/python3.11/site-packages/ipex_llm/serving/fastchat/ipex_llm_worker.py, print the first and rest token latency.

    That is, change the code here to:

        # Report the latencies collected by BenchmarkWrapper after each generation
        print(f"First token latency (s): {self.model.first_cost}", flush=True)
        print(f"Rest token latency (s): {self.model.rest_cost_mean}", flush=True)

        yield json.dumps(json_output).encode() + b"\0"
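
For step 1, here is a minimal copy sketch, assuming you have already downloaded benchmark_util.py locally (the "./benchmark_util.py" source path is a placeholder) and that ipex_llm is importable in the target conda env:

    # Minimal sketch for step 1 (run inside the Langchain-Chatchat conda env):
    # copy a locally downloaded benchmark_util.py into the installed ipex_llm
    # package so it can be imported as ipex_llm.serving.fastchat.benchmark_util.
    # "./benchmark_util.py" is a placeholder for wherever you saved the script.
    import shutil
    from pathlib import Path

    import ipex_llm  # used only to locate the installed package directory

    source = Path("./benchmark_util.py")
    target_dir = Path(ipex_llm.__file__).parent / "serving" / "fastchat"

    shutil.copy(source, target_dir / "benchmark_util.py")
    print(f"Copied {source} to {target_dir}")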
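
The sketch below is not the actual BenchmarkWrapper shipped with ipex-llm; it only illustrates the idea behind steps 2 and 3: timestamp tokens as they stream out of generate() and derive a first token latency plus an average latency for the remaining tokens. The model name, prompt, and generation settings are placeholders, and it uses plain Hugging Face transformers for clarity.

    # Illustrative sketch only, NOT the ipex-llm BenchmarkWrapper:
    # time the first streamed token separately from the rest.
    import time
    import threading

    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    model_id = "Qwen/Qwen2-1.5B-Instruct"  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt")
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

    start = time.perf_counter()
    thread = threading.Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=32),
    )
    thread.start()

    # Each streamed chunk roughly corresponds to one newly generated token
    timestamps = [time.perf_counter() for _ in streamer]
    thread.join()

    first_cost = timestamps[0] - start  # prompt processing + first token
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    rest_cost_mean = sum(gaps) / len(gaps) if gaps else 0.0
    print(f"First token latency (s): {first_cost:.3f}")
    print(f"Rest token latency (s): {rest_cost_mean:.3f}")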

Please let us know if you run into any further problems :)