sophgo / LLM-TPU

Run generative AI models on the Sophgo BM1684X

Web client not working #10

Closed. Bao0ne closed this issue 5 months ago.

Bao0ne commented 5 months ago

Device: IVP03X-V2

uname -v

root@bm1684:/home/linaro/LLM-TPU# uname -v
#1 SMP Fri Mar 1 05:32:48 CST 2024

CMD:

python3 web_demo.py --model_path ./llama2-7b_int4_1dev.bmodel --tokenizer_path ../token_config --devid 0 --generation_mode greedy

Error log:

root@bm1684:/data/LLM-TPU/models/Llama2/python_demo# python3 web_demo.py --model_path ./llama2-7b_int4_1dev.bmodel --tokenizer_path ../token_config --devid 0 --generation_mode greedy
/usr/local/lib/python3.8/dist-packages/gradio_client/documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
/usr/local/lib/python3.8/dist-packages/gradio_client/documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Load ../token_config ...
Device [ 0 ] loading ....
[BMRT][bmcpu_setup:436] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init 
[BMRT][BMProfile:60] INFO:Profile For arch=3
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
Model[./llama2-7b_int4_1dev.bmodel] loading ....
[BMRT][load_bmodel:1696] INFO:Loading bmodel from [./llama2-7b_int4_1dev.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1583] INFO:Bmodel loaded, version 2.2+v1.7.beta.63-g04a8bad4c-20240409
[BMRT][load_bmodel:1585] INFO:pre net num: 0, load net num: 69
[BMRT][load_tpu_module:1674] INFO:loading firmare in bmodel
[BMRT][preload_funcs:1876] INFO: core_id=0, multi_fullnet_func_id=91
[BMRT][preload_funcs:1879] INFO: core_id=0, dynamic_fullnet_func_id=92
Done!
web_demo.py:103: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  user_input = gr.Textbox(show_label=False, placeholder="Input...", lines=10).style(
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/gradio/routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1392, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1111, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 346, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 339, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 322, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 691, in gen_wrapper
    yield from f(*args, **kwargs)
  File "web_demo.py", line 82, in predict
    for response, history in model.stream_predict(input):
  File "/data/LLM-TPU/models/Llama2/python_demo/pipeline.py", line 172, in stream_predict
    for answer_cur, history in self._generate_predictions(tokens):
  File "/data/LLM-TPU/models/Llama2/python_demo/pipeline.py", line 180, in _generate_predictions
    next_token = self.forward_first(tokens)
AttributeError: 'Llama2' object has no attribute 'forward_first'
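
The traceback shows pipeline.py calling self.forward_first(tokens) on the Llama2 pipeline object, which does not define that method in this checkout, i.e. the Python pipeline and the rest of the demo code are out of sync. A quick way to confirm which revision you have is to check for the missing entry point; the module and class names below come from the traceback, everything else is an assumption:

# Run from models/Llama2/python_demo (assumption: pipeline.py is importable there).
# 'pipeline' and 'Llama2' are taken from the traceback above; where forward_first
# is meant to be defined is not visible from the log.
import pipeline
print(hasattr(pipeline.Llama2, "forward_first"))
# False on the broken revision; True after updating to a fixed checkout.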
WaitDumplings commented 5 months ago

This issue has been fixed; please pull the latest repo.

liyimeng commented 5 months ago

@WaitDumplings Can we make the web server compatible with the OpenAI API?
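
For what it's worth, an OpenAI-style /v1/chat/completions endpoint could be layered on top of the existing pipeline without touching the TPU side. Below is a minimal non-streaming sketch with FastAPI; the Llama2 class and its stream_predict(input) generator come from the traceback above, while the constructor arguments and everything else are assumptions:

# Hypothetical wrapper, not part of this repo. Assumes a Llama2
# pipeline object constructed the same way web_demo.py does it.
from fastapi import FastAPI
from pydantic import BaseModel
from pipeline import Llama2  # class name taken from the traceback

app = FastAPI()
model = None  # set to a Llama2(...) instance at startup, as in web_demo.py

class ChatRequest(BaseModel):
    model: str
    messages: list  # OpenAI-style [{"role": ..., "content": ...}]

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    prompt = req.messages[-1]["content"]
    answer = ""
    # stream_predict() yields (partial_answer, history); keep the last answer.
    for response, _history in model.stream_predict(prompt):
        answer = response
    return {
        "object": "chat.completion",
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
    }

Streaming responses would additionally need server-sent events ("data: ..." chunks), which FastAPI's StreamingResponse can provide.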