opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples, such as ChatQnA and Copilot, that illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
https://opea.dev
Apache License 2.0

Curl command throwing error for ChatQnA Gaudi Script #216

Open mandalrajiv opened 4 weeks ago

mandalrajiv commented 4 weeks ago

I am testing the ChatQnA Gen AI example using the Gaudi script at - https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/tests/test_chatqna_on_gaudi.sh

The curl command below throws an error saying connection refused.

    curl http://172.31.90.59:8008/generate \
        -X POST \
        -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
        -H 'Content-Type: application/json'

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    curl: (7) Failed to connect to 172.31.90.59 port 8008 after 0 ms: Connection refused

When I change the curl command to the example below, I see output, but the output is not meaningful.

    curl http://172.31.90.59:8008/generate \
        -X POST \
        -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
        -H 'Content-Type: application/json'

{"generated_text":"discussions discussions++++BMconstructionalt))))nelsDataSource gloves<>( diagonal丁 PRO Delta transitions Http tim search restrict analys WiesserValuesљаdashboard도 birthday suppliers trouve될 pilot сте bit友idential ==ometric witnesses Jewaddmem yy Clubminecraftские improvementsstepAbsolute ottobrewheelُ deutscherizioni Af LookFactor participantaching ip grantspicker„ autumn"}

mandalrajiv commented 4 weeks ago

@chensuyue - Can you please take a look and advise?

chensuyue commented 4 weeks ago

> @chensuyue - Can you please take a look and advise?

For the connection issue, it usually means the service requires more time to start; you can check the docker logs xxx to confirm.
I haven't seen that random output issue in the CI tests. Did you use the latest GenAIComps code to build the microservice? You may need a --no-cache for the docker build.
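
For reference, a minimal sketch of both checks (assuming the container is named tgi-gaudi-server, as elsewhere in this thread; TGI exposes a GET /health endpoint, but treat the exact endpoint and image tag here as assumptions):

    # Follow the TGI server logs until it reports the model is loaded and listening
    docker logs -f tgi-gaudi-server

    # Poll the service until it starts accepting connections
    until curl -sf http://172.31.90.59:8008/health > /dev/null; do
        echo "waiting for TGI to finish starting..."
        sleep 10
    done

    # Rebuild the microservice image without the layer cache so the latest
    # GenAIComps code is actually picked up (image tag is illustrative)
    docker build --no-cache -t opea/llm-tgi:latest .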

Also inviting @letonghan to comment.

mandalrajiv commented 4 weeks ago

I will check the docker logs and report back.

I cloned the latest repo today before starting the test.

Do I need to rebuild the docker image with --no-cache?

mandalrajiv commented 3 weeks ago

I checked the docker logs with the command "docker logs tgi-gaudi-server". In the logs, I see a successful response. Pasting docker logs below.

    2024-05-29T21:45:58.893307Z WARN text_generation_router: router/src/main.rs:260: Invalid hostname, defaulting to 0.0.0.0
    2024-05-29T22:02:08.339627Z INFO generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: true, max_new_tokens: Some(64), return_full_text: None, stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None } total_time="4.937955875s" validation_time="294.95µs" queue_time="78.34µs" inference_time="4.937582726s" time_per_token="77.14973ms" seed="Some(6240944214322899185)"}: text_generation_router::server: router/src/server.rs:289: Success

I ran the curl command again and still see output that looks like junk.

ubuntu@ip-172-31-90-59:~/GenAIExamples/ChatQnA/tests$ curl http://184.73.148.255:8008/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' -H 'Content-Type: application/json'

{"generated_text":"meals meals cb desperateRED bordersредиarticle groundredit wins� baselineandsент aux fashionpositeatteredFilePathcdachelorЖ beschVisibleWL百list ange Hope babiesaware tissue😳 archae blood glimpsezen dio conspiracyARB investors installation fundamentglobal consistentrad doorwayinalPREceryWI smoothющих Creekment Ralph tambémtmp predicted chronLDround squad"}

@letonghan - Can you please take a look, as suggested by @chensuyue?

letonghan commented 3 weeks ago

Hi @mandalrajiv, thanks for your response. Here are explanations and suggestions for your issues:

  1. The curl: (7) Failed to connect to 172.31.90.59 port 8008 after 0 ms: Connection refused issue: TGI has to download the model before it can start the LLM service, and the download time depends on your network conditions. This error can appear when the model has not finished downloading or the service has not started yet.
  2. The wrong LLM generation: the garbled generated text may be caused by poor performance of the LLM model. Try changing the model via export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3" at line 44 in this script, as in the sketch below.
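
A minimal sketch of that change (the container name comes from earlier in this thread; exactly how the stack is relaunched depends on your setup, so treat the restart steps as an assumption):

    # Switch to a different LLM before starting the service
    export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"

    # Remove the old TGI container so the new model is loaded on relaunch
    docker stop tgi-gaudi-server && docker rm tgi-gaudi-server
    # ...then re-run the test script (test_chatqna_on_gaudi.sh) as before
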
mandalrajiv commented 3 weeks ago

Thank you @letonghan . I will try with a different model.

I have used the Intel Neural Chat model for LLM inferencing, and it produces pretty good responses. Not sure why the generated text is not meaningful in this case. If there is any additional insight you can provide on why that is happening, it would be immensely helpful. Thanks!!