jina-ai / example-grafana-prometheus

Docker compose file to use prometheus and grafana with Jina

Grafana cannot query any metrics data after connecting to the prometheus data source #4

Closed coolmian closed 1 year ago

coolmian commented 1 year ago

I use these three profiles from: https://github.com/jina-ai/example-grafana-prometheus/tree/main/opentelemetry-local

The following command was executed:

docker-compose up

The output log is here: https://gist.github.com/coolmian/0b583a24d044255bda76889caf5f83a8

I can normally access these three front-end pages: http://ip:3000/ http://ip:16686/ http://ip:9090/

I added a Prometheus data source in Grafana with the address http://ip:9090. After saving, it displays "Data source is working".

Then I imported the dashboard through the configuration file https://github.com/jina-ai/example-grafana-prometheus/blob/main/grafana-dashboards/flow-histogram-metrics.json

Then I sent a few requests

from jina import Flow, Document, DocumentArray

if __name__ == '__main__':
    with Flow(
        tracing=True,
        traces_exporter_host='http://myip',
        traces_exporter_port=4317,
        metrics=True,
        metrics_exporter_host='http://myip',
        metrics_exporter_port=4317,
    ).add(uses='jinaai://jina-ai/SimpleIndexer') as f:
        f.post('/', DocumentArray([Document(text='hello')]))

I can see the tracing data in the Jaeger UI. However, no data is displayed on any panel of the dashboard in Grafana.

I also can't get any data by querying the metric names in the Prometheus front-end: https://docs.jina.ai/concepts/flow/instrumentation/#instrumenting-flow

girishc13 commented 1 year ago

@coolmian Can you also give information about the Jina version and OS that you are using? I will re-run the docker-compose setup and the Flow that you posted on my Ubuntu machine and get back.

girishc13 commented 1 year ago

So the main issue is that the Flow is very short-lived. The Jina OpenTelemetry exporter is a batch exporter, and opentelemetry-python configures a default delay before the telemetry data is exported in batches. Add a time.sleep(3) after the post request to allow the export to happen.
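
A minimal sketch of that change (the exporter host is a placeholder for the machine running the OpenTelemetry collector):

from jina import Flow, Document, DocumentArray
import time

if __name__ == '__main__':
    with Flow(
        tracing=True,
        traces_exporter_host='http://<collector-ip>',
        traces_exporter_port=4317,
        metrics=True,
        metrics_exporter_host='http://<collector-ip>',
        metrics_exporter_port=4317,
    ).add(uses='jinaai://jina-ai/SimpleIndexer') as f:
        f.post('/', DocumentArray([Document(text='hello')]))
        time.sleep(3)  # give the batch exporter time to flush before the Flow shuts down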

coolmian commented 1 year ago

@coolmian Can you also give information about the Jina version and OS that you are using? I will re-run the docker-compose setup and the Flow that you posted on my Ubuntu machine and get back.

Client device OS: Windows 10 Enterprise LTSC 21H2 19044.1766
Server (Docker) device OS: Ubuntu 16.04.1
Jina version: 3.13.2

coolmian commented 1 year ago

So the main issue is that the Flow is very short-lived. The Jina OpenTelemetry exporter is a batch exporter, and opentelemetry-python configures a default delay before the telemetry data is exported in batches. Add a time.sleep(3) after the post request to allow the export to happen.

I changed the code, but still didn't see any change in Grafana.

from jina import Flow, Document, DocumentArray
import time

if __name__ == '__main__':
    with Flow(
        tracing=True,
        traces_exporter_host='http://xx.xx.xx.xx',
        traces_exporter_port=4317,
        metrics=True,
        metrics_exporter_host='http://xx.xx.xx.xx',
        metrics_exporter_port=4317,
    ).add(uses='jinaai://jina-ai/SimpleIndexer') as f:
        f.post('/', DocumentArray([Document(text='hello')]))
        time.sleep(5)
coolmian commented 1 year ago

This is the information displayed by the client console.

⠋ Installing dependencies from requirements.txt...
⠋ Installing dependencies from requirements.txt...
DeprecationWarning: Setting `workspace` via `metas.workspace` is deprecated. Instead, use `f.add(..., workspace=...)` when defining a a Flow in Python; the `workspace` parameter when defining a Flow using YAML; or `--workspace` when starting an Executor using the CLI. (raised from D:\ProgramData\Anaconda3\envs\python38\lib\site-packages\jina\serve\executors\__init__.py:292)
───────────────────────── 🎉 Flow is ready to serve! ──────────────────────────
┌────────────── 🔗 Endpoint ───────────────┐
│  ⛓      Protocol                   GRPC  │
│  🏠        Local        127.0.0.1:61725  │
│  🔒      Private    192.168.xx.xx:61725  │
└──────────────────────────────────────────┘

WARNI… gateway/rep-0@25832 Pod was forced to close after 1  [01/31/23 22:57:43]
       second. Graceful closing is not available on                            
       Windows.                                                                
WARNI… executor0/rep-0@25832 Pod was forced to close after  [01/31/23 22:57:44]
       1 second. Graceful closing is not available on                          
       Windows.                                                                
ResourceWarning: unclosed event loop <ProactorEventLoop running=False closed=False debug=False> (raised from D:\ProgramData\Anaconda3\envs\python38\lib\asyncio\base_events.py:654)
JoanFM commented 1 year ago

Can you try running it with JINA_LOG_LEVEL=DEBUG enabled so we can see the problem? It seems that the Flow failed to start properly.
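
One way to enable it from the client script (a minimal sketch; exporting the variable in the shell before launching the script works just as well):

import os

# must be set before the Flow is created so the Jina loggers pick it up
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'

from jina import Flow, Document, DocumentArray
# ... rest of the script as posted above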

coolmian commented 1 year ago

@JoanFM Sure, I enabled debug logging; here is the console output: https://gist.github.com/coolmian/edd7af8afc882d692f24b77fda9e94c0

JoanFM commented 1 year ago

Can you try sending the request to an existing endpoint?

       mismatch. Request endpoint: `/`. Available                              
       endpoints: /index, /search, /delete, /update,                           
       /fill_embedding, /clear, _jina_dry_run_     
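
For instance, keeping the rest of the script unchanged, the post call inside the with block would become (a sketch based on the endpoints listed in the warning above):

        # '/index' is one of the endpoints exposed by SimpleIndexer
        f.post('/index', DocumentArray([Document(text='hello')]))
        time.sleep(5)
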
JoanFM commented 1 year ago

I do not understand this info about your OS:

server(docker) device OS: Ubuntu 16.04.1
Jina Version: 3.13.2

It seems that you are running this example on a Windows machine, right?

girishc13 commented 1 year ago

@JoanFM The trace should be created regardless of the endpoint. The client initiating the call will start a trace regardless.

JoanFM commented 1 year ago

I believe, @girishc13, we should test this on Windows?

coolmian commented 1 year ago

I do not understand this info about your OS:

server(docker) device OS: Ubuntu 16.04.1
Jina Version: 3.13.2

It seems that you are running this example on a Windows machine, right?

Yes, the simple code I posted runs on Windows. I have two machines: one is Windows, running the client request code, and the other is Ubuntu, running docker-compose.

JoanFM commented 1 year ago

But I do not understand: in this example you simply run a Flow and call f.post on it in the same process, so it does not seem that you are giving us an example representative of your system.
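
For reference, a sketch of a split reproduction that is closer to the two-machine setup described (the port and host values here are placeholders, not taken from the report). On the Ubuntu side, the Flow is kept serving instead of exiting right after startup:

from jina import Flow

if __name__ == '__main__':
    f = Flow(
        port=54321,
        tracing=True,
        traces_exporter_host='http://localhost',
        traces_exporter_port=4317,
        metrics=True,
        metrics_exporter_host='http://localhost',
        metrics_exporter_port=4317,
    ).add(uses='jinaai://jina-ai/SimpleIndexer')
    with f:
        f.block()  # keep serving until interrupted

On the Windows side, a separate Client sends the request over the network:

from jina import Client, Document, DocumentArray

c = Client(host='grpc://192.168.xx.xx:54321')
c.post('/index', DocumentArray([Document(text='hello')]))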

coolmian commented 1 year ago

Can you try sending the request to an existing endpoint?

       mismatch. Request endpoint: `/`. Available                              
       endpoints: /index, /search, /delete, /update,                           
       /fill_embedding, /clear, _jina_dry_run_     

OK, I sent the request to the /index endpoint: https://gist.github.com/coolmian/d4952f1cc33bde88600b2f776a376090

coolmian commented 1 year ago

Sorry, what specific system information do you want me to provide? I only provided the system version numbers before:

Windows 10 Enterprise LTSC 21H2 19044.1766
Python 3.8.13

coolmian commented 1 year ago

My code comes from here: https://docs.jina.ai/cloud-nativeness/opentelemetry/#running-a-flow-locally

JoanFM commented 1 year ago

Are you sure the docker compose setup exposes the ports so that they are accessible from outside the host?

girishc13 commented 1 year ago

The Python code looks fine. What is confusing is the two-system Windows and Ubuntu setup. Have you tested that your Windows system can communicate with the components running in Docker? I suspect networking issues.

Can you query the Prometheus instance using the following curl command with the IP address that you have set up?

curl 'localhost:9090/api/v1/labels'
{
    "status": "success",
    "data": [
        "__name__",
        "call",
        "code",
        "config",
        "dialer_name",
        "endpoint",
        "event",
        "goversion",
        "handler",
        "instance",
        "interval",
        "job",
        "le",
        "listener_name",
        "name",
        "quantile",
        "reason",
        "role",
        "scrape_job",
        "slice",
        "version"
    ]
}
coolmian commented 1 year ago

@girishc13 Yes. The command returns this:

{"status":"success","data":["__name__","exporter","instance","job","le","processor","receiver","service_instance_id","service_version","transport"]}
coolmian commented 1 year ago

Are you sure the docker compose setup exposes the ports so that they are accessible from outside the host?

http://192.168.xx.xx:3000/ http://192.168.xx.xx:16686/ http://192.168.xx.xx:9090/

Yes, I can access all three pages through a browser on other devices (the device running the client request code), using localhost or 127.0.0.1 as well as the actual LAN address (192.168.xx.xx).

JoanFM commented 1 year ago

Hey @coolmian, are you part of our Slack community? If you join us there we may be able to arrange a pairing session to help you debug the issue.

coolmian commented 1 year ago

Hey @coolmian, are you part of our Slack community? If you join us there we may be able to arrange a pairing session to help you debug the issue.

Of course, I'm already in the Jina AI Community on Slack, but I'm sorry, I can't speak English. I rely on translation, haha.

JoanFM commented 1 year ago

What language do you speak? I am sure someone on the team can speak your language.

coolmian commented 1 year ago

What language do you speak? I am sure someone in the team can speak your language

Thank you. I speak Chinese. You can find me by name @coolmian in Slack.