CHT monitoring scrapes the <url>/api/v2/monitoring?connected_user_interval=30 endpoint: some servers return data, others do not respond even though they are up

medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.

https://communityhealthtoolkit.org

GNU Affero General Public License v3.0

#8468

Open vyshakssekhar opened 1 year ago

vyshakssekhar commented 1 year ago

I run multiple CHT instances and have added all of the servers to the CHT monitoring tool's YAML file. In the Grafana dashboard, some servers report data, but others report nothing even though they are up.

On closer inspection, I found that Prometheus scrapes data from the /api/v2/monitoring?connected_user_interval=30 endpoint. When I request that endpoint on some servers I get JSON data, but other servers redirect to the login page, even though all of the servers use the same CHT image.

Please help me make sure that every server that is up reports data to the monitoring.

dianabarsan commented 1 year ago

Hi @vyshakssekhar

Can you please check that all URLs are correctly formatted and don't contain any additional characters?
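One way to run this check mechanically is sketched below. It assumes each monitoring target is meant to be a bare base URL (scheme and host only), which the thread does not state explicitly; `find_url_problems` is a hypothetical helper, not part of CHT or the monitoring tooling.

```python
from urllib.parse import urlsplit


def find_url_problems(url: str) -> list[str]:
    """Return human-readable problems found in a monitoring target URL."""
    problems = []
    stripped = url.strip()
    if url != stripped:
        problems.append("leading or trailing whitespace")
    if any(ch.isspace() or not ch.isprintable() for ch in stripped):
        problems.append("whitespace or non-printable character inside the URL")
    parts = urlsplit(stripped)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing hostname")
    # Assumption: a target should not carry its own path, query, or fragment.
    if parts.path not in ("", "/"):
        problems.append(f"unexpected path: {parts.path!r}")
    if parts.query or parts.fragment:
        problems.append("unexpected query string or fragment")
    return problems
```

Running it over every target in the config and printing only the non-empty results should make a stray space or a pasted login path easy to spot.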

vyshakssekhar commented 1 year ago

@dianabarsan Yes, the URLs look fine. Since they are production URLs, I can't post them here. When I request URL + /api/v2/monitoring?connected_user_interval=30 on one server, I get JSON data like this:

{"version":{"app":"3.13.0","node":"v12.22.12","couchdb":"2.3.1"},"couchdb":{"medic":{"name":"###","update_sequence":###,"doc_count":###,"doc_del_count":##,"fragmentation":###},"sentinel":{"name":"####","update_sequence":36#####,"doc_count":1#####,"doc_del_count":#7,"fragmentation":2#####},"usersmeta":{"name":"####","update_sequence":##3,"doc_count":#####5,"doc_del_count":0,"fragmentation":####},"users":{"denied":0,"cleared":0,"muted":0,"duplicate":###}}}},"outbound_push":{"backlog":0},"feedback":{"count":###3},"conflict":{"count":####},"replication_limit":{"count":#33},"connected_users":{"count":2##}}

Meanwhile, when I request the same path (URL + /api/v2/monitoring?connected_user_interval=30) on the servers that don't show up in my CHT monitoring, it just redirects to the login page. I suspect that is why my monitoring system cannot produce data for them even though the instances are up.

dianabarsan commented 1 year ago

I still suspect that the URLs you are using in your Prometheus config are somehow malformed. Have you tested each one?

vyshakssekhar commented 1 year ago

@dianabarsan Yes, I have tested each URL. When I open each URL directly, I also get the login page.

dianabarsan commented 1 year ago

Then the URLs are malformed somehow, because the monitoring API is not supposed to redirect to the login page.
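To tell the two behaviours apart without a browser, a small probe along these lines may help. It is a sketch under assumptions: that the login redirect is an HTTP 3xx response whose `Location` header contains "login", and that a healthy response is a 200 with a JSON content type. `classify` and `probe` are hypothetical names.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the redirect


def classify(status: int, content_type: str, location: str) -> str:
    """Label a response as 'json', 'login_redirect', or 'other'."""
    if 300 <= status < 400 and "login" in location.lower():
        return "login_redirect"
    if status == 200 and "json" in content_type.lower():
        return "json"
    return "other"


def probe(base_url: str, path: str = "/api/v2/monitoring", timeout: float = 10.0) -> str:
    """Request the monitoring path once, without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return classify(resp.status, resp.headers.get("Content-Type", ""), "")
    except urllib.error.HTTPError as err:
        # Redirects and 4xx/5xx responses both arrive here.
        return classify(err.code, err.headers.get("Content-Type", ""),
                        err.headers.get("Location", ""))
```

Calling `probe("https://<host>")` for each target gives one label per server; connection failures are not caught here and surface as `urllib.error.URLError`.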

dianabarsan commented 1 year ago

It would be helpful if you could share one of the URLs that are redirecting to login (you can obfuscate the host, for example).

vyshakssekhar commented 1 year ago

![Uploading screenshot_from_2023-08-18_13-43-16~4 (1).png…]()

@dianabarsan In the screenshot, the first URL is the one that returns data. The next URL is another CHT instance, but when I request the monitoring API on it, it redirects to the login page.

dianabarsan commented 1 year ago

Hi @vyshakssekhar I think something went wrong with your screenshot upload

vyshakssekhar commented 1 year ago

[Screenshot: screenshot_from_2023-08-18_13-43-16~4 (1)]

vyshakssekhar commented 1 year ago

@dianabarsan I hope it's uploaded properly now.

dianabarsan commented 1 year ago

Thanks for sharing, @vyshakssekhar. What happens if you include valid basic authentication in that second request? Do you get correct monitoring output?

vyshakssekhar commented 1 year ago

@dianabarsan Do you mean I should include basic authentication for login?

dianabarsan commented 1 year ago

Yes, so your URL will look like this:

https://admin:password@hostname.com/api/v2/monitoring

It's just to check what happens when you send an authenticated request, since there's clearly something wrong with that second install. The monitoring API should not require authentication.
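The same authenticated request can be built in a script by sending the credentials as an `Authorization` header rather than embedding them in the URL. This is a minimal sketch; the host, user name, and password are placeholders, and both function names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def authenticated_request(base_url: str, user: str, password: str,
                          path: str = "/api/v2/monitoring") -> urllib.request.Request:
    """Prepare (but do not send) an authenticated monitoring request."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req
```

Passing the returned object to `urllib.request.urlopen` sends it; comparing the result with the unauthenticated response shows whether authentication is what changes the behaviour.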

vyshakssekhar commented 1 year ago

@dianabarsan I need to check with the team. Since these are production instances, we don't have application-side credentials.

dianabarsan commented 1 year ago

> we don't have application-side credentials

@vyshakssekhar How do you know these production instances are using the same CHT version then?

dianabarsan commented 1 year ago

Is it possible you're querying an instance that doesn't have the api/v2/monitoring endpoint implemented yet?

vyshakssekhar commented 1 year ago

But all of these instances use the same medic-os image and its related dependencies.

dianabarsan commented 1 year ago

In the screenshot you shared, the first instance is not even using medic-os, because it's version 4.2.0. It's possible you're trying to query instances that don't have the api/v2/monitoring endpoint implemented yet.

vyshakssekhar commented 1 year ago

@dianabarsan I am puzzled by one question: is it possible for an instance running the medic-os image not to have api/v2/monitoring configured, if the infra team set up the infrastructure using the medic-os repo? I ask because we are a new team maintaining the infrastructure, and the old team that built the setup is no longer with the organization.

dianabarsan commented 1 year ago

api/v2/monitoring was added in 3.12: https://docs.communityhealthtoolkit.org/apps/reference/api/#get-apiv2monitoring Can you try api/v1/monitoring for those instances?

vyshakssekhar commented 1 year ago

@dianabarsan Yes, the instance where the api/v1/monitoring endpoint works is running CHT version 3.12+, but for the rest I can't check the app version without this endpoint. Is there any other way to confirm the app version? When I run "docker ps" to list the running containers, the medic-os version shown is 3.9, but the app version returned by the endpoint is 3.16.

[Screenshot: Screenshot from 2023-08-21 18-01-07]

dianabarsan commented 1 year ago

Without authentication, I don't think there is an endpoint that will return app version. Do you have any instance for which api/v1/monitoring is not working? This endpoint was added in version 3.9.
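The version thresholds quoted in this thread (v1 added in 3.9, v2 added in 3.12) can be written down as a small lookup. This sketch relies only on those two "added in" versions and does not account for any later changes; the function names are hypothetical.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '3.13.0' or 'v4.2.0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def available_monitoring_paths(app_version: str) -> list[str]:
    """List the monitoring paths expected to exist, newest first."""
    major_minor = parse_major_minor(app_version)
    paths = []
    if major_minor >= (3, 12):  # v2 was added in 3.12
        paths.append("/api/v2/monitoring")
    if major_minor >= (3, 9):   # v1 was added in 3.9
        paths.append("/api/v1/monitoring")
    return paths
```

An instance older than 3.9 yields an empty list, which would be consistent with both paths falling through to the login page.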

dianabarsan commented 1 year ago

Tagging @garethbowen for further assistance (thanks Gareth!!)

vyshakssekhar commented 1 year ago

Yes, I have an instance for which api/v1/monitoring is not working. On the command line, docker ps shows it as a medic-os container of version 3.9.0, as in the screenshot I attached above, and requesting the URL with this endpoint redirects to the login page.

garethbowen commented 1 year ago

@vyshakssekhar Don't get fooled by the docker ps response - the version listed there is the version of medic-os, NOT the version of the CHT. It's very confusing and fixed in 4.0+. However we can assume that the CHT version is at least 3.9 so it should have the v1 endpoint.
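On instances where a monitoring endpoint does respond, the CHT version is in the `version.app` field of the JSON, as in the sample output earlier in this thread. A minimal sketch for reading it from a response body, with `app_version` as a hypothetical helper name:

```python
import json


def app_version(monitoring_body: str):
    """Return version.app from a monitoring response body, or None if absent."""
    data = json.loads(monitoring_body)
    if not isinstance(data, dict):
        return None
    version = data.get("version")
    if isinstance(version, dict):
        return version.get("app")
    return None
```

This reads the application version reported by the API itself, which is the number to compare against the 3.9 and 3.12 thresholds rather than the image tag shown by docker ps.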

As 3.9 has been unsupported for more than 2 years this is now impossible for us to replicate. If you share the URL with me privately (email gareth@medic.org) I can attempt to dig deeper, otherwise it's very difficult to figure out what's going on.

Some things you can try...

- Remove the connected_user_interval parameter. It shouldn't matter, but it just might...
- Look in the API and access logs to trace the request through routing, nginx, and finally API. It might be that a firewall is blocking requests. If the request makes it through to API, you may see some logging that explains what's going on.
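For the first suggestion, it can help to generate both URL variants from one place so that the only difference between the two requests is the query parameter. A small sketch, with `monitoring_url` as a hypothetical helper and the host as a placeholder:

```python
from urllib.parse import urlencode


def monitoring_url(base_url: str, version: str = "v2", connected_user_interval=None) -> str:
    """Build a monitoring URL, adding the query parameter only when given."""
    url = f"{base_url.rstrip('/')}/api/{version}/monitoring"
    if connected_user_interval is not None:
        url += "?" + urlencode({"connected_user_interval": connected_user_interval})
    return url
```

Requesting `monitoring_url(host)` and `monitoring_url(host, connected_user_interval=30)` back to back, and then the same pair with `version="v1"`, covers the combinations discussed in this thread.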