paychex / prometheus-emcecs-exporter

Prometheus exporter for the Dell/EMC ECS array
Apache License 2.0

ECS Data & Mgt on different IPs #3

Status: Closed (VortexUK closed this issue 5 years ago)

VortexUK commented 5 years ago

Hey,

So we've looked at using this code in our environment, and it works perfectly except for one thing: the data/node API is on a different IP from MGMT/administration (required for our implementation). Is it possible to modify the code to handle this? Currently we get the following:

INFO[13090] Connecting to ECS Cluster: contoso.com
INFO[13090] Error connecting to ECS Cluster at: http://contoso.com:9020/?endpoint source="ecsclient.go:231"
DEBU[13090] ECS Cluster version is:
DEBU[13090] ECS Cluster node count: 0
DEBU[13090] Looking for cached Auth Token for contoso.com
DEBU[13090] Authtoken pulled from cache for contoso.com
DEBU[13090] Register node DT exporter
DEBU[13090] Register cluster exporter
DEBU[13090] Register Replication exporter
INFO[13090] Nodestate exporter finished source="node-collector.go:66"
INFO[13090] Replication exporter finished source="repl-collector.go:80"
INFO[13091] Cluster exporter finished source="cluster-collector.go:170"

Additionally, we noticed that you are using the non-secure port (9020) for the data API. Could there be an option to use the secure port (9021) instead? I can see it is hard-coded in ecsclient.go.
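A minimal Go sketch of what such an option could look like, assuming the data API URL has the shape shown in the log above (`http://<host>:9020/?endpoint`) and that switching to TLS means moving to `https` on port 9021. The function name `dataEndpointURL` and the `useTLS` parameter are hypothetical, not taken from ecsclient.go.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Ports as described in the thread: 9020 is the plain-HTTP data API port,
// 9021 is its TLS counterpart.
const (
	dataPortHTTP  = 9020
	dataPortHTTPS = 9021
)

// dataEndpointURL (hypothetical name) builds the data API URL for host,
// picking scheme and port from useTLS instead of hard-coding 9020.
// The "/?endpoint" path mirrors the URL seen in the log above.
func dataEndpointURL(host string, useTLS bool) string {
	scheme, port := "http", dataPortHTTP
	if useTLS {
		scheme, port = "https", dataPortHTTPS
	}
	// net.JoinHostPort adds brackets around IPv6 literals.
	return fmt.Sprintf("%s://%s/?endpoint", scheme, net.JoinHostPort(host, strconv.Itoa(port)))
}

func main() {
	fmt.Println(dataEndpointURL("contoso.com", false))
	fmt.Println(dataEndpointURL("contoso.com", true))
}
```

In an exporter, `useTLS` would most likely be driven by a command-line flag; the flag is not shown here because its name would be invented.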

Thank you in advance!

xphyr commented 5 years ago

Hi,

Thanks for asking. I am going to treat this as two requests:

  1. Can we use HTTPS instead of HTTP to connect to the system for the node query? Yep, I think we can do this. Please create a separate issue for "use TLS/SSL for node state query" so I can track the two requests separately. This is a quick change that I should be able to turn around rapidly (and it looks like you have already made it in your fork), and I agree it would be best to make it the default.

  2. The data/node API is on a different IP from mgmt/admin; can we support this? We are not set up this way at Paychex, but looking at your fork of the code, I see you do a quick search/replace of "mgt" with "data" in the URL. When this code was written, hitting the data/node endpoint was the only way to get a definitive list of nodes. I think the best approach would be to find a management API call that returns the list of nodes and use that list instead. I will take a look at the API spec and see if there is something that would work.

Thanks, and I'm glad to see the code is helping out somewhere.

VortexUK commented 5 years ago

Awesome, thanks for the response! As is probably obvious from my super hacky changes on the fork, I'm very new to Go (I know enough from other languages to be dangerous). I'll also take a look at the API now and see if there's an alternative way to get the node IPs.

VortexUK commented 5 years ago

It looks like it is now possible to get the node IPs from the management API: https://www.emc.com/techpubs/api/ecs/v2-0-0-0/NodesService_getNodes_0877757447b3b6c2e0c6018c2c38bd73_494fbd37b965fca309197939182de220_detail.htm

VortexUK commented 5 years ago

After some testing, there is an issue using port 9021 to get the node info from the data API: the /ping URL doesn't seem to exist on 9021; it is only there on 9020.

xphyr commented 5 years ago

@VortexUK I am looking at the API you supplied to see what I can find. The information that comes back looks like this:

{
  "rackId": "red",
  "version": "3.2.1",
  "isLocal": true,
  "ip": "1.2.3.4",
  "nodename": "node1.contoso.com",
  "mgmt_ip": "1.2.3.4",
  "geo_ip": "1.2.3.4",
  "data_ip": "1.2.3.4",
  "private_ip": "169.254.1.4",
  "nodeid": "SOME-UID",
  "data2_ip": "1.2.3.4"
},

Based on my understanding of your issue, I think we need to use "data_ip" whenever we call a "data" API; once I update the code, this should work automagically in your environment.

We should also be able to use port 9021 (TLS) to do the ping. It doesn't work in the mainline code because the request to port 9021 uses a default HTTP client that doesn't have TLS skip-verify enabled. The version1.2 branch addresses this. The only thing that does not appear to support TLS is the DTState retrieval on port 9101. (DTState is an undocumented but valuable indicator of health.) Feel free to watch the progress on the version1.2 branch, where I am doing a lot of refactoring of how this exporter works to make it more efficient and easier to maintain.

xphyr commented 5 years ago

@VortexUK I just released version 1.2, which should address this issue as well as issue #3. Can you let me know if it works for you? Then I will close both issues out. Thanks!

xphyr commented 5 years ago

Closing this issue out. If there are problems with the fix, please let me know or reopen this issue.