sonata-nfv / son-gkeeper

SONATA's Service Platform Gatekeeper
http://www.sonata-nfv.eu
Apache License 2.0

Implement Monitoring data request API #504

Closed jbonnet closed 7 years ago

jbonnet commented 7 years ago

Agreed:

We can support things like multiple metrics in the same request, e.g., .../metric=cpu_util,disk_usage,packets_sent&...
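A minimal sketch of how such a comma-separated `metric` parameter could be split on the receiving side. The helper name `parse_metrics` and the sample query strings are illustrative assumptions, not Gatekeeper code.

```python
from urllib.parse import parse_qs


def parse_metrics(query_string):
    """Return the metric names found in a query string such as
    'metric=cpu_util,disk_usage,packets_sent'.

    Hypothetical helper: the real parameter handling may differ.
    """
    params = parse_qs(query_string)
    names = []
    # parse_qs returns a list per key, so repeated 'metric' keys also work.
    for value in params.get("metric", []):
        names.extend(name.strip() for name in value.split(",") if name.strip())
    return names
```

Splitting on commas keeps the URL short while still allowing one request to name several metrics.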

jbonnet commented 7 years ago

Hey, @pkarkazis What's the endpoint to POST the monitoring data request? Thanks,

pkarkazis commented 7 years ago

Hi @jbonnet, the APIs related to monitoring data retrieval are the following:

  1. Get the list of available metrics.
     a. Get the metric list:

            curl -s http:///api/v1/prometheus/metrics/list

     b. Get details about a specific metric:

            curl -s http:///api/v1/prometheus/metrics/name/vm_mem_perc/

  2. Get monitoring data.
     a. Get monitoring data via an asynch request:

            tw_end=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
            tw_start=$(date -u -d -10minutes '+%Y-%m-%dT%H:%M:%SZ')
            curl -s \
              -H "Accept: application/json" \
              -H "Content-Type:application/json" \
              -X POST --data '{"name":"vm_mem_perc","start": "'$tw_start'", "end": "'$tw_end'", "step": "10s", "labels": [{"labeltag":"exported_job", "labelid":"vnf"}]}' \
              "http:///api/v1/prometheus/metrics/data"

     b. Get monitoring data via a synch request (websocket):

            curl -s \
              -H "Accept: application/json" \
              -H "Content-Type:application/json" \
              -X POST --data '{"metric":"vm_cpu_perc","filters":["id='123456asdas255sdas'","type='vnf'"]}' \
              "http:///api/v1/ws/new"

BR
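For readers who prefer to build the asynch request body programmatically, here is a rough Python equivalent of the `date`/`curl` payload above. The function name `build_async_request` and its parameters are assumptions made for illustration; only the JSON field names come from the comment above.

```python
from datetime import datetime, timedelta, timezone

# Same timestamp layout as the `date -u '+%Y-%m-%dT%H:%M:%SZ'` calls.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_async_request(metric, label_tag, label_id, minutes=10, step="10s", now=None):
    """Build the JSON body for a POST to .../api/v1/prometheus/metrics/data.

    Hypothetical helper; 'now' exists only to make the window reproducible.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "name": metric,
        "start": start.strftime(TIME_FORMAT),
        "end": end.strftime(TIME_FORMAT),
        "step": step,
        "labels": [{"labeltag": label_tag, "labelid": label_id}],
    }
```

Calling `build_async_request("vm_mem_perc", "exported_job", "vnf")` yields a body covering the last ten minutes, which can then be serialised with `json.dumps` and posted.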

jbonnet commented 7 years ago

Thanks a million, @pkarkazis. @stevenvanrossem, @cgeoffroy: are you the 'clients' of this API? If not, do you know who is? If yes, please take a look at the above (1st comment) proposed definition... do you agree? Have you got other/better ideas?

The proposed API implies the SDK gathers the function descriptor's uuid and the uuid of the record of the instance for which monitoring data is wanted (ouch! long paragraph, sorry...). Is this ok?

jbonnet commented 7 years ago

@pkarkazis Sorry, just a couple of doubts:

  1. in the asynch example, are exported_job, vnf and 10s fixed? If not, how can they change?
  2. in the synch example, where can the GK get the id='123456asdas255sdas'? And again, is vnf fixed? If not, which are its valid values?

Thanks,

jbonnet commented 7 years ago

@pkarkazis, @trakadasp Which URL does the mon_manager have?

Thanks,

jbonnet commented 7 years ago

Sorry, further explanations are needed... When you get the list of the available metrics, you don't indicate the instance... why? Are you returning all instances' metrics?!? This might be a flood of data... The same applies when you ask for asynch monitoring data... And I'm assuming the id in the filters of synch monitoring data is that instance id, can you confirm that?

pkarkazis commented 7 years ago

Hi @jbonnet, monitoring data can be filtered based on attributes like instance type (vms, containers, vims_limits, vnf, etc.), instance id (VM's uuid or container's id), etc. So, if you want to retrieve vm_cpu_perc data for a specific VM with id = 123456asdas255sdas, you must set the labels/filters of your POST as follows:

  a. For an asynch request: "labels": [{"labeltag":"id", "labelid":"123456asdas255sdas"}]
  b. For a synch request: "filters":["id='123456asdas255sdas'"]

In case you don't know the labels/filters for each metric, you can retrieve this information from the Monitoring Manager by calling the following API: http:///api/v1/prometheus/metrics/name/vm_mem_perc/

The query resolution step width (10s) has to do with Prometheus and defines the data resolution. For example, the query 'http://:9090/api/v1/query_range?query=vm_cpu_perc&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s' retrieves data for metric vm_cpu_perc in the time window from 2015-07-01T20:10:30.781Z until 2015-07-01T20:11:00.781Z with a step of 15 seconds.

  - The query resolution step (10s) is not fixed and must be defined by the user. It can take values in seconds, minutes, hours (10s, 10m, 2h), etc.
  - The Monitoring Manager runs on sp.int3.sonata-nfv.eu:8000

In case that you need something different (or more specific) from the existing apis please tell me.
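The two filter notations and the step format described above can be sketched as small helpers. All function names are hypothetical, and the step pattern only covers the units named in this comment (seconds, minutes, hours); Prometheus itself accepts more.

```python
import re

# Covers only the units mentioned in the comment: 10s, 10m, 2h.
STEP_PATTERN = re.compile(r"^\d+[smh]$")


def async_labels(instance_id):
    """Labels array for the asynch request (.../prometheus/metrics/data)."""
    return [{"labeltag": "id", "labelid": instance_id}]


def sync_filters(instance_id):
    """Filters array for the synch request (.../ws/new)."""
    return ["id='{}'".format(instance_id)]


def is_valid_step(step):
    """True if step looks like '<number><s|m|h>'."""
    return bool(STEP_PATTERN.match(step))
```

Keeping both notations behind helpers means the caller only has to know the instance id, not which request style it ends up in.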

Best

jbonnet commented 7 years ago

Thank you, @pkarkazis This is food for thought.. @felipevicens needs to know which route should be set for the mon. manager: http://sp.int3.sonata-nfv.eu:8000/monitoring?

jbonnet commented 7 years ago

Ok, @pkarkazis But going into http://sp.int3.sonata-nfv.eu:8000/api/v1/prometheus/metrics/name/vm_mem_perc/, I get

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "metrics": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "exported_instance": "TEST-VNF",
                    "group": "development",
                    "exported_job": "vnf",
                    "instance": "pushgateway:9091",
                    "job": "sonata",
                    "__name__": "vm_mem_perc",
                    "id": "TEST-VNF"
                },
                "value": [
                    1491834049.436,
                    "3.56"
                ]
            },
            {
                "metric": {
                    "exported_instance": "INT-SRV-3",
                    "group": "development",
                    "exported_job": "vm",
                    "instance": "pushgateway:9091",
                    "job": "sonata",
                    "__name__": "vm_mem_perc",
                    "id": "INT-SRV-3"
                },
                "value": [
                    1491834049.436,
                    "3.59"
                ]
            }
        ]
    }
}

The only IDs I see are TEST-VNF and INT-SRV-3... are these the ones?
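To see at a glance which ids and instance types a metric-details response exposes, the result list can be reduced to (id, exported_job) pairs. This is only a sketch: `list_metric_targets` is a made-up name and `SAMPLE_RESPONSE` is a trimmed copy of the response above.

```python
def list_metric_targets(response):
    """Return (id, exported_job) pairs from a metric-details response."""
    results = response.get("metrics", {}).get("result", [])
    return [
        (entry["metric"].get("id"), entry["metric"].get("exported_job"))
        for entry in results
    ]


# Trimmed copy of the response shown above (only the fields used here).
SAMPLE_RESPONSE = {
    "metrics": {
        "resultType": "vector",
        "result": [
            {"metric": {"exported_job": "vnf", "id": "TEST-VNF"},
             "value": [1491834049.436, "3.56"]},
            {"metric": {"exported_job": "vm", "id": "INT-SRV-3"},
             "value": [1491834049.436, "3.59"]},
        ],
    }
}
```

With the sample above, the helper returns the two ids mentioned in the question, each paired with its `exported_job` value.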

stevenvanrossem commented 7 years ago

@jbonnet the proposal looks ok to me, please clarify:

  - how to retrieve :function_uuid and :instance_uuid from the SDK?
  - in what format will the metrics be exported? Forward the metrics as they are formatted by the mon_manager (Prometheus format)?

pkarkazis commented 7 years ago

@jbonnet in the id field you should see the uuid of the VM or the id of the container, but right now you see the name of the vm/vnf. I suppose that there is a problem with OpenStack in the integration env... I will check it.

jbonnet commented 7 years ago

@stevenvanrossem:

  1. function_uuid means VNF descriptor ID (it could also be the vendor/name/version trio, though that might start getting unreadable); you can get it from the Catalogues using the existing .../api/v2/functions/ endpoint;
  2. instance_uuid means VNF instance ID; you can get it from the Repositories using the existing .../api/v2/records/functions/ endpoint, though I'm still checking if this is possible;
  3. formats: there'll be two, file (i.e., asynch, for data in the past) and websocket (synch, for current data); @pkarkazis can you please clarify if it'll be in Prometheus format?

@pkarkazis: Then how do I map that OpenStack ID to VNF Instance ID? (@DarioValocchi: is this a one (VNF Instance) to one (VM) mapping? How is it if/when we use containers?)

DarioValocchi commented 7 years ago

@jbonnet Two comments:

  1. If I get what you mean by OpenStack ID, there's one of that for each VNFC, since the mapping is VNFC<->VM.
  2. This OpenStack ID is stored in the VNFR, under the relevant VNFC object, in the vc_id field, which in the current VNFR schema is described as "Identifier of the virtualization container or VM running on the VIM"

Are these of any help?
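A small sketch of how the vc_id values could be collected from a VNFR. The nesting used here (`virtual_deployment_units` containing `vnfc_instance` entries) is an assumption about the record layout and should be checked against the VNFR schema; only the `vc_id` field name comes from the comment above.

```python
def vc_ids_from_vnfr(vnfr):
    """Collect the vc_id of every VNFC found in a VNFR dictionary.

    Assumed layout: vnfr["virtual_deployment_units"][i]["vnfc_instance"][j]["vc_id"].
    """
    ids = []
    for vdu in vnfr.get("virtual_deployment_units", []):
        for vnfc in vdu.get("vnfc_instance", []):
            if "vc_id" in vnfc:
                ids.append(vnfc["vc_id"])
    return ids
```

Because the mapping is VNFC<->VM, a VNF with several VNFCs yields several ids, one per VM or container.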

jbonnet commented 7 years ago

Excellent, @DarioValocchi! Ok, so from a VNF instance ID I can reach all its VMs, and vice-versa?

And how is it when containers are used?

DarioValocchi commented 7 years ago

Well, it depends on what you mean by "reach". If you mean get the ID of all the VMs which implement its VNFCs, then it would be a "YES". From the VNF instance ID you should be able to retrieve the relevant VNFR, and inside that you will find all the VNFC objects with the relevant VM ids. As for containers, it will most probably depend on the translation model. But since we are going toward a native Docker VIM, and in Docker each container has its own ID, which is also the identifier used by Docker's standard monitoring facilities, I guess that the vc_id field can be overloaded to contain that identifier, so that the field is independent of the VIM technology.

jbonnet commented 7 years ago

Ok, @DarioValocchi , thanks a million. @trakadasp, @pkarkazis: how about monitoring at the function or service level, how would they be expressed in the above filters? We can have these metrics defined in the service and function Descriptors, right?

jbonnet commented 7 years ago

Hey, @pkarkazis, it's me again...

  1. Do you think we can treat the synch case (read websockets here) as the 'client' (read @stevenvanrossem here) simply passing a url like …/functions/:function_uuid/instances/:instance_uuid/synch-mon-data?metric=cpu_util&for=<number of seconds>? I mean, the id you mention above is really the :instance_uuid of this url, isn't it (apart from some mappings discussed above)? The GK will pass it to the MonMan as agreed (..."filters":["id='123456asdas255sdas'"]...)
  2. How does the case for multiple metrics (...?metrics=cpu_util,vm_mem_perc&...) work? Are you expecting to create one websocket for each, or one websocket for all?

pkarkazis commented 7 years ago

Hi @jbonnet,

  1. The Monitoring Manager keeps the relation between services, functions and the monitoring metrics of each function. This is possible because all monitoring metrics are defined in the function descriptors. So, from the monitoring API we can get information about all the deployed services, the functions of each service and the supported metrics of each function. In case you need an extension of an existing API or a new one, we'll be happy to create it. Just tell me exactly what you need.
  2. Yes, the id that I mentioned above is the instance_uuid. If the VNF is running inside a VM, this is the uuid of the VM; otherwise, if the VNF is deployed as a container, the id is the id of the container.
  3. Our first thought was to create one websocket for multiple metrics. In this case the request for the creation of the new websocket will be an array: [{ "metric": "vm_mem_perc", "filters": ["id='0987654321lkjhgfds'", "type='vnf'"] }, { "metric": "vm_cpu_perc", "filters": ["id='123456asdas255sdas'", "type='vnf'"] }] The development of this functionality (multiple metrics in the same websocket) is not finished yet. So, for now we can create one websocket per metric.
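The difference between the two modes can be sketched as follows: the same list of entries is either sent as one array (one websocket for all metrics, once that is finished) or as one body per metric. `websocket_requests` and its `multi_metric_supported` flag are illustrative names, not part of the Monitoring Manager.

```python
def websocket_requests(metric_filters, multi_metric_supported=False):
    """Return the request bodies to POST to .../api/v1/ws/new.

    metric_filters: list of (metric_name, instance_id) pairs.
    """
    entries = [
        {"metric": metric, "filters": ["id='{}'".format(inst), "type='vnf'"]}
        for metric, inst in metric_filters
    ]
    if multi_metric_supported:
        # One body holding the whole array: one websocket for all metrics.
        return [entries]
    # One body per metric: one websocket each (the current fallback).
    return entries
```

A caller can flip the flag later without touching how the entries are built.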

jbonnet commented 7 years ago

Hi, @pkarkazis

  1. You mean we're not supporting service level monitoring? There are metrics that only make sense at the service level, and not at the level of any of the functions that are part of the service... would it be easy to support that?
  2. Ok... so, when requested with a URL like ...functions/:function_uuid/instances/:instance_uuid, the GK API will pass the MonMan something like the above (..."filters": ["id='0987654321lkjhgfds'", "type='vnf'"]...), ok?
    • do you think we'll need to go deeper than that, like specifying the vdu we want the metrics from? Like in ...functions/:function_uuid/instances/:instance_uuid/vdus/:vdu_uuid?
  3. Ok, perfect
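The translation discussed in point 2 could look roughly like this: the GK takes the instance uuid from the URL and turns each requested metric into a Monitoring Manager body. The route pattern and the helper name `to_monman_requests` are assumptions, and the sketch ignores the id mapping between SONATA records and the VIM discussed earlier in the thread.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the tail of .../functions/:function_uuid/instances/:instance_uuid/synch-mon-data
ROUTE = re.compile(
    r"/functions/(?P<function_uuid>[^/]+)/instances/(?P<instance_uuid>[^/]+)/synch-mon-data$"
)


def to_monman_requests(url):
    """Translate a GK synch-mon-data URL into one MonMan body per metric."""
    parts = urlsplit(url)
    match = ROUTE.search(parts.path)
    if match is None:
        raise ValueError("not a synch-mon-data URL: " + url)
    query = parse_qs(parts.query)
    metrics = [m for value in query.get("metric", []) for m in value.split(",") if m]
    instance = match.group("instance_uuid")
    return [
        {"metric": metric, "filters": ["id='{}'".format(instance), "type='vnf'"]}
        for metric in metrics
    ]
```

Note that `function_uuid` is parsed but not used, which matches the later observation that the instance id alone identifies the metrics.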

pkarkazis commented 7 years ago

Hello @jbonnet

  1. Now we can create alerts/rules related to specific metrics of each function. If we want to capture an event on the service layer we have two options: either we base it on alerts from the several functions which make up a service, or we insert the definition of a service alert inside the service descriptor (this is not implemented yet). In any case I don't think that we need too much effort to support this. Of course it will be easier to decide on the approach if we know which "service layer metric" we want to provide.

  2. Yes, if you want to get a metric based on instance_uuid you must pass it in the filters/labels array.

     For a synch request (websocket):

         curl -s -H "Accept: application/json" -H "Content-Type:application/json" -X POST --data '{"metric":"vm_cpu_perc","filters":["id"=<instance_uuid>]}' "http:///api/v1/ws/new"

     For an asynch request:

         tw_end=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
         tw_start=$(date -u -d -10minutes '+%Y-%m-%dT%H:%M:%SZ')
         curl -s -H "Accept: application/json" -H "Content-Type:application/json" -X POST --data '{"name":"vm_mem_perc","start": "'$tw_start'", "end": "'$tw_end'", "step": "10s", "labels": [{"id":<instance_uuid>}]}' "http:///api/v1/prometheus/metrics/data"

     I don't think that we need anything more than function_uuid or instance_uuid.

jbonnet commented 7 years ago

Closed by PR #653, though still needs some integration testing, and is done only for the synchronous case (for the asynchronous case see #654).

stevenvanrossem commented 7 years ago

@jbonnet let's assume this simple use case: a developer wants to monitor a VNF that is deployed as part of a known service in the SP. For a developer using the SDK, we assume only the nsd is known. The following steps then need to be executed from the SDK:

  1. via ...api/v2/services the service descriptor uuid can be looked up based on the service name/nsd
  2. via …api/v2/records/services -> returns an empty list, can the sonata-demo-1 service be instantiated to check?
  3. via ...api/v2/functions the VNF descriptor uuid must be queried (function_uuid)
  4. via ...api/v2/records/functions the VNF instance uuid must be queried (instance_uuid) -> this returns an empty list, can a service be instantiated to check?

The SDK needs to know the VNF function_uuid and the VNF instance_uuid, and also be sure that this VNF instance is part of a specific service instance. How can a VNF looked up in steps 3 and 4 be linked to a service returned in steps 1 and 2? In other words, how to link a VNF instance to a service instance? Is it part of the record? Are VNF instance uuids part of the service record? If the instance_uuid is unique, then there is no need for function_uuid in the API, just use: /instances/:instance_uuid/synch-mon-data?

stevenvanrossem commented 7 years ago

After checking all the APIs described above, I have one concern (correct me if this use case is not relevant…). There might be a problem in this specific situation: if the same VNF is deployed multiple times inside one service or in multiple different services (deploying the same vnf_name/vnfd multiple times, but with a different and unique vnf_id in the nsd), the service record will return multiple vnfr_ids like this:

"network_functions": [ { "vnfr_id": "xxxxx" }, { "vnfr_id": "yyyyy" } ],

Those can be multiple VNF instances of the same image, with the same vnf_name but a different vnf_id. When looking up those vnfr_ids with api/v2/records/functions or api/v2//functions, the vnf_id is not returned, so this information is lost and a developer cannot know which vnfr_id refers to which vnf_id. In other words, how to link any unique vnf_id in the nsd to its VNF instance uuid?

E.g. a developer uses the SDK to create and deploy an nsd, then uses the service record and the vnf_id from the nsd to find this specific VNF instance in the SP. I think the vnf_id from the nsd also needs to be included in the service record. Is this possible?

jbonnet commented 7 years ago

@stevenvanrossem To be confirmed (@tsoenen?), but the record already contains a function_uuid which I think is what you're looking for (vnf_id).

stevenvanrossem commented 7 years ago

It seems function_uuid refers to the vnf_name and not the vnf_id? What happens when you deploy multiple instances of the same VNF, do they get the same function_uuid?

jbonnet commented 7 years ago

Exactly, but distinct (instance) UUIDs.

jbonnet commented 7 years ago

Ok, when you @stevenvanrossem refer to vnf_id you mean the function id within the service? Yes, in that case, it can appear more than once...

stevenvanrossem commented 7 years ago

With vnf_id I refer to the NSD. Example: if I define a service with multiple firewalls (each firewall might be at some other position in the forwarding path), they use the same vnf_name but have a different vnf_id in the NSD:

network_functions:
  - vnf_id: "vnf_firewall_1"
    vnf_vendor: "eu.sonata-nfv"
    vnf_name: "firewall-vnf"
    vnf_version: "0.3"
  - vnf_id: "vnf_firewall_2"
    vnf_vendor: "eu.sonata-nfv"
    vnf_name: "firewall-vnf"
    vnf_version: "0.3"

Is this part of any use case? In this case the instance_uuid of the VNFs cannot be linked back to either vnf_firewall_1 or vnf_firewall_2. It could be solved if the vnf_id is also mentioned in the service record. Currently, the service record mentions only this:

"network_functions": [
  {
    "vnfr_id": "xxxxx"
  },
  {
    "vnfr_id": "yyyyy"
  }
],

can vnf_id be added?

jbonnet commented 7 years ago

Ok, @stevenvanrossem, what a mess... Let me go step by step, for my own sanity:

  1. NSD is like what you mention above;
  2. when those functions are on-boarded into the Catalogue, each one gets its unique UUID (this is the function_id of this API, not the vnf_id);
  3. when a service is instantiated, each one of its function instances also gets its unique UUID (this is the instance_id of this API);
  4. when you ask for the records, every instance of the same function will share the function_id but have its own (unique) instance_id (named in the record simply by uuid);

So... if we just use the instance_id (and not also the function_id), we can get metrics; can you, on the SDK side, 'navigate' to this instance_id and get it?

stevenvanrossem commented 7 years ago

Yes, the SDK can navigate to the instance_id and get metrics of each instance. I also understand the difference between instance_id and function_id. Let me put it differently and sketch this use case:

  1. in the SDK, a developer creates an NSD and deploys it on the SP
  2. the developer wants to retrieve monitoring data for a specific VNF in this NSD. In the SDK, the developer wants to monitor a VNF, identified by the vnf_id in the NSD.
  3. the SDK looks up the service instance and checks in the service record which are the VNF instance_ids in this service instance.
  4. for each instance_id, the function_id is looked up
  5. knowing both instance_id and function_id, the SDK can now use the GK API to retrieve monitoring data from the SP.

But now, the tricky part is to know which instance_id belongs to the specific VNF the developer wants to monitor. In the case of the above example, with the same VNF deployed multiple times in a service, there is currently no way to tell. The SDK will get multiple instance_ids but cannot tell which one belongs to vnf_firewall_1 and which one to vnf_firewall_2. I think the easiest solution is to include the vnf_id from the NSD also in the service record, is this possible?
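The ambiguity can be reproduced with a few lines. The record keys `uuid` and `function_uuid` are assumptions based on how the records are described in this thread, and `candidate_instances` is a hypothetical helper.

```python
def candidate_instances(function_records, function_uuid):
    """Return the instance uuids of all records that share a function_uuid."""
    return [
        record["uuid"]
        for record in function_records
        if record.get("function_uuid") == function_uuid
    ]
```

When the same firewall VNF is deployed twice, this returns two instance uuids, and nothing in the records says which one is vnf_firewall_1 and which is vnf_firewall_2.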

jbonnet commented 7 years ago

Ok, @stevenvanrossem , now I got it... @tsoenen , is it easy for the FLM to include the vnf_id in the record? @mbredel this would affect the records schema... Is there any alternative design? Can we assume this duplicated VNF will never happen?

mbredel commented 7 years ago

I think the records schema are more like Josep's code from the Pirates of the Caribbean - no one is following them closely. Or are the records schemata used for verification somewhere? ... However, if needed I can change them.

tsoenen commented 7 years ago

@stevenvanrossem @jbonnet : I haven't followed the entire discussion, but where does the FLM find this vnf_id? Is it part of the VNFD?

stevenvanrossem commented 7 years ago

it is not part of the VNFD, only in the NSD

tsoenen commented 7 years ago

This gives an architectural conflict I would say, as the FLM functions on the function level, and doesn't know the NSD by design.

stevenvanrossem commented 7 years ago

And who makes the service record? the SLM? can the SLM include the vnf_id in the service record?

tsoenen commented 7 years ago

That would be a possibility, but both the current NSR and NSD are at your disposal, isn't it possible to do the mapping without making the change?

stevenvanrossem commented 7 years ago

In some specific cases not, see some comments above: https://github.com/sonata-nfv/son-gkeeper/issues/504#issuecomment-307128490

tsoenen commented 7 years ago

Aight, I get it:D

@felipevicens @mbredel : we should extend the schema for the nsr after line 90 (https://github.com/sonata-nfv/son-schema/blob/master/service-record/nsr-schema.yml#L90), by adding the 'vnf_id' property, which should be a string.
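With that property in place, the lookup on the SDK side becomes a direct match. This sketch assumes the `vnf_id` string ends up next to `vnfr_id` in each `network_functions` entry, which is the proposal above and not yet the schema; `vnfr_id_for` is a hypothetical helper.

```python
def vnfr_id_for(nsr, vnf_id):
    """Return the vnfr_id whose (proposed) vnf_id property matches."""
    matches = [
        nf["vnfr_id"]
        for nf in nsr.get("network_functions", [])
        if nf.get("vnf_id") == vnf_id
    ]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one record for {!r}, found {}".format(vnf_id, len(matches))
        )
    return matches[0]
```

Raising on zero or multiple matches keeps a silent wrong mapping from reaching the monitoring request.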

jbonnet commented 7 years ago

One possibility would be to simplify and consider that the double (triple, n-tuple) VNF in the same NS is out of our scope... how serious would this be? @tsoenen As @mbredel noted above, records are probably not being validated against the schema (btw, are you doing it? ;-)), so you can add that field...

tsoenen commented 7 years ago

Actually, since network_functions in the nsr is a list, I think it has the same order as the list in the nsd. This would solve the mapping problem, right?
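A positional mapping along those lines might look like the sketch below. It is only valid under the assumption that both lists really keep the same order, which this thread does not confirm; `map_by_position` is a made-up name.

```python
def map_by_position(nsd, nsr):
    """Pair each NSD vnf_id with the NSR vnfr_id at the same list index.

    Only meaningful if both 'network_functions' lists share the same order.
    """
    nsd_functions = nsd.get("network_functions", [])
    nsr_functions = nsr.get("network_functions", [])
    if len(nsd_functions) != len(nsr_functions):
        raise ValueError("lists differ in length; positional mapping is not safe")
    return {
        descriptor["vnf_id"]: record["vnfr_id"]
        for descriptor, record in zip(nsd_functions, nsr_functions)
    }
```

The length check catches the most obvious mismatch, but a reordered list of equal length would still map silently to the wrong instances.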

tsoenen commented 7 years ago

@jbonnet : the repositories do validate against the schema

jbonnet commented 7 years ago

@tsoenen Ah, great! Sorry, no pun intended...

stevenvanrossem commented 7 years ago

@tsoenen only if you are 100% sure that the list order is the same? :smile:

stevenvanrossem commented 7 years ago

The GK API does not return any info on the created websocket. A request to

…/functions/:function_uuid/instances/:instance_uuid/synch-mon-data?metric=cpu_util&for=<number of seconds>

returns:

{'function_instance_uuid': 'c3a85707-9ae4-4325-9e36-12065a763617', 'function_uuid': '152d27dc-4f61-42c0-ac72-2bd28eb464be', 'metrics': ['vm_cpu_perc']}

Should this function also return the websocket url (as returned by the Monitoring Manager)?

jbonnet commented 7 years ago

You're right, @stevenvanrossem, it hasn't been implemented yet (see #667). Let me do a PR and I'll jump into that one next.

stevenvanrossem commented 7 years ago

@jbonnet seems this issue is not closed yet :-)

The (a)synch monitor request to the GK API expects the SONATA function instance id, but the MonMan exports the metrics only with the label of the OpenStack uuid. The (a)synch monitor request via the GK therefore needs to translate the SONATA instance uuid to the correct uuid used by the Monitoring Manager for a deployed vnf. These endpoints can be checked to retrieve the correct mapping from the SONATA instance uuid to the MonMan instance id:

http://sp.int3.sonata-nfv.eu:8000/api/v1/functions
http://sp.int3.sonata-nfv.eu:8000/api/v1/functions/service//

All requests from the SDK should go via the GK (I don't think the SDK is authorized to contact the MonMan directly). Therefore I think the GK API and MonMan API need some more integration to process the (a)synch monitoring request correctly.

Also after discussion with @DarioValocchi and @pkarkazis, a single vnf_instance can have multiple vdu_instances (eg. at scale-out). This means that to uniquely identify a metric, vnf_instance_id AND vdu_instance_id are needed...
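One way to picture this is a composite key: samples are stored per (vnf_instance_id, vdu_instance_id, metric) rather than per VNF instance alone. `MetricKey` and `record_sample` are illustrative names only.

```python
from typing import NamedTuple


class MetricKey(NamedTuple):
    """Identifies one metric series of one VDU instance of one VNF instance."""
    vnf_instance_id: str
    vdu_instance_id: str
    metric: str


def record_sample(samples, vnf_instance_id, vdu_instance_id, metric, value):
    """Store the latest value under its composite key."""
    samples[MetricKey(vnf_instance_id, vdu_instance_id, metric)] = value
    return samples
```

After a scale-out, two VDU instances of the same VNF instance then keep separate entries instead of overwriting each other.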

jbonnet commented 7 years ago

Hi, @stevenvanrossem No problem :-) The second link gives me a 404. So, should we add the vdu_instance_id to the endpoint URL?

jbonnet commented 7 years ago

@stevenvanrossem, @pkarkazis, @DarioValocchi As agreed in Gitter: …/functions/metrics/:inst_id/:vdu_id/:vnfc_id (we keep functions for the sake of having a services counterpart in the future, and we change instances into metrics). For the moment :vdu_id will not be used to request data from the Monitoring Manager.
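A client-side helper for the agreed route could be as small as this. The prefix before /functions is left out because it is elided above, and `metrics_path` is a hypothetical name.

```python
from urllib.parse import quote


def metrics_path(inst_id, vdu_id, vnfc_id):
    """Build the tail of the agreed route: /functions/metrics/:inst_id/:vdu_id/:vnfc_id."""
    # Percent-encode each segment so an id containing '/' cannot break the path.
    segments = (quote(part, safe="") for part in (inst_id, vdu_id, vnfc_id))
    return "/functions/metrics/{}/{}/{}".format(*segments)
```

The :vdu_id segment is still part of the path even though, as noted, it is not yet used when querying the Monitoring Manager.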