Closed ashish235 closed 6 years ago
Your configuration looks correct, but the log of the FMS contains to many information regarding Spring debug and it is hard to see the problem. For example I see the zabbix-pugin sending an alarm with state "FIRED" but there are no logs in the FMS about it. Please try to make the FMS log more evident by changing the log properties as below:
######################################### ########## Logging properties ########### ######################################### logging.level.org.springframework=INFO logging.level.org.hibernate=INFO logging.level.org.openbaton=DEBUG logging.file=/var/log/openbaton/openbaton-fm.log
Then you can restart the FMS and try it again.
@marcellom , thanks for the reply. I 've changed the logging as suggested. But a strange thing is happening now. After changing the logging when I restarted the FMS, the healing worked and when I tried to test it for the 2nd time, the healing didn't work. Nothing changed here in between the tests.
Logs from 1st test: https://gist.github.com/ashish235/143f9b6dcd6d49e7747f60f0373f6f0a
Logs from 2nd test: https://gist.github.com/ashish235/d7581d6679139c8bda5d48a408b86814
I don't see the heal fired in the 2nd try, please suggest, what's happening here. I 'll restart the FMS and try to reproduce the issue.
@marcellom, I did repeat the tests and the results were identical. After a heal action was performed, if I take down the VNF and Launch it again, healing doesn't work. After I restart the FMS heal works for the 1st time.
A 2nd issue I discovered - A healing action was successfully undertaken and the VNF is healed. Now alarm from zabbix is cleared. Now if I again kill the service on VNF, zabbix raised an alarm by FMS didn't undertake any action. Is FMS designed to heal a VNF only once in it's lifecycle? Maybe both the issues are related, maybe not.
Logs for the 2nd issue I mentioned:
2017-07-24 12:58:20.336 INFO 8202 --- [Thread-3] .f.c.f.KieSessionGlobalConfigurationImpl : Heal fired! 2017-07-24 12:58:20.336 DEBUG 8202 --- [Thread-3] .f.c.f.KieSessionGlobalConfigurationImpl : Recovery action in progress! :RecoveryAction{recoveryActionType=HEAL, vnfmEndpoint='generic', vimName='', startingTime='2017/07/24 12:58:20', status=IN_PROGRESS, vnfrId='null', nsrId='c8813d2c-21bf-46ea-8225-60dbe168a5d9'} 2017-07-24 12:58:23.130 DEBUG 8202 --- [SimpleAsyncTaskExecutor-1] o.o.f.receivers.OpenbatonEventReceiver : Received nfvo event with action: INSTANTIATE_FINISH 2017-07-24 12:58:50.056 DEBUG 8202 --- [http-nio-9000-exec-2] o.o.f.receivers.RestEventReceiver : Received VR state changed alarm 2017-07-24 12:58:50.121 DEBUG 8202 --- [Thread-3] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records 2017-07-24 12:58:50.146 DEBUG 8202 --- [Thread-3] .f.c.f.KieSessionGlobalConfigurationImpl : vnfrId : 03652436-90ec-46fa-948d-5c54e71a635d policyId:e3eb0379-c335-4881-bbef-41c96387bd84 alarmState:CLEARED 2017-07-24 12:58:50.160 INFO 8202 --- [Thread-3] .f.c.f.KieSessionGlobalConfigurationImpl : This VNF alarm is cleared :VNFAlarm{vnfrId='03652436-90ec-46fa-948d-5c54e71a635d', vnfcIds=[137506e5-28d8-4b31-a8ac-c85af8d7d7ec], vimName='vim name'} Alarm{alarmId='dcd8e24b-ca56-43b7-9181-1c0536b993bf', thresholdId='14220', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.24 12:57:36', alarmState=CLEARED, perceivedSeverity=CRITICAL, eventTime='2017/07/24 12:57:49', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'} 2017-07-24 12:58:50.177 DEBUG 8202 --- [Thread-3] .f.c.f.KieSessionGlobalConfigurationImpl : Current VNF alarms:[VNFAlarm{vnfrId='7025e6a6-980f-4d2e-aa10-a4c3afc73802', vnfcIds=[9032348f-b329-473d-998f-d71497111689], vimName='vim name'} Alarm{alarmId='1f8353c6-13c3-4e73-8b36-f0d3d01e2de2', thresholdId='14159', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.24 11:28:29', alarmState=FIRED, perceivedSeverity=CRITICAL, eventTime='2017/07/24 11:28:48', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='c34e3970-5ceb-4607-9bab-ec7c6e6d527b', vnfcIds=[959f7137-0713-422c-a833-fb113d52f265], vimName='vim name'} Alarm{alarmId='2d47f393-f74b-4ae8-896b-16b12806a42f', thresholdId='14119', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.21 14:41:15', alarmState=UPDATED, perceivedSeverity=CRITICAL, eventTime='2017/07/21 14:41:37', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='3e0bcdd7-6fc5-44c2-b0de-e61ace3b7308', vnfcIds=[b91baf68-c9c4-4dad-957c-3ed5315090c8], vimName='vim name'} Alarm{alarmId='4550a6da-e2c0-4786-a38c-8e66ec51f3d3', thresholdId='14180', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.24 12:02:43', alarmState=FIRED, perceivedSeverity=CRITICAL, eventTime='2017/07/24 12:02:48', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='deb2febf-aacd-4daf-b806-4728c19208a5', vnfcIds=[0cc4a702-5eb8-4880-9777-590db6a3cafc], vimName='vim name'} Alarm{alarmId='45dc5976-f7b5-403d-9dd0-2f85923bfed8', thresholdId='13997', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.20 17:24:30', alarmState=FIRED, perceivedSeverity=CRITICAL, eventTime='2017/07/20 17:24:54', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='c25dbad9-f389-4c06-8ee0-002e43ba750c', vnfcIds=[190453d2-a704-47f1-81b3-3d3112823471], vimName='vim name'} Alarm{alarmId='577e1f35-8429-4205-ac4a-19478ab73b15', thresholdId='14140', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.24 11:04:57', alarmState=CLEARED, perceivedSeverity=CRITICAL, eventTime='2017/07/24 11:05:17', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='eb39d84f-aaf8-4262-826b-6fda3883abba', vnfcIds=[3250877a-e656-4cb0-8a83-ec0ff3aa5d7e], vimName='vim name'} Alarm{alarmId='61a6fcc4-b65c-44bd-b71a-9371e45e4992', thresholdId='14201', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.24 12:42:02', alarmState=FIRED, perceivedSeverity=CRITICAL, eventTime='2017/07/24 12:42:19', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='0c2b9986-37b8-49ba-ad8a-61b58b2dba59', vnfcIds=[99e00a13-78e6-4f0c-b957-ec92fd658f66], vimName='vim name'} Alarm{alarmId='8ba3a090-b855-4dab-b394-3959b765ce83', thresholdId='13940', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.20 16:46:30', alarmState=FIRED, perceivedSeverity=CRITICAL, eventTime='2017/07/20 16:46:53', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='e6ed9e6d-fd3d-4906-93a6-7f8755e94255', vnfcIds=[0529e90a-c6d6-48f5-a43d-2390d8af2845], vimName='vim name'} Alarm{alarmId='d5ea38e5-274f-45ad-a158-c0b981376de2', thresholdId='13901', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.06.15 13:29:00', alarmState=FIRED, perceivedSeverity=CRITICAL, eventTime='2017/06/15 13:29:12', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='03652436-90ec-46fa-948d-5c54e71a635d', vnfcIds=[137506e5-28d8-4b31-a8ac-c85af8d7d7ec], vimName='vim name'} Alarm{alarmId='dcd8e24b-ca56-43b7-9181-1c0536b993bf', thresholdId='14220', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.24 12:57:36', alarmState=CLEARED, perceivedSeverity=CRITICAL, eventTime='2017/07/24 12:57:49', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}, VNFAlarm{vnfrId='369eab8b-fd18-4525-8558-c9aeaabc902e', vnfcIds=[2408f2a2-21de-4815-9be9-11722229b3d3], vimName='vim name'} Alarm{alarmId='dd622f29-4bbb-475e-97fe-ba7243bd25cc', thresholdId='14098', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.21 13:07:30', alarmState=FIRED, perceivedSeverity=CRITICAL, eventTime='2017/07/21 13:07:36', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'}] 2017-07-24 12:59:09.616 DEBUG 8202 --- [pool-2-thread-1] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records/c8813d2c-21bf-46ea-8225-60dbe168a5d9 2017-07-24 12:59:14.615 DEBUG 8202 --- [pool-4-thread-1] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records/c8813d2c-21bf-46ea-8225-60dbe168a5d9 2017-07-24 13:00:09.616 DEBUG 8202 --- [pool-2-thread-1] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records/c8813d2c-21bf-46ea-8225-60dbe168a5d9 2017-07-24 13:00:14.615 DEBUG 8202 --- [pool-4-thread-1] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records/c8813d2c-21bf-46ea-8225-60dbe168a5d9 2017-07-24 13:00:20.115 DEBUG 8202 --- [http-nio-9000-exec-3] o.o.f.receivers.RestEventReceiver : Received VR state changed alarm 2017-07-24 13:00:20.146 DEBUG 8202 --- [Thread-3] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records 2017-07-24 13:00:20.172 DEBUG 8202 --- [Thread-3] .f.c.f.KieSessionGlobalConfigurationImpl : vnfrId : 03652436-90ec-46fa-948d-5c54e71a635d policyId:e3eb0379-c335-4881-bbef-41c96387bd84 alarmState:UPDATED 2017-07-24 13:00:20.187 INFO 8202 --- [Thread-3] .f.c.f.KieSessionGlobalConfigurationImpl : This VNF alarm is updated :VNFAlarm{vnfrId='03652436-90ec-46fa-948d-5c54e71a635d', vnfcIds=[137506e5-28d8-4b31-a8ac-c85af8d7d7ec], vimName='vim name'} Alarm{alarmId='dcd8e24b-ca56-43b7-9181-1c0536b993bf', thresholdId='14220', alarmType=VIRTUAL_NETWORK_FUNCTION, alarmRaisedTime='2017.07.24 12:57:36', alarmState=UPDATED, perceivedSeverity=CRITICAL, eventTime='2017/07/24 12:57:49', faultType=null, probableCause='null', isRootCause=false, correlatedAlarmId=[], faultDetails='null'} 2017-07-24 13:00:20.187 DEBUG 8202 --- [Thread-3] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records 2017-07-24 13:00:20.212 DEBUG 8202 --- [Thread-3] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records 2017-07-24 13:00:20.231 DEBUG 8202 --- [Thread-3] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records
2017-07-24 13:01:09.616 DEBUG 8202 --- [pool-2-thread-1] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records/c8813d2c-21bf-46ea-8225-60dbe168a5d9 2017-07-24 13:01:14.615 DEBUG 8202 --- [pool-4-thread-1] o.o.s.a.r.NetworkServiceRecordRestAgent : Executing get on: http://172.16.111.94:8080/api/v1/ns-records/c8813d2c-21bf-46ea-8225-60dbe168a5d9
I think the problem is that the FMS does not clear the state of the first recovery action. This is done through Drools rules. You can create an issue in the FMS project and I will have a look into it. BTW what version of FMS are you using?
@marcellom, sure. The issue on FMS project is here: https://github.com/openbaton/fm-system/issues/7
I couldn't find the version of FMS anywhere. I installed it using the bootstrap method. Have used the latest Openbaton version, think it's 3.2.2. The jar for FMS is openbaton-fms-1.2.4.jar.
BTW any clue about the 1st issue? Where I launched a different VNF still healing action was not performed and it works after FMS restart.
Where did you get the information that led you to configure the resiliencyLevel to
"high_availability":{
"resiliencyLevel":"ACTIVE_STANDBY_STATELESS",
"redundancyScheme":"1:N"
}
and it seemed to be reliably instantiating new instances up to my scale out limit. But I am curious as to what the "HEAL" does.
@wittling the information of "resiliencyLevel":"HEAL" is not in the documentation because is a workaround for using the healing method without HA (standby component deployed). In the next FMS release the requirement of having HA policy will be removed in case you want to use only the HEAL method. I guess I can close this issue since the initial issue has been solved. Please create an issue in the FMS project eventually.
Hi Team,
I 've a VNF whose monitoring policy is like below.
Now when I kill the service, I can zabbix invoking the notification script for the healing to take place.
openbaton-plugin-monitoring-zabbix.log
But after that, nothing happens on FMS side. See the attached logs.
Openbaton-fm.txt
Please suggest, if something I need to change/add to my configuration.