jenkinsci / openstack-cloud-plugin

Provision nodes from OpenStack on demand
https://plugins.jenkins.io/openstack-cloud
MIT License
47 stars 83 forks source link

Prevent leaking allocated Floating IP #81

Closed olivergondza closed 8 years ago

olivergondza commented 8 years ago

It happens that FIPs used by plugin get never deallocated after server is gone. Please turn the fine logging on and grep it for dangling FIPs after a while so we know how it can happen.

binhex commented 8 years ago

ok so ive turned on fine logging by following the instructions, and got the following from the log-

Provisioning new openstack node openstack-raleigh-slave-449 with options jenkins.plugins.openstack.compute.SlaveOptions@7d14444d[imageId=ubuntu_jre_pae,hardwareId=4,networkId=68f25556-3a15-4f6c-9cae-cd2bc545d265,userDataId=<null>,instanceCap=1,floatingIpPool=admin_floating_net,securityGroups=default,availabilityZone=<null>,startTimeout=600000,keyPairName=jenkins_master_pae,numExecutors=1,jvmOptions=<null>,fsRoot=/home/ibmcloud,credentialsId=3b98a93e-245b-42c0-8339-fe7752573d14,slaveType=SSH,retentionTime=0]
Jun 01, 2016 3:26:18 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning openstack-raleigh-slave from openstack-raleigh with 1 executors. Remaining excess workload: 0
Jun 01, 2016 3:26:31 PM INFO jenkins.plugins.openstack.compute.JCloudsSlaveTemplate provision
Provisioned: NovaServer{id=907acd7b-b7f4-4aa4-bd1b-71a136af155f, name=openstack-raleigh-slave-449, image={id=4ba47fdb-0782-4de4-896b-f2c072e7a99e, links=[{href=http://9.42.5.6:8774/89b39bebfa104b33913df9aadd22ec47/images/4ba47fdb-0782-4de4-896b-f2c072e7a99e, rel=bookmark}]}, flavor=NovaFlavor{id=4, ephemeral=0, swap=0, rxtx_factor=1.0, is_public=true, links=[GenericLink{href=http://9.42.5.6:8774/89b39bebfa104b33913df9aadd22ec47/flavors/4, rel=bookmark}], 
}, status=ACTIVE, diskconfig=MANUAL, userId=fe43afd688324a7788cb616118f37605, created=Wed Jun 01 15:26:22 BST 2016, updated=Wed Jun 01 15:26:30 BST 2016, launched at=Wed Jun 01 15:26:30 BST 2016, tenantId=89b39bebfa104b33913df9aadd22ec47, hostId=6fca29d3a15209e830b653b36fe0f817dcfbb0399318569831258fb9, addresses=NovaAddresses{addresses={Pi2-Apollo_net1=[NovaAddress{address=10.10.15.99, type=fixed, version=4, macaddr=fa:16:3e:d7:38:dd, 
}]}, 
}, powerstate=1, vmState=active, metadata={jenkins-instance=http://192.168.187.176:8080/, jenkins-template-name=openstack-raleigh-slave}}
Jun 01, 2016 3:26:34 PM INFO jenkins.plugins.openstack.compute.JCloudsSlave createComputer
Creating a new computer for openstack-raleigh-slave-449
Jun 01, 2016 3:27:36 PM INFO hudson.slaves.NodeProvisioner$2 run
openstack-raleigh-slave provisioning successfully completed. We have now 2 computer(s)
Jun 01, 2016 3:27:57 PM INFO hudson.model.Run execute
test_job #5 main build action completed: SUCCESS
Jun 01, 2016 3:28:59 PM INFO jenkins.plugins.openstack.compute.JCloudsRetentionStrategy doCheck
Scheduling openstack-raleigh-slave-449 for termination as it was idle since Wed Jun 01 15:27:57 BST 2016
Jun 01, 2016 3:28:59 PM INFO jenkins.plugins.openstack.compute.JCloudsComputer setPendingDelete
Setting openstack-raleigh-slave-449 pending delete status to true
Jun 01, 2016 3:29:39 PM INFO hudson.model.AsyncPeriodicWork$1 run
Started OpenStack slave cleanup
Jun 01, 2016 3:29:39 PM INFO jenkins.plugins.openstack.compute.JCloudsCleanupThread$1 run
Deleting pending node openstack-raleigh-slave-449. Reason: Slave is pending removal.
Jun 01, 2016 3:29:39 PM INFO jenkins.plugins.openstack.compute.JCloudsComputer deleteSlave
Deleting slave openstack-raleigh-slave-449
Jun 01, 2016 3:29:45 PM INFO jenkins.plugins.openstack.compute.JCloudsComputer deleteSlave
Deleted slave openstack-raleigh-slave-449
Jun 01, 2016 3:29:45 PM INFO hudson.model.AsyncPeriodicWork$1 run
Finished OpenStack slave cleanup. 5,861 ms
Feed    All    > SEVERE    > WARNING

so this then resulted in the instance being terminated, and the allocated floating ip being in a disassociated state, screenshot:-

screen shot 2016-06-01 at 15 33 41

olivergondza commented 8 years ago

Unfortunately, the log does not contain enough details to understand why was the IP leaked. It seem you are logging only INFO and above, make sure FINE is selected in your logger.

binhex commented 8 years ago

Sorry @olivergondza i created a new log but didnt attach it correctly to the logger, i have now done so and set it to "fine" logging detail, i hope this gives you more of a clue:-

Jun 02, 2016 11:46:58 AM INFO hudson.model.AsyncPeriodicWork$1 run
Started OpenStack slave cleanup
Jun 02, 2016 11:46:58 AM INFO hudson.model.AsyncPeriodicWork$1 run
Finished OpenStack slave cleanup. 0 ms
Jun 02, 2016 11:47:50 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Openstack client created for http://9.42.5.6:5000/v2.0
Jun 02, 2016 11:47:51 AM FINE jenkins.plugins.openstack.compute.JCloudsCloud
Provisioning slave for test from template openstack-raleigh-slave
Jun 02, 2016 11:47:51 AM INFO jenkins.plugins.openstack.compute.JCloudsSlaveTemplate provision
Provisioning new openstack node openstack-raleigh-slave-7515 with options jenkins.plugins.openstack.compute.SlaveOptions@25e1bf1f[imageId=ubuntu_jre_pae,hardwareId=4,networkId=68f25556-3a15-4f6c-9cae-cd2bc545d265,userDataId=<null>,instanceCap=1,floatingIpPool=admin_floating_net,securityGroups=default,availabilityZone=<null>,startTimeout=600000,keyPairName=jenkins_master_pae,numExecutors=1,jvmOptions=<null>,fsRoot=/home/ibmcloud,credentialsId=3b98a93e-245b-42c0-8339-fe7752573d14,slaveType=SSH,retentionTime=0]
Jun 02, 2016 11:47:51 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Openstack client created for http://9.42.5.6:5000/v2.0
Jun 02, 2016 11:47:52 AM FINE jenkins.plugins.openstack.compute.JCloudsSlaveTemplate
Setting image id to 4ba47fdb-0782-4de4-896b-f2c072e7a99e
Jun 02, 2016 11:47:52 AM FINE jenkins.plugins.openstack.compute.JCloudsSlaveTemplate
Setting hardware Id to 4
Jun 02, 2016 11:47:52 AM FINE jenkins.plugins.openstack.compute.JCloudsSlaveTemplate
Setting network to 68f25556-3a15-4f6c-9cae-cd2bc545d265
Jun 02, 2016 11:47:52 AM FINE jenkins.plugins.openstack.compute.JCloudsSlaveTemplate
Setting security groups to default
Jun 02, 2016 11:47:52 AM FINE jenkins.plugins.openstack.compute.JCloudsSlaveTemplate
Setting keyPairName to jenkins_master_pae
Jun 02, 2016 11:47:53 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Openstack client created for http://9.42.5.6:5000/v2.0
Jun 02, 2016 11:47:53 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Booting machine
Jun 02, 2016 11:51:58 AM INFO hudson.model.AsyncPeriodicWork$1 run
Started OpenStack slave cleanup
Jun 02, 2016 11:51:58 AM INFO hudson.model.AsyncPeriodicWork$1 run
Finished OpenStack slave cleanup. 1 ms
Jun 02, 2016 11:52:37 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Machine started: openstack-raleigh-slave-7515
Jun 02, 2016 11:52:37 AM INFO jenkins.plugins.openstack.compute.JCloudsSlaveTemplate provision
Provisioned: NovaServer{id=792f1311-70f3-4cbc-9314-131071528c92, name=openstack-raleigh-slave-7515, image={id=4ba47fdb-0782-4de4-896b-f2c072e7a99e, links=[{href=http://9.42.5.6:8774/89b39bebfa104b33913df9aadd22ec47/images/4ba47fdb-0782-4de4-896b-f2c072e7a99e, rel=bookmark}]}, flavor=NovaFlavor{id=4, ephemeral=0, swap=0, rxtx_factor=1.0, is_public=true, links=[GenericLink{href=http://9.42.5.6:8774/89b39bebfa104b33913df9aadd22ec47/flavors/4, rel=bookmark}], 
}, status=ACTIVE, diskconfig=MANUAL, userId=fe43afd688324a7788cb616118f37605, created=Thu Jun 02 11:47:55 BST 2016, updated=Thu Jun 02 11:52:36 BST 2016, launched at=Thu Jun 02 11:52:36 BST 2016, tenantId=89b39bebfa104b33913df9aadd22ec47, hostId=f127df203c7060c91169e749ebdbf5d5dfb119ab55ab862f1c895f95, addresses=NovaAddresses{addresses={Pi2-Apollo_net1=[NovaAddress{address=10.10.15.101, type=fixed, version=4, macaddr=fa:16:3e:e6:bf:d8, 
}]}, 
}, powerstate=1, vmState=active, metadata={jenkins-instance=http://192.168.187.176:8080/, jenkins-template-name=openstack-raleigh-slave}}
Jun 02, 2016 11:52:37 AM FINE jenkins.plugins.openstack.compute.JCloudsSlaveTemplate
Assigning floating IP from admin_floating_net to openstack-raleigh-slave-7515
Jun 02, 2016 11:52:37 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Allocating floating IP for openstack-raleigh-slave-7515
Jun 02, 2016 11:52:38 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Floating IP allocated 9.42.7.160
Jun 02, 2016 11:52:38 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Assigning floating IP to openstack-raleigh-slave-7515
Jun 02, 2016 11:52:40 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Floating IP assigned
Jun 02, 2016 11:52:40 AM INFO jenkins.plugins.openstack.compute.JCloudsSlave createComputer
Creating a new computer for openstack-raleigh-slave-7515
Jun 02, 2016 11:52:40 AM FINE jenkins.plugins.openstack.compute.JCloudsCloud
Waiting for slave openstack-raleigh-slave-7515 to launch
Jun 02, 2016 11:53:10 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 02, 2016 11:53:26 AM FINE jenkins.plugins.openstack.compute.JCloudsCloud
Slave openstack-raleigh-slave-7515 launched successfully
Jun 02, 2016 11:55:10 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 02, 2016 11:55:10 AM INFO jenkins.plugins.openstack.compute.JCloudsRetentionStrategy doCheck
Scheduling openstack-raleigh-slave-7515 for termination as it was idle since Thu Jun 02 11:53:52 BST 2016
Jun 02, 2016 11:55:10 AM INFO jenkins.plugins.openstack.compute.JCloudsComputer setPendingDelete
Setting openstack-raleigh-slave-7515 pending delete status to true
Jun 02, 2016 11:56:58 AM INFO hudson.model.AsyncPeriodicWork$1 run
Started OpenStack slave cleanup
Jun 02, 2016 11:56:58 AM INFO jenkins.plugins.openstack.compute.JCloudsCleanupThread$1 run
Deleting pending node openstack-raleigh-slave-7515. Reason: Slave is pending removal.
Jun 02, 2016 11:56:58 AM INFO jenkins.plugins.openstack.compute.JCloudsComputer deleteSlave
Deleting slave openstack-raleigh-slave-7515
Jun 02, 2016 11:56:59 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Openstack client created for http://9.42.5.6:5000/v2.0
Jun 02, 2016 11:56:59 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Destroying machine openstack-raleigh-slave-7515
Jun 02, 2016 11:57:04 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Machine destroyed: openstack-raleigh-slave-7515
Jun 02, 2016 11:57:05 AM INFO jenkins.plugins.openstack.compute.JCloudsComputer deleteSlave
Deleted slave openstack-raleigh-slave-7515
Jun 02, 2016 11:57:05 AM INFO hudson.model.AsyncPeriodicWork$1 run
Finished OpenStack slave cleanup. 6,313 ms
Jun 02, 2016 12:01:58 PM INFO hudson.model.AsyncPeriodicWork$1 run
olivergondza commented 8 years ago

Thanks @binhex, this looks like a new problem.

olivergondza commented 8 years ago

Turns out the code was all wrong as it collected FIPs by server after it was deleted (openstack disassociates all IPs when server is deleted). Thanks for reporting this.

binhex commented 8 years ago

Ahh perfect! Thanks for taking the time to fix it :-) so how can I get hold of the latest build to test the commit? Do you have an automated build system?

olivergondza commented 8 years ago

Grab the bits here: https://jenkins.ci.cloudbees.com/job/plugins/job/openstack-cloud-plugin/211/org.jenkins-ci.plugins$openstack-cloud/artifact/org.jenkins-ci.plugins/openstack-cloud/2.8-SNAPSHOT/openstack-cloud-2.8-SNAPSHOT.hpi, please let me know if it is sufficient. I will retest and release on Monday.

binhex commented 8 years ago

@olivergondza ive just tested it and it works a treat, awesome work, so i guess i should see this included in the 2.8 release right?.

thank you very much for the fix, its really appreciated.

olivergondza commented 8 years ago

2.8 should be in update center in couple of hours. Meanwhile it is maven repo: http://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/openstack-cloud/2.8/openstack-cloud-2.8.hpi

binhex commented 8 years ago

perfect, thanks again.

winklerm commented 8 years ago

Hi, unfortunately, it seems the fix is not working for me. I am using OpenStack Cloud Plugin 2.8 and the Floating IP addresses do not get released. This is the relevant log part:

Jun 17, 2016 11:01:49 AM INFO jenkins.plugins.openstack.compute.JCloudsSlaveTemplate provision
Provisioned: NovaServer{id=3754db4f-2cd2-4211-9d08-81071d66f4ad, name=qeos-rhel7-418, image={id=401b5f07-b31c-4b85-913d-96e9dea83499, links=[{href=http://qeos.lab.eng.rdu2.redhat.com:8774/6015adffcb5f422c833758fdd4d2eb58/images/401b5f07-b31c-4b85-913d-96e9dea83499, rel=bookmark}]}, flavor=NovaFlavor{id=4, ephemeral=0, swap=0, rxtx_factor=1.0, is_public=true, links=[GenericLink{href=http://qeos.lab.eng.rdu2.redhat.com:8774/6015adffcb5f422c833758fdd4d2eb58/flavors/4, rel=bookmark}], 
}, status=ACTIVE, diskconfig=MANUAL, userId=e69f7740098a4b3fb8198a3a48df7014, created=Fri Jun 17 11:00:12 EDT 2016, updated=Fri Jun 17 11:01:52 EDT 2016, launched at=Fri Jun 17 11:01:52 EDT 2016, tenantId=6015adffcb5f422c833758fdd4d2eb58, hostId=e1817dd7b6bee66d9e525e1f7242b44eeea483dcb5f3b5be3b7fc3d8, addresses=NovaAddresses{addresses={apiman-qe-jenkins-1=[NovaAddress{address=172.16.125.7, type=fixed, version=4, macaddr=fa:16:3e:6b:28:b0, 
}], bxms-qe-jenkins-1=[NovaAddress{address=172.16.126.7, type=fixed, version=4, macaddr=fa:16:3e:95:ba:36, 
}]}, 
}, powerstate=1, vmState=active, metadata={jenkins-instance=https://bxms-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/, jenkins-template-name=qeos-rhel7}}
Jun 17, 2016 11:01:49 AM FINE jenkins.plugins.openstack.compute.JCloudsSlaveTemplate
Assigning floating IP from public to qeos-rhel7-418
Jun 17, 2016 11:01:49 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Allocating floating IP for qeos-rhel7-418
Jun 17, 2016 11:01:49 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Floating IP allocated 10.8.55.70
Jun 17, 2016 11:01:49 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Assigning floating IP to qeos-rhel7-418
Jun 17, 2016 11:01:52 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Floating IP assigned
Jun 17, 2016 11:01:53 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 17, 2016 11:01:53 AM INFO jenkins.plugins.openstack.compute.JCloudsSlave createComputer
Creating a new computer for qeos-rhel7-418
Jun 17, 2016 11:02:20 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 17, 2016 11:02:33 AM INFO jenkins.plugins.openstack.compute.JCloudsComputer setPendingDelete
Setting qeos-rhel7-418 pending delete status to true
Jun 17, 2016 11:03:20 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 17, 2016 11:04:20 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 17, 2016 11:05:20 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 17, 2016 11:05:35 AM INFO hudson.model.AsyncPeriodicWork$1 run
Started OpenStack slave cleanup
Jun 17, 2016 11:05:35 AM INFO jenkins.plugins.openstack.compute.JCloudsCleanupThread$1 run
Deleting pending node qeos-rhel7-418. Reason: Slave is pending removal.
Jun 17, 2016 11:05:35 AM INFO jenkins.plugins.openstack.compute.JCloudsComputer deleteSlave
Deleting slave qeos-rhel7-418
Jun 17, 2016 11:05:35 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Openstack client created for http://qeos.lab.eng.rdu2.redhat.com:5000/v2.0
Jun 17, 2016 11:05:35 AM FINE jenkins.plugins.openstack.compute.internal.Openstack
Destroying machine qeos-rhel7-418
Jun 17, 2016 11:05:44 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking
Jun 17, 2016 11:05:44 AM INFO hudson.model.AsyncPeriodicWork$1 run
Finished OpenStack slave cleanup. 9,499 ms
Jun 17, 2016 11:07:20 AM FINE jenkins.plugins.openstack.compute.JCloudsRetentionStrategy
Checking

I find strange that I do not see the log message [1] in the log, just [2]. Logger level is set to FINE.

[1] https://github.com/jenkinsci/openstack-cloud-plugin/blob/openstack-cloud-2.8/src/main/java/jenkins/plugins/openstack/compute/internal/Openstack.java#L295 [2] https://github.com/jenkinsci/openstack-cloud-plugin/blob/openstack-cloud-2.8/src/main/java/jenkins/plugins/openstack/compute/internal/Openstack.java#L259

olivergondza commented 8 years ago

It still might not work in case all retries are not enough to successfully delete the machine. Our inability to investigate is caused by Futures.successfulAsList[1].

https://github.com/jenkinsci/openstack-cloud-plugin/blob/master/src/main/java/jenkins/plugins/openstack/compute/JCloudsCleanupThread.java#L64

Holmistr commented 7 years ago

Hi @olivergondza , we're still experiencing floating IPs not released, we're using version 2.9. I don't know how to provide you more information, you can definitely direct me how to do that.

olivergondza commented 7 years ago

This is still an outstanding problem, unfortunately. https://github.com/jenkinsci/openstack-cloud-plugin/issues/84