BastienClement closed this issue 3 months ago.
Hi @BastienClement, thanks for reporting this (and for the thorough explanation, it's really helpful). Sorry you ran into this issue. I see the problem: I assumed that `unique.platform.aws.hostname` would always contain the instance name, but I guess every OpenStack installation is different. I'll make the necessary changes to fix this in the next few days.
I'll add a `name_attribute` option that defaults to `unique.platform.aws.hostname` to maintain compatibility, and that you can set in the configuration to whatever attribute you need. If `id_attribute` is set, it will still take priority over `name_attribute` (also to maintain compatibility).
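A minimal Go sketch of the precedence described above (`id_attribute`, then `name_attribute`, then the `unique.platform.aws.hostname` default), assuming the new option keeps the name `name_attribute`. `lookupSettings` and `resolveLookup` are hypothetical names for illustration, not the plugin's actual code; `ByID` stands in for the plugin's `idMapper` flag, which is only enabled by `id_attribute`.

```go
package main

import "fmt"

// Configuration keys. "id_attribute" exists today; "name_attribute" is the
// proposed option (assumed name, may differ in the released plugin).
const (
	keyIDAttr   = "id_attribute"
	keyNameAttr = "name_attribute"
	defaultAttr = "unique.platform.aws.hostname"
)

// lookupSettings is a hypothetical struct used only in this sketch.
type lookupSettings struct {
	Attribute string // Nomad node attribute used to find the OpenStack server
	ByID      bool   // true: attribute holds a Nova server ID; false: a server name
}

// resolveLookup applies the precedence id_attribute > name_attribute > default.
// Only id_attribute switches the lookup to ID mode, preserving today's behavior.
func resolveLookup(config map[string]string) lookupSettings {
	if attr := config[keyIDAttr]; attr != "" {
		return lookupSettings{Attribute: attr, ByID: true}
	}
	if attr := config[keyNameAttr]; attr != "" {
		return lookupSettings{Attribute: attr, ByID: false}
	}
	return lookupSettings{Attribute: defaultAttr, ByID: false}
}

func main() {
	fmt.Println(resolveLookup(map[string]string{}))                              // {unique.platform.aws.hostname false}
	fmt.Println(resolveLookup(map[string]string{keyNameAttr: "unique.hostname"})) // {unique.hostname false}
	fmt.Println(resolveLookup(map[string]string{
		keyIDAttr:   "meta.instance_id",
		keyNameAttr: "unique.hostname",
	})) // {meta.instance_id true}
}
```

Keeping `ByID` tied to `id_attribute` alone is what lets existing configurations behave exactly as before.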
You could work around this by adding the instance ID to the Nomad client configuration as a `meta` value (we do this in our Nomad cluster):
```hcl
client {
  enabled = true
  # ...
  meta {
    instance_id = "XXXXXXXXXX"
  }
}
```
We read that value from the `uuid` field of the instance metadata endpoint ("http://169.254.169.254/openstack/latest/meta_data.json"). A process that runs on the instance after cloud-init fetches it and writes it into the Nomad configuration. Then you can set `id_attribute = "meta.instance_id"` and that should work.
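A minimal Go sketch of the post-cloud-init step described above, assuming the metadata JSON is piped in on stdin and that the Nomad agent loads every `.hcl` file from its config directory (Nomad merges files passed via a directory to `-config`). The file names and the `parseInstanceUUID`/`renderClientMeta` helpers are hypothetical, not part of this plugin.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
)

// novaMetadata holds the one field we need from OpenStack's meta_data.json.
type novaMetadata struct {
	UUID string `json:"uuid"`
}

// parseInstanceUUID extracts the Nova instance UUID from a meta_data.json document.
func parseInstanceUUID(data []byte) (string, error) {
	var md novaMetadata
	if err := json.Unmarshal(data, &md); err != nil {
		return "", fmt.Errorf("decoding metadata: %w", err)
	}
	if md.UUID == "" {
		return "", fmt.Errorf("metadata has no uuid field")
	}
	return md.UUID, nil
}

// renderClientMeta returns a Nomad client config fragment that exposes the
// UUID as meta.instance_id (read by id_attribute = "meta.instance_id").
func renderClientMeta(uuid string) string {
	return fmt.Sprintf("client {\n  meta {\n    instance_id = %q\n  }\n}\n", uuid)
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		log.Fatal(err)
	}
	uuid, err := parseInstanceUUID(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(renderClientMeta(uuid))
}
```

Example usage (paths are placeholders): `curl -s http://169.254.169.254/openstack/latest/meta_data.json | go run instance-meta.go > /etc/nomad.d/instance-meta.hcl`, then start or restart the Nomad agent so it picks up the new `meta` value.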
I'll update this issue once a fix is released for this. Thanks for trying this plugin!
Thanks for the very quick reply. I actually came up with the same idea later yesterday.
I've deployed a custom build of the plugin with the following changes, and it works like a charm (once I realized that the documentation on the autoscaler and ACLs is rather lacking, and that you obviously need `node` write permission to scale in 😛).
```diff
diff --git a/plugin/openstack.go b/plugin/openstack.go
index 53c4fb5..53bcc7e 100644
--- a/plugin/openstack.go
+++ b/plugin/openstack.go
@@ -742,8 +742,12 @@ func (t *TargetPlugin) getInstancePortID(id string) (string, error) {
 // osNovaNodeIDMapBuilder is used to identify the Opensack Nova ID of a Nomad node using
 // the relevant attribute value.
-func osNovaNodeIDMapBuilder(property string) scaleutils.ClusterNodeIDLookupFunc {
+func osNovaNodeIDMapBuilder(config map[string]string) scaleutils.ClusterNodeIDLookupFunc {
 	var isMeta bool
+	property := config[configKeyNodeIDAttr]
+	if property == "" {
+		property = config[configKeyNodeNameAttr]
+	}
 	if property == "" {
 		property = "unique.platform.aws.hostname"
 	}
diff --git a/plugin/plugin.go b/plugin/plugin.go
index 55c8f88..c2c4b7b 100644
--- a/plugin/plugin.go
+++ b/plugin/plugin.go
@@ -29,7 +29,8 @@ const (
 	configKeyCACertFile = "cacert_file"
 	configKeyInsecure   = "insecure_skip_verify"
-	configKeyNodeIDAttr = "id_attribute"
+	configKeyNodeIDAttr   = "id_attribute"
+	configKeyNodeNameAttr = "name_attribute"
 	configKeyName       = "name"
 	configKeyNamePrefix = "name_prefix"
@@ -120,7 +121,7 @@ func (t *TargetPlugin) SetConfig(config map[string]string) error {
 	// Store and set the remote ID callback function.
 	t.clusterUtils = clusterUtils
-	t.clusterUtils.ClusterNodeIDLookupFunc = osNovaNodeIDMapBuilder(config[configKeyNodeIDAttr])
+	t.clusterUtils.ClusterNodeIDLookupFunc = osNovaNodeIDMapBuilder(config)
 	t.idMapper = config[configKeyNodeIDAttr] != ""
 	return nil
```
I'll try building and deploying from your branch instead, but I expect similar results since the code is so similar. Stay tuned.
Hi! Thank you for all the work you've put into this plugin.
I'm trying to set up the autoscaler on Infomaniak Public Cloud to run batch workloads.
I got it working up to the point where jobs complete and the pool needs to scale in, and then it simply breaks. Here is a sample of the log from the autoscaler:
What seems to happen here is that on Infomaniak cloud, the `unique.platform.aws.hostname` attribute is a full hostname (like `nomad-batch-e0a3f0ee-5418.dc3-a.pub1.infomaniak.cloud`) and not simply the instance name from OpenStack.

- `osNovaNodeIDMapBuilder` picks `unique.platform.aws.hostname` as the "ClusterNodeID" for `IdentifyScaleInRemoteIDs` (set up here), ending up with `nomad-batch-e0a3f0ee-5418.dc3-a.pub1.infomaniak.cloud`.
- `countServers` picks the instance name from OpenStack, ending up with `nomad-batch-e0a3f0ee-5418`.

Then `RunPreScaleInTasksWithRemoteCheck` gets confused because `id.RemoteResourceId` is the full hostname while `remoteId` is the instance name. Everything is filtered out, and nothing is left to scale in.

Node attribute values:

- `nomad-batch-e0a3f0ee-5418`
- `10.0.0.144`
- `nomad-batch-e0a3f0ee-5418.dc3-a.pub1.infomaniak.cloud`
- `i-001787c3` (doesn't seem to match anything from `openstack server show ...`)
- `nomad-batch-e0a3f0ee-5418.dc3-a.pub1.infomaniak.cloud`
Notably, nothing has the instance ID.
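To make the mismatch concrete, here is a simplified Go sketch of the filtering step described above: a node is kept only if its ID (from the node attribute) appears among the remote IDs (from the OpenStack server list). `keepMatching`, `identity` and `shortName` are hypothetical helpers, not the autoscaler's actual `RunPreScaleInTasksWithRemoteCheck` code, and stripping the domain is shown only to illustrate why the comparison fails, not as the proposed fix.

```go
package main

import (
	"fmt"
	"strings"
)

// keepMatching keeps the node IDs whose normalized form appears among the
// normalized remote IDs. Simplified illustration of the scale-in filter.
func keepMatching(nodeIDs, remoteIDs []string, normalize func(string) string) []string {
	remote := make(map[string]bool, len(remoteIDs))
	for _, id := range remoteIDs {
		remote[normalize(id)] = true
	}
	var kept []string
	for _, id := range nodeIDs {
		if remote[normalize(id)] {
			kept = append(kept, id)
		}
	}
	return kept
}

// identity compares IDs as-is, which is what happens today.
func identity(s string) string { return s }

// shortName drops everything after the first dot (FQDN -> bare host name).
// Fragile if OpenStack instance names themselves contain dots.
func shortName(s string) string {
	if i := strings.IndexByte(s, '.'); i >= 0 {
		return s[:i]
	}
	return s
}

func main() {
	nodeIDs := []string{"nomad-batch-e0a3f0ee-5418.dc3-a.pub1.infomaniak.cloud"} // unique.platform.aws.hostname
	remoteIDs := []string{"nomad-batch-e0a3f0ee-5418"}                            // OpenStack instance name

	fmt.Println(len(keepMatching(nodeIDs, remoteIDs, identity)))  // 0: everything filtered out
	fmt.Println(len(keepMatching(nodeIDs, remoteIDs, shortName))) // 1
}
```

Because name normalization depends on each provider's hostname scheme, letting the user choose the attribute (for example `unique.hostname`, which holds the bare instance name here) is the more robust direction.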
I tried setting `id_attribute = "unique.hostname"`, since that attribute is indeed the instance name. But setting anything as `id_attribute` also sets `t.idMapper = true`, which forces me to select the instance ID rather than its name. 🙃

For completeness, here is the plugin configuration:
I don't think this can be solved with configuration changes alone. I'm open to submitting a pull request, but what would be the preferred way to tackle this?
Thanks