victoitor closed this issue 9 months ago
Here I'll post what I found while looking at the source code. I'm new to this codebase and have never programmed in Go, but I'll do my best to figure out this issue.
The function which creates an instance is instancesPost, with the following definition.
func instancesPost(d *Daemon, r *http.Request) response.Response {
It's important to note the following line, right at the beginning of this function, which will matter later.
clusterNotification := isClusterNotification(r)
In the previous line, r is the request passed as input to instancesPost.
A few lines down there is some possibly problematic code, related to a bug I previously reported in lxd.
// If we're getting binary content, process separately
if r.Header.Get("Content-Type") == "application/octet-stream" {
	return createFromBackup(s, r, targetProjectName, r.Body, r.Header.Get("X-Incus-pool"), r.Header.Get("X-Incus-name"))
}
The bug I reported previously was also related to project limits not being checked, but only in the case of importing an image. This part of the code is potentially problematic because it forks the flow of instancesPost and then performs the same logic in both branches in different ways, which invites future bugs. Should this be refactored so that project limits are checked in only one place? There may be other logic that is likewise checked in two places instead of one.
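To make the refactoring suggestion concrete, here is a minimal, self-contained sketch. The names (allowInstanceCreation, createInstance, createFromBackup) are stand-ins modelled on the snippets in this issue, not the actual Incus API; the point is simply that both creation paths funnel through a single guard, so the limit logic lives in exactly one place:

```go
package main

import "fmt"

// allowInstanceCreation is a stand-in for project.AllowInstanceCreation:
// reject a new instance once the project already holds `limit` instances.
func allowInstanceCreation(current, limit int) error {
	if current >= limit {
		return fmt.Errorf("reached maximum number of instances in this project: %d", limit)
	}
	return nil
}

// createInstance models the normal creation path. The limit guard is called
// here and only here.
func createInstance(count *int, limit int) error {
	if err := allowInstanceCreation(*count, limit); err != nil {
		return err
	}
	*count++
	return nil
}

// createFromBackup models the backup-import path. It reuses the same guard
// instead of maintaining a separate copy of the check.
func createFromBackup(count *int, limit int) error {
	return createInstance(count, limit)
}

func main() {
	count, limit := 0, 3
	for i := 0; i < 3; i++ {
		fmt.Println(createInstance(&count, limit)) // <nil> three times
	}
	fmt.Println(createFromBackup(&count, limit)) // error: limit reached
}
```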
For example, in createFromBackup, project limits are checked in the following lines, inside the call to s.DB.Cluster.Transaction.
err = s.DB.Cluster.Transaction(s.ShutdownCtx, func(ctx context.Context, tx *db.ClusterTx) error {
	req = api.InstancesPost{
		InstancePut: bInfo.Config.Container.InstancePut,
		Name:        bInfo.Name,
		Source:      api.InstanceSource{}, // Only relevant for "copy" or "migration", but may not be nil.
		Type:        api.InstanceType(bInfo.Config.Container.Type),
	}

	return project.AllowInstanceCreation(tx, projectName, req)
})
Inside instancesPost, the project-limit check is also made inside a call to s.DB.Cluster.Transaction, but in the following lines.
if !clusterNotification {
	// Check that the project's limits are not violated. Note this check is performed after
	// automatically generated config values (such as ones from an InstanceType) have been set.
	err = project.AllowInstanceCreation(tx, targetProjectName, req)
	if err != nil {
		return err
	}
}
In this case, the limits are checked only if clusterNotification is false, which was set at the top of the function as mentioned previously. It's important to note that no equivalent check exists inside createFromBackup, so the two paths behave differently.
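The asymmetry can be made explicit with a small, self-contained model. The names are illustrative stand-ins for the functions quoted above: the normal path gates the check on the notification flag, while the backup path always checks:

```go
package main

import "fmt"

// checkLimits is a stand-in for project.AllowInstanceCreation.
func checkLimits(current, limit int) error {
	if current >= limit {
		return fmt.Errorf("limit %d reached", limit)
	}
	return nil
}

// createNormal models the instancesPost path: the limit check is skipped
// when the request is a cluster notification.
func createNormal(current, limit int, clusterNotification bool) error {
	if !clusterNotification {
		return checkLimits(current, limit)
	}
	return nil
}

// createBackup models the createFromBackup path: the check always runs,
// notification or not.
func createBackup(current, limit int, clusterNotification bool) error {
	return checkLimits(current, limit)
}

func main() {
	// Project already at its limit (3 of 3). A cluster notification slips
	// through on the normal path but is rejected on the backup path.
	fmt.Println(createNormal(3, 3, true)) // <nil>: check skipped
	fmt.Println(createBackup(3, 3, true)) // error: limit 3 reached
}
```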
isClusterNotification lives in another file and is very short.
// Return true if this is an API request coming from a cluster node that is
// notifying us of some user-initiated API request that needs some action to be
// taken on this node as well.
func isClusterNotification(r *http.Request) bool {
	return r.Header.Get("User-Agent") == clusterRequest.UserAgentNotifier
}
I can see the comment on what it does, but I'm not sure I understand why project limits would not be checked when this returns true. Furthermore, why would some of the calls I made produce different values from this function?
From the code, it doesn't look like anything related to instance scheduling. I've been thinking about it, and my (wild) guess is that the scheduler cycles through machines and isClusterNotification might return true when the call is made to the same cluster node that originated it. In that case, it makes no sense for project limits not to be checked, since the notification status is unrelated to the limit test.
It still seems inconsistent that project limits are checked in a different way when importing an image than when creating an instance from scratch.
Reproduced and debugging this one now
Issue description
Project limits.instances does not work in a cluster when the value is a multiple of the number of machines in the cluster. In particular, I have a cluster with 3 machines and no other instance running. When I create a project with limits.instances not a multiple of the number of machines, the limit works. When it is a multiple of the number of machines, the effective limit is one more (a limit of 1 when set to 0, 4 when set to 3, and 7 when set to 6). I cannot reproduce this issue in a non-clustered environment.

My guess is that this is related to the cluster machine scheduler, since it schedules instances by cycling through the machines. Whenever it is about to start a new cycle, it starts the instance without checking project limits.
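The data points reported above fit a simple pattern, summarized in the sketch below. This encodes only the observed numbers from this report (a 3-machine cluster), not a claimed root cause:

```go
package main

import "fmt"

// observedAllowed encodes the observation from this report: the effective
// instance cap equals limits.instances, except when the limit is a multiple
// of the cluster size, where one extra instance is allowed.
func observedAllowed(limit, nodes int) int {
	if limit%nodes == 0 {
		return limit + 1
	}
	return limit
}

func main() {
	for _, limit := range []int{0, 3, 6, 4, 5} {
		fmt.Printf("limits.instances=%d, 3 nodes → %d instances allowed\n",
			limit, observedAllowed(limit, 3))
	}
}
```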
Steps to reproduce
limits.instances=3
The precise commands I ran can be found in the forum post on discuss.linuxcontainers.org, where I also show more evidence linking this bug to the scheduler.