metallb / metallb

A network load-balancer implementation for Kubernetes using standard routing protocols
https://metallb.universe.tf
Apache License 2.0

Heads up: breaking changes in 0.13.x #1473

Closed · fedepaol closed this issue 2 years ago

fedepaol commented 2 years ago

The latest MetalLB version is now configurable only via CRs. Please refer to the documentation on how to use the configmap-to-CR converter (docs here: https://metallb.universe.tf/#backward-compatibility).

tamcore commented 2 years ago

Please provide a parameter to pass address-pools through Helm values, and throw an error if configInline is set (to prevent destroying existing deployments).

fedepaol commented 2 years ago

Please provide a parameter to pass address-pools through Helm values, and throw an error if configInline is set (to prevent destroying existing deployments).

Yep, @gclawes and I were discussing that. Not sure we can intercept configInline, though.

tamcore commented 2 years ago

What about

{{- if .Values.configInline }}
{{- fail "configInline no longer supported" }}
{{- end }}

? Just with a more descriptive error?
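
For example (a sketch; the exact message wording is hypothetical):

{{- if .Values.configInline }}
{{- fail "configInline is no longer supported as of the 0.13.x charts; define IPAddressPool / L2Advertisement resources instead, see https://metallb.universe.tf/#backward-compatibility" }}
{{- end }}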

tamcore commented 2 years ago

And at least for IPAddressPool objects, I'd like to leave this one here; it saves me from having to open a PR :) That could at least be an easy fix, and would make the breaking changes "less breaking" :)

{{ $global := . }}
{{- if .Values.addresspools }}
{{- range $pool := .Values.addresspools }}
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: {{ template "metallb.fullname" $global }}-{{ $pool.name }}
spec:
  addresses:
    {{- range $address := $pool.addresses }}
    - {{ $address | quote }}
    {{- end }}
  {{- /* use hasKey so an explicit autoassign: false still renders */}}
  {{- if hasKey $pool "autoassign" }}
  autoAssign: {{ $pool.autoassign }}
  {{- end }}
{{- end }}
{{- end }}

Works with values like

addresspools:
- name: default
  autoassign: true
  addresses:
  - 192.168.69.30-192.168.69.39

(tested in my homelab)
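
For layer 2 mode, an accompanying L2Advertisement per pool would presumably also be needed; a minimal sketch in the same spirit (untested, field names taken from the CR examples later in this thread):

{{ $global := . }}
{{- range $pool := .Values.addresspools }}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: {{ template "metallb.fullname" $global }}-{{ $pool.name }}
spec:
  ipAddressPools:
    - {{ template "metallb.fullname" $global }}-{{ $pool.name }}
{{- end }}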

jr0dd commented 2 years ago

Everything went smoothly for the most part. I just had to make some adjustments so Flux wouldn't deploy the CRs before the chart was deployed. It did take a couple of attempts to get the CRs loaded into the cluster after the pods were running.

{"level":"error","ts":1657129516.5686264,"msg":"Reconciler error","controller":"cert-rotator","object":{"name":"webhook-server-cert","namespace":"networking"},"namespace":"networking","name":"webhook-server-cert","reconcileID":"1a97259f-b252-4af3-8ac0-c727359e2278","error":"Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io \"bgppeers.metallb.io\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234"}
✗ Kustomization reconciliation failed: BGPPeer/networking/mikrotik dry-run failed, reason: InternalError, error: Internal error occurred: failed calling webhook "bgppeervalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.networking.svc:443/validate-metallb-io-v1beta2-bgppeer?timeout=10s": proxy error from 127.0.0.1:6443 while dialing 172.20.0.186:9443, code 500: 500 Internal Server Error
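
Roughly, the ordering can be enforced with a dependsOn between the Flux Kustomization that applies the CRs and the one that deploys the chart; a sketch with hypothetical names and paths:

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: metallb-config
  namespace: flux-system
spec:
  # apply the MetalLB CRs only after the chart (CRDs + webhook) is up
  dependsOn:
    - name: metallb
  interval: 10m
  path: ./networking/metallb/config
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
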
jr0dd commented 2 years ago

Hmm, I'm definitely getting spammed with alerts every time Flux reconciles my git repo, because of that validating webhook.

fedepaol commented 2 years ago

Hmm, I'm definitely getting spammed with alerts every time Flux reconciles my git repo, because of that validating webhook.

Can you expand a bit? Are you getting the errors only after the initial metallb deployment (while the webhooks are being set up) or also after that? If the latter, would you mind filing a separate issue?

jr0dd commented 2 years ago

Yeah, it's after. I'll open a new issue regarding that.

fedepaol commented 2 years ago

Yeah, it's after. I'll open a new issue regarding that.

Thanks! It would help to have the logs of the controller, with the verbosity raised to debug via the -loglevel flag.
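
With the Helm chart, that should be settable through the chart's log-level values; e.g. (assuming the chart exposes controller.logLevel and speaker.logLevel, as current chart versions do):

helm upgrade --install metallb metallb/metallb \
  --namespace metallb-system \
  --set controller.logLevel=debug \
  --set speaker.logLevel=debug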

druesendieb commented 2 years ago

First of all, thank you for your work in metallb, I am really happy to see progress here! 💯

I have mixed feelings about this issue and wanted to give feedback:

Versioning

This release introduces breaking changes, but it's only a minor release. One could argue that, due to this breaking change and according to semver, a major version bump should've been done.

Especially with the use of dependency-management tools like renovate, or GitOps operators' image-update mechanisms that rely on semantic versioning (and trust its correct usage by dependencies), this release will cause broken clusters.

What's your take on semantic versioning as used within the project?

Maturity/Backwards compatibility

While looking for an answer to the above question, I read about the Project maturity; here it's stated that:

Backward-incompatible changes to configuration will be rolled out in a “make before break” fashion: first there will be a release that understands the new and the old configuration format, so that you can upgrade your configuration separately from the code. The next release after that removes support for the old configuration format.

That is clearly not the case with this release.

fedepaol commented 2 years ago

First of all, thank you for your work in metallb, I am really happy to see progress here! 💯

I have mixed feelings about this issue and wanted to give feedback:

Versioning

This release introduces breaking changes, but it's only a minor release. One could argue that, due to this breaking change and according to semver, a major version bump should've been done.

Especially with the use of dependency-management tools like renovate, or GitOps operators' image-update mechanisms that rely on semantic versioning (and trust its correct usage by dependencies), this release will cause broken clusters.

What's your take on semantic versioning as used within the project?

There has been a pretty long discussion together with the previous (now retired) maintainers (@rata, @johananl and @gclawes).

MetalLB respects semantic versioning. A major version of "0" implies a non-stable API by nature (see https://semver.org/), so any 0.x release may introduce breaking changes over the previous one. Raising the major version would've meant declaring the API and MetalLB stable, which is something we don't want at this point, because of all the novelties we added to the API.

See the points:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.
Version 1.0.0 defines the public API. The way in which the version number is incremented after this release is dependent on this public API and how it changes.

We are going 1.0.0 sooner or later, right after the new CRD-based API stabilizes.

Maturity/Backwards compatibility

While looking for an answer to the above question, I read about the Project maturity; here it's stated that:

Backward-incompatible changes to configuration will be rolled out in a “make before break” fashion: first there will be a release that understands the new and the old configuration format, so that you can upgrade your configuration separately from the code. The next release after that removes support for the old configuration format.

That is clearly not the case with this release.

That was (and always has been) the case with changes within the configmap. And it will probably be the case with the new API from now on, unless we hit something terrible that needs to be changed.

On the other hand, making the old configmap and the new configuration co-exist would have resulted in a more complex codebase and in corner cases that would have had to be handled properly. Because of that, and based on the considerations above, we thought that providing a tool for converting the old configuration format to the new one would be a good compromise between ease of transition and the complexity of MetalLB. I really hope this won't overshadow all the good will and intentions we have been putting into MetalLB since we (I) started maintaining the project.

druesendieb commented 2 years ago

First of all, thank you for your work in metallb, I am really happy to see progress here! I have mixed feelings about this issue and wanted to give feedback:

Versioning

This release introduces breaking changes, but it's only a minor release. One could argue that, due to this breaking change and according to semver, a major version bump should've been done. Especially with the use of dependency-management tools like renovate, or GitOps operators' image-update mechanisms that rely on semantic versioning (and trust its correct usage by dependencies), this release will cause broken clusters. What's your take on semantic versioning as used within the project?

There has been a pretty long discussion together with the previous (now retired) maintainers (@rata, @johananl and @gclawes).

MetalLB respects semantic versioning. A major version of "0" implies a non-stable API by nature (see https://semver.org/), so any 0.x release may introduce breaking changes over the previous one. Raising the major version would've meant declaring the API and MetalLB stable, which is something we don't want at this point, because of all the novelties we added to the API.

Thanks for clarifying.

(...) I really hope this won't overshadow all the good will and intentions we have been putting into MetalLB since we (I) started maintaining the project.

Don't get me wrong - just pointing out what I've observed. As my initial message stated, I am very grateful for the hard work that is flowing into this project.

kub3let commented 2 years ago

I had a very simple setup using helm that stopped working with v0.13.x.

The error I get is: Starting from v0.13.0 configInline is no longer supported. Please see https://metallb.universe.tf/#backward-compatibility

My ansible setup is the following:

- name: add metallb helm repo
  kubernetes.core.helm_repository:
    name: metallb
    repo_url: https://metallb.github.io/metallb

- name: deploy metallb load balancer
  kubernetes.core.helm:
    name: metallb
    chart_ref: metallb/metallb
    namespace: metallb-system
    update_repo_cache: yes
    create_namespace: yes
    values:
      configInline:
        address-pools:
        - name: default
          protocol: layer2
          addresses:
            - 192.168.1.150-192.168.1.250

When I try the converter tool, all I get is an empty resources.yaml, e.g.

root@testlab:~# cd /tmp
root@testlab:/tmp# cat << 'EOF' > config.yaml
address-pools:
  - name: default
    protocol: layer2
    addresses:
      - 192.168.1.150-192.168.1.250
EOF
root@testlab:/tmp# docker run -d -v $(pwd):/var/input quay.io/metallb/configmaptocrs -source config.yaml -vvv
b5e60fed21c30e24753b9417b0fe321aab9118426eec4901c319d3556a3ea434
root@testlab:/tmp# cat resources.yaml
root@testlab:/tmp# stat resources.yaml 
  File: resources.yaml
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd01h/64769d    Inode: 17039433    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-07-17 22:54:21.899918302 +0200
Modify: 2022-07-17 22:54:12.311698719 +0200
Change: 2022-07-17 22:54:12.311698719 +0200
 Birth: -

Can someone help me with translating the address pool to helm values.yaml? Is that still possible, or do I need to apply custom Kubernetes YAML? Is there any new documentation for it?
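
(For reference: based on the converter output shown later in this thread, the hand-written equivalent of that pool would look roughly like the following, assuming MetalLB runs in the metallb-system namespace.)

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.150-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default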

Pseudow commented 2 years ago

I am hitting the same problem. Guess I am going to downgrade the version.

jr0dd commented 2 years ago

I had a very simple setup using helm that stopped working with v0.13.x.

The error I get is: Starting from v0.13.0 configInline is no longer supported. Please see https://metallb.universe.tf/#backward-compatibility

My ansible setup is the following:

- name: add metallb helm repo
  kubernetes.core.helm_repository:
    name: metallb
    repo_url: https://metallb.github.io/metallb

- name: deploy metallb load balancer
  kubernetes.core.helm:
    name: metallb
    chart_ref: metallb/metallb
    namespace: metallb-system
    update_repo_cache: yes
    create_namespace: yes
    values:
      configInline:
        address-pools:
        - name: default
          protocol: layer2
          addresses:
            - 192.168.1.150-192.168.1.250

When I try the converter tool, all I get is an empty resources.yaml, e.g.

root@testlab:~# cd /tmp
root@testlab:/tmp# cat << 'EOF' > config.yaml
address-pools:
  - name: default
    protocol: layer2
    addresses:
      - 192.168.1.150-192.168.1.250
EOF
root@testlab:/tmp# docker run -d -v $(pwd):/var/input quay.io/metallb/configmaptocrs -source config.yaml -vvv
b5e60fed21c30e24753b9417b0fe321aab9118426eec4901c319d3556a3ea434
root@testlab:/tmp# cat resources.yaml
root@testlab:/tmp# stat resources.yaml
  File: resources.yaml
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd01h/64769d    Inode: 17039433    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-07-17 22:54:21.899918302 +0200
Modify: 2022-07-17 22:54:12.311698719 +0200
Change: 2022-07-17 22:54:12.311698719 +0200
 Birth: -

Can someone help me with translating the address pool to helm values.yaml? Is that still possible, or do I need to apply custom Kubernetes YAML? Is there any new documentation for it?

That's because that's not a valid ConfigMap; it's just the section of values.yaml that generates the actual ConfigMap. kubectl get it from the cluster.
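
Something along these lines (names matching the chart defaults used above):

kubectl get configmap metallb -n metallb-system -o yaml > config.yaml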

fernferret commented 2 years ago

Maybe the migrate step could describe that we need to pull the config.yaml out of Kubernetes first? This bit me at first, and it might help others too!

The docs currently state:

...

In order to use the tool the container must be ran mapping the path with the source file config.yaml to /var/input, and it will generate a resources.yaml file containing the resources mapped to the yaml.

docker run -d -v $(pwd):/var/input quay.io/metallb/configmaptocrs -source config.yaml 

Maybe this could be changed to (something like):

...

In order to use the tool, the following container must be run with the ConfigMap config.yaml mapped into the /var/input folder. On success it will generate a resources.yaml file containing the new MetalLB CRD based config.

You'll need to pull a copy of the current ConfigMap out of Kubernetes, then run the configmaptocrs docker container. This example assumes MetalLB was installed in the metallb-system namespace and the ConfigMap was named metallb:

kubectl get configmap -n metallb-system metallb -o yaml > config.yaml
docker run -d -v $(pwd):/var/input quay.io/metallb/configmaptocrs -source config.yaml

This will produce a file named resources.yaml that can be applied like so:

kubectl apply --dry-run=server -f resources.yaml -n metallb-system

On the subject of configInline, I'm also in the camp of folks who would like to keep their config (even if it's a CRD) tightly coupled with the chart. This always gets slightly awkward if the chart manages CRDs itself, which is why I usually opt to install the CRDs first.

liornoy commented 2 years ago

Hello @fernferret, thank you for the suggestion on improving the migration (configmaptocrs) description. I will add it to the PR I am currently working on.

djryanj commented 2 years ago

May I suggest, as an improvement to the controller to help people, that the logging around the config be improved?

I was only able to work out that I was missing a config because of this line, after enabling debugging: https://github.com/metallb/metallb/blob/main/controller/main.go#L64

That pointed me in the right direction, but as far as I can tell it's the only place where any mention of a config appears in the logs (and only at a deeper debug level).

Instead, on startup (and/or on configuration/reconfiguration), the controller should log the current state of the config (present, not present, correct, incorrect) at info level, with details dumped at debug level.

zrav commented 2 years ago

I converted my very simple 0.12.1 config with a single IP to 0.13.x, but it fails with:

error: assigned IP not allowed by config, ips: ["10.10.1.163"], level: error, msg: IP allocated by controller not allowed by config.

The config:

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-ip
  namespace: metallb
spec:
  ipAddressPools:
  - default-pool
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb
spec:
  addresses:
  - 10.10.1.163/32

Additionally, I noted that with 0.13.x a metallb-speaker pod is deployed on all nodes instead of only the worker nodes (as with the old version).
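
If you need the speaker back on workers only, the DaemonSet can usually be constrained with a node selector. A sketch via Helm values, assuming the chart exposes speaker.nodeSelector (check the chart's values.yaml) and that your worker nodes carry a role label:

speaker:
  nodeSelector:
    node-role.kubernetes.io/worker: ""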

fabricesemti80 commented 2 years ago

Hi,

I have a similar issue as well.

My original config for metallb:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: metallb
  namespace: networking
spec:
  interval: 15m
  chart:
    spec:
      chart: metallb
      version: 0.12.1
      sourceRef:
        kind: HelmRepository
        name: metallb
        namespace: flux-system
  install:
    createNamespace: true
    remediation:
      retries: 5
  upgrade:
    remediation:
      retries: 5
  values:
    configInline:
      address-pools:
        - name: default
          protocol: layer2
          addresses:
            - "${METALLB_LB_RANGE}"

1. Got the alert from renovatebot about the upgrade to 0.13.4
2. Exported the config map as per the instructions:

apiVersion: v1
data:
  config: |
    address-pools:
    - addresses:
      - 192.168.1.200-192.168.1.250
      name: default
      protocol: layer2
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: metallb
    meta.helm.sh/release-namespace: networking
  creationTimestamp: "2022-08-19T06:45:38Z"
  labels:
    app.kubernetes.io/instance: metallb
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: metallb
    app.kubernetes.io/version: v0.12.1
    helm.sh/chart: metallb-0.12.1
    helm.toolkit.fluxcd.io/name: metallb
    helm.toolkit.fluxcd.io/namespace: networking
  name: metallb
  namespace: networking
  resourceVersion: "10375"
  uid: c4f8af61-6a7d-4834-b3e7-93ea30dc21f7

3. Converted it to resources:

# This was autogenerated by MetalLB's custom resource generator.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  creationTimestamp: null
  name: default
  namespace: networking
spec:
  addresses:
  - 192.168.1.200-192.168.1.250
status: {}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  creationTimestamp: null
  name: l2advertisement1
  namespace: networking
spec:
  ipAddressPools:
  - default
status: {}
---

4. Added them to the kustomization:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - helm-release.yaml
  # - resources.yaml

5. Updated the helm release from 0.12.1 -> 0.13.4
6. Helm failed to upgrade (missing CRDs)
7. Added the CRDs
8. Resources created:

    
    # k describe ipaddresspools.metallb.io
    Name:         first-pool
    Namespace:    networking
    Labels:       kustomize.toolkit.fluxcd.io/name=apps
              kustomize.toolkit.fluxcd.io/namespace=flux-system
    Annotations:  <none>
    API Version:  metallb.io/v1beta1
    Kind:         IPAddressPool
    Metadata:
    Creation Timestamp:  2022-08-27T08:23:25Z
    Generation:          1
    Managed Fields:
    API Version:  metallb.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:kustomize.toolkit.fluxcd.io/name:
          f:kustomize.toolkit.fluxcd.io/namespace:
      f:spec:
        f:addresses:
    Manager:         kustomize-controller
    Operation:       Apply
    Time:            2022-08-27T08:23:25Z
    Resource Version:  5862530
    UID:               d7a1a3f2-758a-46ed-a568-e1e495658eac
    Spec:
    Addresses:
    192.168.1.200-192.168.1.250
    Auto Assign:       true
    Avoid Buggy I Ps:  false
    Events:              <none>

# k describe l2advertisements.metallb.io
Name:         l2advertisement1
Namespace:    networking
Labels:       kustomize.toolkit.fluxcd.io/name=apps
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  <none>
API Version:  metallb.io/v1beta1
Kind:         L2Advertisement
Metadata:
  Creation Timestamp:  2022-08-27T08:02:49Z
  Generation:          2
  Managed Fields:
    API Version:  metallb.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:kustomize.toolkit.fluxcd.io/name:
          f:kustomize.toolkit.fluxcd.io/namespace:
      f:spec:
        f:ipAddressPools:
    Manager:         kustomize-controller
    Operation:       Apply
    Time:            2022-08-27T08:23:25Z
  Resource Version:  5862532
  UID:               978e5816-4306-4cfd-8fe7-09c35b28664c
Spec:
  Ip Address Pools:
    first-pool
Events:  <none>


9. Metallb still not upgraded
10. Reverted it to `0.12.1`
11. ???

fedepaol commented 2 years ago

I am gonna close this issue.

@djryanj / @zrav / @fabricesemti80 would you mind filing new issues for your comments?