lima-vm / lima

Linux virtual machines, with a focus on running containers
https://lima-vm.io/
Apache License 2.0
15.22k stars 600 forks source link

Generate jsonschema for validating the limayaml #2305

Closed afbjorklund closed 1 week ago

afbjorklund commented 6 months ago

Description

There was some discussion about using json-schema.org for documentation, which doesn't go very well...

But it could still be used for syntax-checking the lima.yaml, similar to what is available for cloud-config.yaml

afbjorklund commented 6 months ago

According to regular JSON Schema, using null is not allowed when it comes to ints, strings (ptrs), arrays or maps...

So we need to document which are allowed to be null (some are not), using a special jsonschema:"nullable" syntax

In the default.yaml, the value null is being used as a placeholder for those values that can be overridden or defaulted.

But the actual values will be more like "empty" (0 "" [] {}) than real null pointers, that can't be dereferenced.

jandubois commented 6 months ago

But the actual values will be more like "empty" (0 "" [] {}) than actual null pointers, that can't be dereferenced.

I'm not sure what you mean by "actual", but the values read from lima.yaml will indeed include real null pointers. It is the responsibility of limayaml.FillDefault() to perform the override.yaml and default.yaml merging operation, and then fill in any remaining settings with the builtin defaults.

So the result from FillDefault should not have any null pointers in them, but the JSON file most certainly can and will.

FWIW, I found message is the only top-level field that is a string instead of a *string, and I think this is a bug. Why should you not be able to specify a default message in default.yaml?

afbjorklund commented 6 months ago

With the "actual values", I meant those that will be used in the end - i.e. the ones that will replace the null value.

But currently the documentation (comment) said: 🟢 Builtin default: null (changed it in 102e44cc705cccccb27166bbb5c3c6f0801235c1)

FWIW, I found message is the only top-level field that is a string instead of a *string, and I think this is a bug. Why should you not be able to specify a default message in default.yaml?

I think there were three of them:

They can probably be replaced by *string, or maybe the cpuType can be commented out (null) like the others?

cpuType:
  # 🟢 Builtin default: "cortex-a72" (or "host" when running on aarch64 host)
  aarch64: ""
  # 🟢 Builtin default: "cortex-a7" (or "host" when running on armv7l host)
  armv7l: ""
  # 🟢 Builtin default: "qemu64" (or "host,-pdpe1gb" when running on x86_64 host)
  x86_64: ""
  # 🟢 Builtin default: "rv64" (or "host" when running on riscv64 host)
  riscv64: ""

The current result is that the keys will have to encoded anyway, no matter if it is empty string or null values.

So the default yaml could just have cpuType: null (by commenting the default keys out, like networks etc)

jandubois commented 6 months ago

mountPoint is a field inside the Mounts array, not a top-level field. It should be nullable, but we get away with it being a string because the empty string is not a valid value either, so we treat the empty string the same as nil.

Same thing for cpuType: the empty string is not a valid value.

Making them nullable would be more consistent, but I don't think it really matters.

I do think we should fix message though, even though it seems like an improbable use-case.

afbjorklund commented 6 months ago

It only mattered for the technicalities of jsonschema, I don't think users (or Go) really cared...

Otherwise the generated schema now "works", but it does not copy any documentation.

i.e. the Go struct is now undocumented, since all the documentation is in the default.yaml

Otherwise there is some feature, to copy the comments from the struct (but no "javadoc"*)


Typically using a syntax like /// or /**. But there is no such convention in Go, at least not as far as I know

It would only be useful if we actually wanted people to using the jsonschema for anything, like validation.

But then it would need to be hosted somewhere (an URL), even if one could hack it with raw.github.com

jandubois commented 6 months ago

So the default yaml could just have cpuType: null (by commenting the default keys out, like networks etc)

We can do this anyways, as setting the value to null will still result in the field being set to the empty string.

For example:

mounts:
- location: "~"
  # Configure the mountPoint inside the guest.
  # 🟢 Builtin default: value of location
  mountPoint: null

mountPoint is a string, so it will be set to the empty string.

maybe the cpuType can be commented out (null) like the others?

They are already set to null; maybe I misunderstand what you are saying:

cpuType:
  # 🟢 Builtin default: "cortex-a72" (or "host" when running on aarch64 host)
  aarch64: null
  # 🟢 Builtin default: "cortex-a7" (or "host" when running on armv7l host)
  armv7l: null
  # 🟢 Builtin default: "qemu64" (or "host,-pdpe1gb" when running on x86_64 host)
  x86_64: null
afbjorklund commented 6 months ago

The issue is that I got jsonschema validation errors on those fields, since they are defined as string.

examples/default.yaml::$.mounts[0].mountPoint: None is not of type 'string'

examples/default.yaml::$.cpuType.x86_64: None is not of type 'string' examples/default.yaml::$.cpuType.aarch64: None is not of type 'string' examples/default.yaml::$.cpuType.armv7l: None is not of type 'string'

jandubois commented 6 months ago

It would only be useful if we actually wanted people to using the jsonschema for anything, like validation.

I don't understand why we need this? We already have limactl validate lima.yaml; what does the json schema get us?

jandubois commented 6 months ago

The issue is that I got jsonschema validation errors on those fields, since they are defined as string.

I'm fine with changing them all to *string and updating FillDefault to deal with it.

afbjorklund commented 6 months ago

Similar to the cloud-config.yaml validation, it is just for following a technical standard. Not really meant for our users.

I don't think that it will ever be able to handle the features that are currently in the default.yaml and limactl validate...

Generating it from the code seemed like a low-maintenance approach, so was looking to see whether it was viable.

I will leave it in draft, I think it will be better once the NullableFromType feature (for pointers) of has been merged.

jandubois commented 6 months ago

Similar to the cloud-config.yaml validation, it is just for following a technical standard. Not really meant for our users.

That reminds me: I think you implemented the user-data validation to run at runtime, so we would have to bundle the schema etc. I think this should at most be a build-time/CI thing but not included in the binaries.

After all, if there is a schema error in the generated files, there is nothing the user could do about it beyond filing a bug. So it essentially becomes a very heavy assert statement.

afbjorklund commented 6 months ago

I think the official approach (from cloud-init) is to run the schema validation at first boot, which is a bit "late" for us.

But you are probably right, it would be better off in somewhere like limactl validate - or maybe just a build feature.

After all, the main benefit is finding holes in the official specification "standard".

jandubois commented 6 months ago

I think the official approach (from cloud-init) is to run the schema validation at first boot, which is a bit "late" for us.

Yes, but these are just warnings in cloud-init-output.log, so they don't break anything.

As I wrote above, showing the validation errors/warnings to the users is pointless because there is nothing they can do to fix them. It would just be noise to them.

After all, the main benefit is finding holes in the official specification "standard".

I agree that validating the files as part of CI and checking for errors would be useful.

But remember that we do have to accept deprecation warnings because we need to remain compatible with older cloud-init versions. So we can't just check that there are no warnings at all.

afbjorklund commented 6 months ago

I will leave them as drafts for now, and cherry-pick the suggested (small) changes separately.

afbjorklund commented 5 months ago

Changed it to run only on-demand, I think eventually it will just use an URL for the validation...

afbjorklund commented 5 months ago

Expressing map[string]*string is a "problem" in jsonschema, so left the map as null instead:

# Specify desired QEMU CPU type for each arch.
# You can see what options are available for host emulation with: `qemu-system-$(arch) -cpu help`.
# Setting of instructions is supported like this: "qemu64,+ssse3".
# 🟢 Builtin default: hard-coded arch map with type (see the output of `limactl info | jq .defaultTemplate.cpuType`)
cpuType:
  # aarch64: "cortex-a72" # (or "host" when running on aarch64 host)
  # armv7l: "cortex-a7" # (or "host" when running on armv7l host)
  # riscv64: "rv64" # (or "host" when running on riscv64 host)
  # x86_64: "qemu64" # (or "host,-pdpe1gb" when running on x86_64 host)

This makes it more similar to other (rather) complicated defaults, like the containerd archives.

containerd:
  # Enable system-wide (aka rootful)  containerd and its dependencies (BuildKit, Stargz Snapshotter)
  # Note that `nerdctl.lima` only works in rootless mode; you have to use `lima sudo nerdctl ...`
  # to use rootful containerd with nerdctl.
  # 🟢 Builtin default: false
  system: null
  # Enable user-scoped (aka rootless) containerd and its dependencies
  # 🟢 Builtin default: true
  user: null
#  # Override containerd archive
#  # 🟢 Builtin default: hard-coded URL with hard-coded digest (see the output of `limactl info | jq .defaultTemplate.containerd.archives`)
#  archives:
#  - location: "~/Downloads/nerdctl-full-X.Y.Z-linux-amd64.tar.gz"
#    arch: "x86_64"
#    digest: "sha256:..."
afbjorklund commented 5 months ago

Trying to remember what the easiest way to call FillDefault in limactl was, but I don't really recall.

EDIT: I remembered it was a patched version of validate, will clean it up and make it into a PR

$ limactl validate --fill examples/default.yaml 
INFO[0000] "examples/default.yaml": OK                  
vmType: qemu
os: Linux
arch: x86_64
images:
- location: https://cloud-images.ubuntu.com/releases/24.04/release-20240423/ubuntu-24.04-server-cloudimg-amd64.img
  arch: x86_64
  digest: sha256:32a9d30d18803da72f5936cf2b7b9efcb4d0bb63c67933f17e3bdfd1751de3f3
- location: https://cloud-images.ubuntu.com/releases/24.04/release-20240423/ubuntu-24.04-server-cloudimg-arm64.img
  arch: aarch64
  digest: sha256:c841bac00925d3e6892d979798103a867931f255f28fefd9d5e07e3e22d0ef22
- location: https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img
  arch: x86_64
- location: https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img
  arch: aarch64
cpuType:
  aarch64: cortex-a72
  armv7l: cortex-a7
  riscv64: rv64
  x86_64: host
cpus: 4
memory: 4GiB
disk: 100GiB
mounts:
- location: "~"
  mountPoint: "~"
  writable: false
  sshfs:
    cache: true
    followSymlinks: false
    sftpDriver: ""
  9p:
    securityModel: none
    protocolVersion: 9p2000.L
    msize: 128KiB
    cache: fscache
- location: /tmp/lima
  mountPoint: /tmp/lima
  writable: true
  sshfs:
    cache: true
    followSymlinks: false
    sftpDriver: ""
  9p:
    securityModel: none
    protocolVersion: 9p2000.L
    msize: 128KiB
    cache: mmap
mountType: reverse-sshfs
mountInotify: false
ssh:
  localPort: 0
  loadDotSSHPubKeys: true
  forwardAgent: false
  forwardX11: false
  forwardX11Trusted: false
firmware:
  legacyBIOS: false
audio:
  device: ""
video:
  display: none
  vnc:
    display: 127.0.0.1:0,to=9
upgradePackages: false
containerd:
  system: false
  user: true
  archives:
  - location: https://github.com/containerd/nerdctl/releases/download/v1.7.6/nerdctl-full-1.7.6-linux-amd64.tar.gz
    arch: x86_64
    digest: sha256:2c841e097fcfb5a1760bd354b3778cb695b44cd01f9f271c17507dc4a0b25606
  - location: https://github.com/containerd/nerdctl/releases/download/v1.7.6/nerdctl-full-1.7.6-linux-arm64.tar.gz
    arch: aarch64
    digest: sha256:77c747f09853ee3d229d77e8de0dd3c85622537d82be57433dc1fca4493bab95
guestInstallPrefix: /usr/local
hostResolver:
  enabled: true
  ipv6: false
propagateProxyEnv: true
caCerts:
  removeDefaults: false
rosetta:
  enabled: false
  binfmt: false
plain: false
timezone: Europe/Stockholm
afbjorklund commented 5 months ago

With "enum" on the string types, they can be checked and offfered editor completion (here code):

image

afbjorklund commented 1 week ago

Handy trick to remove all comments from a file: (previous workaround was json roundtrip)

https://mikefarah.gitbook.io/yq/operators/comment-operators#remove-strip-all-comments