Closed ana-v-espinoza closed 3 weeks ago
My first idea is to see if there is a kernel in the repository that we can use with Ubuntu minimal
@ana-v-espinoza what about adding linux-image-generic to the list of packages installed by kubespray? I tested this in a standalone VM created with Exosphere and it worked.
This is needed only if we need an NFS server.
If you get this working, please open a PR to https://github.com/zonca/jetstream_kubespray, then I can add a note about it to https://www.zonca.dev/posts/2023-02-06-nfs-server-kubernetes-jetstream.
In case you do not have time, let me know and I'll take care of doing a test of this.
Hey Andrea,
Thanks for the suggestion. Interestingly, I got a different result than you did. After installing linux-image-generic and rebooting the system, the server still will not run, and it seems that the necessary kernel module is still unavailable:
$ apt list --installed linux-image-generic
Listing... Done
linux-image-generic/jammy-updates,jammy-security,now 5.15.0.117.117 amd64 [installed]
N: There is 1 additional version. Please use the '-a' switch to see it
$ modinfo nfsd
modinfo: ERROR: Module nfsd not found.
I'll note that the "1 additional version" referred to in the output of my apt list command is simply an older version (5.15.0.25.27). I don't expect this to be relevant. I also ran an apt update to ensure I was checking the most recent packages in the repositories.
Just as interestingly, on a JHub cluster (using the FeaturedUbuntu image), even though linux-image-generic is not explicitly installed (maybe similar dependencies are installed instead?), the kernel module still exists and is loaded:
$ apt list --installed linux-image-generic
Listing... Done
$ modinfo nfsd | head
filename: /lib/modules/5.15.0-105-generic/kernel/fs/nfsd/nfsd.ko
license: GPL
author: Olaf Kirch <okir@monad.swb.de>
alias: fs-nfsd
srcversion: B89668F8B84462939B70760
depends: auth_rpcgss,sunrpc,grace,lockd,nfs_acl
retpoline: Y
intree: Y
name: nfsd
vermagic: 5.15.0-105-generic SMP mod_unload modversions
$ lsmod | grep nfsd
nfsd 561152 11
nfs_acl 16384 1 nfsd
auth_rpcgss 139264 2 nfsd,rpcsec_gss_krb5
lockd 110592 2 nfsd,nfs
grace 16384 2 nfsd,lockd
sunrpc 585728 13 nfsd,nfsv4,auth_rpcgss,lockd,rpcsec_gss_krb5,nfs_acl,nfs
you probably need to remove the kvm kernel and reboot:
apt purge linux*kvm
then check with uname -a which kernel is running
On the JHub cluster, also check with uname -a which kernel is installed. It may be another generic kernel or an hwe one; both are full-featured kernels.
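The check suggested in this exchange can be sketched as a small shell helper. This is only an illustration: the function names kernel_flavour and report_nfsd are hypothetical, and the sketch assumes Ubuntu's kernel release naming, where the flavour is the last dash-separated field of uname -r (for example 5.15.0-1063-kvm or 5.15.0-105-generic).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the flavour suffix of a
# kernel release string, e.g. "kvm" for 5.15.0-1063-kvm.
kernel_flavour() {
    # Strip everything up to and including the last dash.
    printf '%s\n' "${1##*-}"
}

# Report whether the nfsd module is available for the given kernel release.
# Uses modinfo's -k option, which selects the kernel version to query.
report_nfsd() {
    release="$1"
    flavour="$(kernel_flavour "$release")"
    if modinfo -k "$release" nfsd >/dev/null 2>&1; then
        echo "$release ($flavour): nfsd module found"
    else
        echo "$release ($flavour): nfsd module NOT found"
    fi
}

# Usage on a node: report_nfsd "$(uname -r)"
```

Running report_nfsd "$(uname -r)" after the reboot shows at a glance whether the node is still on the kvm kernel and whether nfsd can be loaded.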
Andrea,
It is as you say, the node was still running on the old kvm kernel:
$ uname -a
Linux uf24f-k8s-node-nf-1 5.15.0-1063-kvm #68-Ubuntu SMP Fri Jul 12 08:20:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ apt list --installed linux*kvm
Listing... Done
linux-headers-5.15.0-1058-kvm/jammy-updates,jammy-security,now 5.15.0-1058.63 amd64 [installed,automatic]
linux-headers-5.15.0-1063-kvm/jammy-updates,jammy-security,now 5.15.0-1063.68 amd64 [installed,automatic]
linux-headers-kvm/jammy-updates,jammy-security,now 5.15.0.1063.59 amd64 [installed,automatic]
linux-image-5.15.0-1058-kvm/jammy-updates,jammy-security,now 5.15.0-1058.63 amd64 [installed,automatic]
linux-image-5.15.0-1063-kvm/jammy-updates,jammy-security,now 5.15.0-1063.68 amd64 [installed,automatic]
linux-image-kvm/jammy-updates,jammy-security,now 5.15.0.1063.59 amd64 [installed,automatic]
linux-kvm/jammy-updates,jammy-security,now 5.15.0.1063.59 amd64 [installed]
linux-modules-5.15.0-1058-kvm/jammy-updates,jammy-security,now 5.15.0-1058.63 amd64 [installed,automatic]
linux-modules-5.15.0-1063-kvm/jammy-updates,jammy-security,now 5.15.0-1063.68 amd64 [installed,automatic]
After purging the system of these packages and rebooting, everything is as expected: the nfs-server pod has exported the drive, the test-nfs-server Pod launches and is able to access this drive, and the drive is accessible via the single user JupyterLab instances.
Thanks a ton for your help!
Instead of making this change in Jetstream-Kubespray, would it be possible to alter the Ubuntu2204Minimal image? To me this feels more appropriate, although it may go slightly against the spirit of having a "minimal" image.
Best,
-- ana
@ana-v-espinoza of course, try Ubuntu2204MinimalGenKernel (ID 671f03f1-4d86-4363-b03c-f5d54818693a)
Excellent Andrea!
We appreciate all your knowledge and help as always. Tomorrow we will test provisioning a cluster and NFS shared drive with this new image and get back to you.
-- ana
Hi @zonca. Thank you for setting this up. Unfortunately, when I run terraform, it cannot find Ubuntu2204MinimalGenKernel or 671f03f1-4d86-4363-b03c-f5d54818693a. In addition, the image does not show up in openstack image list, though it does show when running openstack image show 671f03f1-4d86-4363-b03c-f5d54818693a. Is there a visibility setting that has to be set somewhere? Thanks again.
Not sure, visibility is set to Community
openstack image list --community | grep GenKern
| 671f03f1-4d86-4363-b03c-f5d54818693a | Ubuntu2204MinimalGenKernel | active |
Maybe because it is a "Snapshot" instead of "Image"?
It might be user error on my end now that I think of it. I'll investigate later this afternoon.
ok, I also posted the update on the tutorial page: https://www.zonca.dev/posts/2024-05-07-ubuntu22-minimal-image-jetstream
Hi again. OK, I don't think it was (obvious) user error. There are a few other things that caught my eye:
Compared to Ubuntu2204Minimal, the disk format is different, i.e., raw.
The size of the image is quite large, 21474836480 bytes (~21 GB).
The min_disk attribute is also different (20 GB vs 0).
it works fine running terraform on my account. Can you please paste the exact error?
@julienchastang I don't think it is any of those issues. The Featured images are also 20 GB and raw: they are generated from a running instance whose disk is 20 GB, of which only 1.7 GB is used. The notable difference is that this one is classified as "Snapshot", while both Ubuntu minimal and the Featured images are "Image".
Opened a ticket on Access for my reference
Warning: Deprecated Resource
on ../../contrib/terraform/openstack/modules/compute/main.tf line 677, in resource "openstack_compute_floatingip_associate_v2" "bastion":
677: resource "openstack_compute_floatingip_associate_v2" "bastion" {
use openstack_networking_floatingip_associate_v2 resource instead
(and 5 more similar warnings elsewhere)
Error: Your query returned no results. Please change your search criteria and try again.
on ../../contrib/terraform/openstack/modules/compute/main.tf line 1, in data "openstack_images_image_v2" "vm_image":
1: data "openstack_images_image_v2" "vm_image" {
Error: Your query returned no results. Please change your search criteria and try again.
on ../../contrib/terraform/openstack/modules/compute/main.tf line 7, in data "openstack_images_image_v2" "gfs_image":
7: data "openstack_images_image_v2" "gfs_image" {
Error: Your query returned no results. Please change your search criteria and try again.
on ../../contrib/terraform/openstack/modules/compute/main.tf line 13, in data "openstack_images_image_v2" "image_master":
13: data "openstack_images_image_v2" "image_master" {
thanks @julienchastang, yes, I think that is the problem. The query in main.tf says:
source_type = "image"
so I downloaded the snapshot with image save and uploaded it again as an image. However, why did it work for me??
Well I've done it, so let's just try:
Ubuntu2204MinimalGenKernelImg
f8211779-dcff-4827-a899-f5b8ad738f8f
Also, can you try to create just a single instance with one of those images using Horizon instead of Terraform?
also, this could help with debugging: I see there are actually 2 Ubuntu2204Minimal images:
openstack image list --community | grep Minimal
| bd850f29-e7a9-40c9-9fd0-45d3257dec82 | Ubuntu2204Minimal | active |
| 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69 | Ubuntu2204Minimal | active |
but only 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69 shows up in Horizon, which is strange.
can you please verify which one you are actually using in Terraform? Did you create a minimal image yourselves?
which one you are actually using in Terraform?
bd850f29-e7a9-40c9-9fd0-45d3257dec82
Did you create a minimal image yourselves?
To my knowledge, no.
thanks @julienchastang, can you test 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69? Does it give the same error as the "GenKernel" image?
Yes, it does, but I think that is a red herring. I believe you cannot refer to image IDs in cluster.tfvars, only image names, e.g.,
image = "Ubuntu2204Minimal"
Any image ID provided (e.g., bd850f29-e7a9-40c9-9fd0-45d3257dec82, 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69) as well as Ubuntu2204MinimalGenKernel will give that error. Basically, the error is a result of not being able to find the image.
BTW, turning Terraform debugging to verbose yields:
X-Openstack-Request-Id: req-ea15e2b5-b3a4-41b2-9896-544629bcc310: timestamp=2024-08-02T16:25:08.115Z
2024-08-02T16:25:08.115Z [INFO] plugin.terraform-provider-openstack_v1.54.1: 2024/08/02 16:25:08 [DEBUG] OpenStack Response Body: {
"first": "/v2/images?name=0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69\u0026sort_dir=asc\u0026sort_key=name\u0026status=active",
"images": [],
"schema": "/v2/schemas/images"
}: timestamp=2024-08-02T16:25:08.115Z
2024-08-02T16:25:08.115Z [ERROR] plugin.terraform-provider-openstack_v1.54.1: Response contains error diagnostic: diagnostic_summary="Your query returned no results. Please change your search criteria and try again." tf_rpc=ReadDataSource
diagnostic_severity=ERROR tf_data_source_type=openstack_images_image_v2 tf_proto_version=5.4 @caller=github.com/hashicorp/terraform-plugin-go@v0.19.0/tfprotov5/internal/diag/diagnostics.go:58 @module=sdk.proto diagnostic_detail=
tf_provider_addr=registry.terraform.io/terraform-provider-openstack/openstack tf_req_id=7f00d46b-bd55-cb99-e79b-6f3a08693a99 timestamp=2024-08-02T16:25:08.115Z
As far as I understand, TF searches for images by name, not ID. Combined with the fact that the images do not appear when running openstack image list, that's the problem, I believe.
ok @julienchastang so let's wait for the Jetstream team to answer my support ticket.
If this is urgent, you can create the image yourself:
start from Ubuntu2204Minimal
sudo apt install linux-image-generic
sudo apt purge linux*kvm
check that modprobe nfsd doesn't error
Hello @zonca & @julienchastang!
I think you figured the image visibility issue out between yourselves. OpenStack will not list community-visible images belonging to other projects unless you explicitly tell it to.
This is as designed:
You can test this by running the following two commands:
$ openstack image list --debug
...
Image client initialized using OpenStack SDK: <openstack.image.v2._proxy.Proxy object at 0x76099041a0a0>
REQ: curl -g -i -X GET https://js2.jetstream-cloud.org:9292/v2/images -H "Accept: application/json" ...
...
$ openstack image list --debug --community
...
Image client initialized using OpenStack SDK: <openstack.image.v2._proxy.Proxy object at 0x79928b5e40d0>
REQ: curl -g -i -X GET "https://js2.jetstream-cloud.org:9292/v2/images?visibility=community" -H "Accept: application/json" ...
...
The first command will return:
The second command will return all community-visible images.
This is a security feature of OpenStack. For a concrete example: Anybody in the Jetstream2 community can create an image called Featured-Ubuntu22 and set its visibility to 'community'. Imagine what would happen if OpenStack listed this image in the default image list. At best it would cause confusion.
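The difference between the two listings can be sketched as a small lookup helper. This is an illustration only: the function name image_listing is hypothetical, it assumes the openstack CLI is installed and an openrc file has been sourced, and it matches the ID or name exactly, so image names containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report in which listing an image
# ID or name appears: the default list, the community list, or neither.
image_listing() {
    needle="$1"
    # "-f value -c ID -c Name" prints one "ID Name" pair per line.
    # awk exits 0 only when a line has an exactly matching ID or name.
    match='$1 == n || $2 == n { found = 1 } END { exit !found }'
    if openstack image list -f value -c ID -c Name | awk -v n="$needle" "$match"; then
        echo "default"
    elif openstack image list --community -f value -c ID -c Name | awk -v n="$needle" "$match"; then
        echo "community"
    else
        echo "not-found"
    fi
}

# Usage: image_listing Ubuntu2204MinimalGenKernel
```

An image that only prints "community" here is one that tools relying on the default listing will not find by name.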
We had to figure this out in Exosphere as well: https://gitlab.com/exosphere/exosphere/-/issues/800#note_1073477839
@zonca you can explicitly share the image with @julienchastang using the steps documented here: https://docs.jetstream-cloud.org/ui/cli/snapshot-image/#sharing-an-image
@julienchastang you'll have to accept the shared image in turn.
I hope this helps. I'll respond in the ticket as well.
thanks @julianpistorius, I'll make a pull request to the docs to clarify this.
@julienchastang please send me your project id so I can share the image with you.
@julianpistorius from my understanding of the docs in https://wiki.openstack.org/wiki/Glance-v2-community-image-visibility-design, OpenStack allows all users to boot community images. Jetstream instead is more restrictive and treats community images the same as shared images. Right?
Started working on this, but need to understand better how it actually works: https://gitlab.com/jetstream-cloud/docs/-/merge_requests/101
@julienchastang I think booting from community images should work. I'll have to confirm though.
@zonca to answer your earlier question, it is EES220002.
@julienchastang ok, shared image 671f03f1-4d86-4363-b03c-f5d54818693a with you
Hi @zonca. I am not finding the image:
[openstack@e246e035f3a5 ~]$ openstack image show 671f03f1-4d86-4363-b03c-f5d54818693a
No Image found for 671f03f1-4d86-4363-b03c-f5d54818693a
[openstack@e246e035f3a5 ~]$ openstack image list --community | grep 671f03f1-4d86-4363-b03c-f5d54818693a
[openstack@e246e035f3a5 ~]$ openstack image list --shared
+--------------------------------------+-----------------+--------+
| ID | Name | Status |
+--------------------------------------+-----------------+--------+
| 8d5a9a1d-42bf-41eb-a7cb-87cf8a472766 | Container-Linux | active |
| 7c3a973c-e9b1-460c-b370-320969ca527d | centos7.iso | active |
| 5ff1e12c-b9b3-4dbe-a32f-903fae7f55b1 | rocky8.iso | active |
+--------------------------------------+-----------------+--------+
I think the command to share the image is:
openstack image set --project <project_id> <image_id>
Again, here is our project info:
$ openstack project list
+----------------------------------+-----------+
| ID | Name |
+----------------------------------+-----------+
| 5d01fe81db3f46bfbb354843e5846084 | EES220002 |
+----------------------------------+-----------+
Thanks again.
I ran openstack image add project Ubuntu2204MinimalGenKernel EES220002 following the JS2 docs.
In fact:
openstack image member list Ubuntu2204MinimalGenKernel
+--------------------------------------+-----------+---------+
| Image ID | Member ID | Status |
+--------------------------------------+-----------+---------+
| 671f03f1-4d86-4363-b03c-f5d54818693a | EES220002 | pending |
+--------------------------------------+-----------+---------+
so you should run openstack image set --accept <image UUID or NAME>
From https://docs.jetstream-cloud.org/ui/cli/snapshot-image/
$ openstack image set --accept 671f03f1-4d86-4363-b03c-f5d54818693a
No Image found for 671f03f1-4d86-4363-b03c-f5d54818693a
$ openstack image set --accept Ubuntu2204MinimalGenKernel
No Image found for Ubuntu2204MinimalGenKernel
@julianpistorius it seems we can't share images
This is puzzling. I can both launch an instance from a community image, as well as share and accept images.
In the first OpenStack project which owns the private image, make it community-visible:
$ source project1-openrc.sh
$ IMAGE_ID=5b30084f-070b-4a03-95a5-19494fd180b1
$ openstack image set --community $IMAGE_ID
$ openstack image show --fit-width $IMAGE_ID
# Confirm it's set to community visible
In the second project, which is not a member of the image:
$ source project2-openrc.sh
$ IMAGE_ID=5b30084f-070b-4a03-95a5-19494fd180b1
$ openstack image show --fit-width $IMAGE_ID
# Confirm I can see image
$ openstack server create --os-compute-api-version 2.37 --flavor m3.small --image $IMAGE_ID --network auto_allocated_network --key-name=my-ssh-key --security-group default community-jammy-server
# ... launches a server
I did find that I had to use the image ID when creating the server, and using the image name did not work.
In first project, which owns the private image:
$ IMAGE_ID=b1119fd1-e735-41d0-9766-ec8d33e9be65
$ openstack image show $IMAGE_ID
# Confirm it's private
In second project, which you want to share the image with:
$ IMAGE_ID=b1119fd1-e735-41d0-9766-ec8d33e9be65
$ openstack image show --fit-width $IMAGE_ID
No Image found for b1119fd1-e735-41d0-9766-ec8d33e9be65
Back in the first project:
$ openstack image set --shared $IMAGE_ID
$ OTHER_PROJECT_ID=<uuid-of-second-project>
$ openstack image add project $IMAGE_ID $OTHER_PROJECT_ID
+------------+--------------------------------------+
| Field | Value |
+------------+--------------------------------------+
| created_at | 2024-08-08T21:46:18Z |
| image_id | b1119fd1-e735-41d0-9766-ec8d33e9be65 |
| member_id | <uuid-of-second-project> |
| schema | /v2/schemas/member |
| status | pending |
| updated_at | 2024-08-08T21:46:18Z |
+------------+--------------------------------------+
$ openstack image member list --fit-width $IMAGE_ID
+--------------------------------------+---------------------------+---------+
| Image ID | Member ID | Status |
+--------------------------------------+---------------------------+---------+
| b1119fd1-e735-41d0-9766-ec8d33e9be65 | <uuid-of-second-project> | pending |
+--------------------------------------+---------------------------+---------+
In second project:
$ openstack image show --fit-width $IMAGE_ID
# Now I can see the image
$ openstack image set --accept $IMAGE_ID
Back in first project:
$ openstack image member list --fit-width $IMAGE_ID
+--------------------------------------+---------------------------+----------+
| Image ID | Member ID | Status |
+--------------------------------------+---------------------------+----------+
| b1119fd1-e735-41d0-9766-ec8d33e9be65 | <uuid-of-second-project> | accepted |
+--------------------------------------+---------------------------+----------+
Maybe we need to set up a screen-sharing session to debug this?
@julianpistorius I confirm I can see your community image and I can launch an instance. can you please try to launch an instance with my community shared image?
IMAGE_ID=f8211779-dcff-4827-a899-f5b8ad738f8f
also @julianpistorius, about uuid-of-second-project, the docs say "Where project is the AAA000000 number of the allocation you want to share it with." https://docs.jetstream-cloud.org/ui/cli/snapshot-image/#sharing-an-image.
Are you using the actual ID instead?
I am using the UUID, yes.
IMAGE_ID=f8211779-dcff-4827-a899-f5b8ad738f8f
Yes, I was able to launch an instance from this image.
$ IMAGE_ID=f8211779-dcff-4827-a899-f5b8ad738f8f
$ openstack image show --fit-width $IMAGE_ID
# Shows the image details, name is 'Ubuntu2204MinimalGenKernelImg'
$ openstack server create --os-compute-api-version 2.37 --flavor m3.small --image $IMAGE_ID --network auto_allocated_network --key-name=my-ssh-key --security-group default community-Ubuntu2204MinimalGenKernelImg
# Launches...
$ SERVER_ID=<server-uuid>
$ openstack server show --fit-width $SERVER_ID
# Shows server detail...
$ openstack console log show $SERVER_ID
# Booted and showing login prompt...
ok thanks, @julienchastang or @ana-v-espinoza can you try repeating what Julian did and then try again with Terraform and paste here errors?
I am worried that the problem is that Terraform wants image names, but only Image IDs work for shared images.
I am able to successfully launch that image via the openstack CLI (similar to Julian's session). However, terraform yields
Warning: Deprecated Resource
on ../../contrib/terraform/openstack/modules/compute/main.tf line 677, in resource "openstack_compute_floatingip_associate_v2" "bastion":
677: resource "openstack_compute_floatingip_associate_v2" "bastion" {
use openstack_networking_floatingip_associate_v2 resource instead
(and 5 more similar warnings elsewhere)
Error: Your query returned no results. Please change your search criteria and try again.
on ../../contrib/terraform/openstack/modules/compute/main.tf line 1, in data "openstack_images_image_v2" "vm_image":
1: data "openstack_images_image_v2" "vm_image" {
Error: Your query returned no results. Please change your search criteria and try again.
on ../../contrib/terraform/openstack/modules/compute/main.tf line 7, in data "openstack_images_image_v2" "gfs_image":
7: data "openstack_images_image_v2" "gfs_image" {
Error: Your query returned no results. Please change your search criteria and try again.
on ../../contrib/terraform/openstack/modules/compute/main.tf line 13, in data "openstack_images_image_v2" "image_master":
13: data "openstack_images_image_v2" "image_master" {
@julienchastang how are you trying to identify the image? Can you paste your Terraform here?
I'm not a Terraform expert, but could you use something like:
# Data source for the image using image ID (untested suggestion)
data "openstack_images_image_v2" "vm_image" {
  visibility = "community"
  # Name matching the ID below, as posted earlier in this thread
  name       = "Ubuntu2204MinimalGenKernelImg"
  properties = {
    id = "f8211779-dcff-4827-a899-f5b8ad738f8f"
  }
}
I haven't tried it, but the docs are suggestive: https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs/data-sources/images_image_v2
Thanks @zonca. I ended up snapshotting my own image as described earlier. That should work for now. Thanks again.
@julienchastang I think I found a solution: there are different keywords, image_uuid and image_master_uuid.
If you have the opportunity to test this, good; otherwise I'll test it later and add some comments in the tutorial files to point out this possibility.
I tested it and it worked fine. I updated the default cluster.tfvars for future reference.
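As a rough sketch of the distinction this thread arrived at, the helper below prints the cluster.tfvars line for a given image reference. The function name tfvars_image_line is hypothetical. The keywords image and image_uuid come from the thread; image_master_uuid is not handled here, and how cluster.tfvars combines these keywords is not shown in this discussion, so treat the output as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print a cluster.tfvars line for
# an image reference. UUID-shaped values use the image_uuid keyword; anything
# else is treated as an image name.
tfvars_image_line() {
    ref="$1"
    uuid_re='^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    if printf '%s\n' "$ref" | grep -Eq "$uuid_re"; then
        printf 'image_uuid = "%s"\n' "$ref"
    else
        printf 'image = "%s"\n' "$ref"
    fi
}

tfvars_image_line "f8211779-dcff-4827-a899-f5b8ad738f8f"
tfvars_image_line "Ubuntu2204Minimal"
```

The first call prints an image_uuid line and the second an image line, which mirrors why passing a UUID through the name-based image variable returned no results earlier in the thread.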
CC: @julienchastang
I have provisioned a JupyterHub using the new Ubuntu2204Minimal image. When attempting to create an NFS shared drive using the usual method, I ran across an issue.
After a kubectl apply -f on the necessary .yaml files and creating the test_nfs_mount Pod, I noticed that the Pod never fully created:
The Service, PVC, and Pod at first glance all seem like they're doing fine, however, the logs for the nfs-server pod show:
I did some digging and apparently nfsd is a kernel module that isn't installed on the Ubuntu2204Minimal nodes but is installed on the nodes of another cluster with a functioning NFS server.
I'm not exactly sure how to remedy this. Any ideas?
Thanks,
ana v e