zonca / jupyterhub-deploy-kubernetes-jetstream

Configuration files for my tutorials on deploying JupyterHub on top of Kubernetes on XSEDE Jetstream (Openstack)
https://zonca.dev/categories/#jetstream

Ubuntu2204Minimal Unable to Run NFS Server #80

Closed ana-v-espinoza closed 3 weeks ago

ana-v-espinoza commented 1 month ago

CC: @julienchastang

I have provisioned a JupyterHub using the new Ubuntu2204Minimal image. When attempting to create an NFS shared drive using the usual method, I ran across an issue.

After running kubectl apply -f on the necessary .yaml files and creating the test-nfs-mount Pod, I noticed that the Pod never finished starting:

$ kubectl describe pod test-nfs-mount
...
Events:
  Type     Reason       Age                   From     Message
  ----     ------       ----                  ----     -------
  Warning  FailedMount  14m (x83 over 5h39m)  kubelet  MountVolume.SetUp failed for volume "nfs-volume" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs <cluster-ip>:/ /var/lib/kubelet/pods/e3db724a-eef5-4e27-ad62-6a89d90e22f0/volumes/kubernetes.io~nfs/nfs-volume
Output: mount.nfs: Connection refused
  Warning  FailedMount  8m25s (x120 over 5h39m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[nfs-volume], unattached volumes=[nfs-volume kube-api-access-scqwm]: timed out waiting for the condition
  Warning  FailedMount  3m53s (x28 over 5h34m)   kubelet  Unable to attach or mount volumes: unmounted volumes=[nfs-volume], unattached volumes=[kube-api-access-scqwm nfs-volume]: timed out waiting for the condition

At first glance the Service, PVC, and Pod all seem to be doing fine; however, the logs for the nfs-server pod show:

$ kubectl logs nfs-server-6cf9ddc757-vzqdn
...
Starting NFS in the background...
rpc.nfsd: Unable to access /proc/fs/nfsd errno 2 (No such file or directory).
Please try, as root, 'mount -t nfsd nfsd /proc/fs/nfsd' and then restart rpc.nfsd to correct the problem
Exporting File System...
exporting *:/share
/share          <world>
Starting Mountd in the background...These
Startup successful.

I did some digging, and apparently nfsd is a kernel module that isn't installed on the Ubuntu2204Minimal nodes, although it is installed on the nodes of another cluster with a functioning NFS server.

I'm not exactly sure how to remedy this. Any ideas?

Thanks,

ana v e

zonca commented 1 month ago

My first idea is to see if there is a kernel in the repository that we can use with Ubuntu minimal

zonca commented 1 month ago

@ana-v-espinoza what about adding linux-image-generic to the list of packages installed by kubespray? I tested in a standalone VM created with exosphere and this worked. This is needed only in case we need an NFS server.

If you get this working please open a PR to https://github.com/zonca/jetstream_kubespray, then I can put a note about that in https://www.zonca.dev/posts/2023-02-06-nfs-server-kubernetes-jetstream.

In case you do not have time, let me know and I'll take care of doing a test of this.

ana-v-espinoza commented 1 month ago

Hey Andrea,

Thanks for the suggestion. Interestingly, I came across a different result than you did. After installing linux-image-generic and rebooting the system, the server still will not run, and it seems that the necessary kernel module is still unavailable:

$ apt list --installed linux-image-generic
Listing... Done                                                      
linux-image-generic/jammy-updates,jammy-security,now 5.15.0.117.117 amd64 [installed]
N: There is 1 additional version. Please use the '-a' switch to see it
$ modinfo nfsd
modinfo: ERROR: Module nfsd not found.

I'll note that the "1 additional version" referred to in the output of my apt list command is simply an older version (5.15.0.25.27). I don't expect this to be relevant. I also ran an apt update to ensure I was checking the most recent packages in the repositories.

Just as interestingly, on a JHub cluster (using the FeaturedUbuntu image), even though linux-image-generic is not explicitly installed (maybe similar dependencies are installed instead?), the kernel module still exists and is loaded:

$ apt list --installed linux-image-generic
Listing... Done
$ modinfo nfsd | head
filename:       /lib/modules/5.15.0-105-generic/kernel/fs/nfsd/nfsd.ko
license:        GPL
author:         Olaf Kirch <okir@monad.swb.de>
alias:          fs-nfsd
srcversion:     B89668F8B84462939B70760
depends:        auth_rpcgss,sunrpc,grace,lockd,nfs_acl
retpoline:      Y
intree:         Y
name:           nfsd
vermagic:       5.15.0-105-generic SMP mod_unload modversions 
$ lsmod | grep nfsd
nfsd                  561152  11
nfs_acl                16384  1 nfsd
auth_rpcgss           139264  2 nfsd,rpcsec_gss_krb5
lockd                 110592  2 nfsd,nfs
grace                  16384  2 nfsd,lockd
sunrpc                585728  13 nfsd,nfsv4,auth_rpcgss,lockd,rpcsec_gss_krb5,nfs_acl,nfs

zonca commented 1 month ago

@ana-v-espinoza about the node where nfsd is still not found after installing linux-image-generic: you probably need to remove the kvm kernel and reboot:

apt purge linux*kvm

then check with uname -a which kernel is running.

On the FeaturedUbuntu cluster, also check with uname -a which kernel is installed; it is possible it is another generic kernel or an hwe one, which are both full-featured kernels.
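
The kernel check discussed here can be scripted. What follows is a minimal sketch, not a tested part of the deployment: `kernel_flavor` and `has_nfsd_module` are hypothetical helper names, and the module path layout is assumed from the modinfo output quoted in this thread (`/lib/modules/<release>/kernel/fs/nfsd/nfsd.ko`). The second argument of `has_nfsd_module` exists only so the check can be pointed at a different modules directory.

```shell
#!/bin/sh
# Print the flavor suffix of a kernel release string, for example
# "kvm" for 5.15.0-1063-kvm or "generic" for 5.15.0-105-generic.
# Falls back to the running kernel when no argument is given.
kernel_flavor() {
  release="${1:-$(uname -r)}"
  # Strip everything up to and including the last dash.
  printf '%s\n' "${release##*-}"
}

# Return 0 when an nfsd module file exists for the given kernel release.
# modules_root (hypothetical parameter) defaults to /lib/modules.
has_nfsd_module() {
  release="${1:-$(uname -r)}"
  modules_root="${2:-/lib/modules}"
  # The trailing * also matches compressed modules such as nfsd.ko.zst.
  for f in "$modules_root/$release"/kernel/fs/nfsd/nfsd.ko*; do
    [ -e "$f" ] && return 0
  done
  return 1
}
```

On a node, `kernel_flavor` with no arguments reports the flavor of the running kernel, and `has_nfsd_module || echo "nfsd module missing"` flags a kernel that cannot host the NFS server.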

ana-v-espinoza commented 1 month ago

Andrea,

It is as you say, the node was still running on the old kvm kernel:

$ uname -a
Linux uf24f-k8s-node-nf-1 5.15.0-1063-kvm #68-Ubuntu SMP Fri Jul 12 08:20:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ apt list --installed linux*kvm
Listing... Done
linux-headers-5.15.0-1058-kvm/jammy-updates,jammy-security,now 5.15.0-1058.63 amd64 [installed,automatic]
linux-headers-5.15.0-1063-kvm/jammy-updates,jammy-security,now 5.15.0-1063.68 amd64 [installed,automatic]
linux-headers-kvm/jammy-updates,jammy-security,now 5.15.0.1063.59 amd64 [installed,automatic]
linux-image-5.15.0-1058-kvm/jammy-updates,jammy-security,now 5.15.0-1058.63 amd64 [installed,automatic]
linux-image-5.15.0-1063-kvm/jammy-updates,jammy-security,now 5.15.0-1063.68 amd64 [installed,automatic]
linux-image-kvm/jammy-updates,jammy-security,now 5.15.0.1063.59 amd64 [installed,automatic]
linux-kvm/jammy-updates,jammy-security,now 5.15.0.1063.59 amd64 [installed]
linux-modules-5.15.0-1058-kvm/jammy-updates,jammy-security,now 5.15.0-1058.63 amd64 [installed,automatic]
linux-modules-5.15.0-1063-kvm/jammy-updates,jammy-security,now 5.15.0-1063.68 amd64 [installed,automatic]

After purging the system of these packages and rebooting, everything is as expected: the nfs-server pod has exported the drive, the test-nfs-mount Pod launches and is able to access this drive, and the drive is accessible via the single user JupyterLab instances.

Thanks a ton for your help!

Instead of making this change in Jetstream-Kubespray, would it be possible to alter the Ubuntu2204Minimal image? To me this feels more appropriate, although it may go slightly against the spirit of having a "minimal" image.

Best,

-- ana

zonca commented 1 month ago

@ana-v-espinoza of course, try Ubuntu2204MinimalGenKernel

ID 671f03f1-4d86-4363-b03c-f5d54818693a

ana-v-espinoza commented 1 month ago

Excellent Andrea!

We appreciate all your knowledge and help as always. Tomorrow we will test provisioning a cluster and NFS shared drive with this new image and get back to you.

-- ana

julienchastang commented 1 month ago

Hi @zonca. Thank you for setting this up. Unfortunately, when I run terraform, it cannot find Ubuntu2204MinimalGenKernel or 671f03f1-4d86-4363-b03c-f5d54818693a. In addition, it does not show up in openstack image list, though it does show when doing an openstack image show 671f03f1-4d86-4363-b03c-f5d54818693a. Is there a visibility setting that has to be set somewhere? Thanks again.

zonca commented 1 month ago

Not sure, visibility is set to Community

openstack image list --community | grep GenKern
| 671f03f1-4d86-4363-b03c-f5d54818693a | Ubuntu2204MinimalGenKernel                                          | active      |

zonca commented 1 month ago

Maybe because it is a "Snapshot" instead of "Image"?

julienchastang commented 1 month ago

It might be user error on my end now that I think of it. I'll investigate later this afternoon.

zonca commented 1 month ago

ok, I also posted the update on the tutorial page: https://www.zonca.dev/posts/2024-05-07-ubuntu22-minimal-image-jetstream

julienchastang commented 1 month ago

Hi again. OK, I don't think it was (obvious) user error. There are a few other things that caught my eye:

Compared to Ubuntu2204Minimal, the disk format is different, i.e., raw. The size of the image is quite large, 21474836480 bytes (20 GiB, about 21.5 GB). The min_disk attribute is also different (20 GB vs 0).

zonca commented 1 month ago

it works fine running terraform on my account. Can you please paste the exact error?

@julienchastang I don't think it is any of those issues. The Featured images are also 20GB and RAW; that is because they are generated from a running instance: the disk is 20GB, but only 1.7GB is used. The notable difference is that it is classified as "Snapshot", while both ubuntu minimal and the featured images are "Image".

zonca commented 1 month ago

Opened a ticket on Access for my reference

julienchastang commented 1 month ago

Warning: Deprecated Resource

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 677, in resource "openstack_compute_floatingip_associate_v2" "bastion":
 677: resource "openstack_compute_floatingip_associate_v2" "bastion" {

use openstack_networking_floatingip_associate_v2 resource instead

(and 5 more similar warnings elsewhere)

Error: Your query returned no results. Please change your search criteria and try again.

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 1, in data "openstack_images_image_v2" "vm_image":
   1: data "openstack_images_image_v2" "vm_image" {

Error: Your query returned no results. Please change your search criteria and try again.

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 7, in data "openstack_images_image_v2" "gfs_image":
   7: data "openstack_images_image_v2" "gfs_image" {

Error: Your query returned no results. Please change your search criteria and try again.

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 13, in data "openstack_images_image_v2" "image_master":
  13: data "openstack_images_image_v2" "image_master" {

zonca commented 1 month ago

thanks @julienchastang , yes, I think that is the problem, in the query in main.tf it says:

  source_type           = "image"

so, I downloaded the snapshot with image save and uploaded it again as an image. However, why did it work for me??

Well I've done it, so let's just try:

Ubuntu2204MinimalGenKernelImg
f8211779-dcff-4827-a899-f5b8ad738f8f

Also, can you try to create just a single instance with one of those images using Horizon instead of Terraform?

zonca commented 1 month ago

also, this could help debugging, I see there are actually 2 Ubuntu2204Minimal images:

o image list --community | grep Minimal                                                                                                                               
| bd850f29-e7a9-40c9-9fd0-45d3257dec82 | Ubuntu2204Minimal                                                   | active      |
| 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69 | Ubuntu2204Minimal                                                   | active      |

but only the 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69 shows up in Horizon, which is strange.

can you please verify which one you are actually using in Terraform? Did you create a minimal image yourselves?
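
The duplicate-name check above can also be scripted. This is a small sketch that assumes the default table output of the openstack CLI as shown in this thread; `image_ids_by_name` is a hypothetical helper name, and it only parses text on standard input rather than calling OpenStack itself.

```shell
#!/bin/sh
# Read an "openstack image list" table on stdin and print the ID of every
# row whose Name column is exactly equal to the first argument.
image_ids_by_name() {
  awk -F'|' -v name="$1" '
    NF >= 4 {
      id = $2
      n = $3
      # Trim the padding that the table layout adds around each cell.
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      gsub(/^[ \t]+|[ \t]+$/, "", n)
      if (n == name) print id
    }'
}
```

For example, `openstack image list --community | image_ids_by_name Ubuntu2204Minimal` prints one ID per matching image, so more than one line of output means the name is ambiguous.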

julienchastang commented 1 month ago

which one you are actually using in Terraform?

bd850f29-e7a9-40c9-9fd0-45d3257dec82

Did you create a minimal image yourselves?

To my knowledge, no.

zonca commented 1 month ago

thanks @julienchastang, can you test 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69? does it give the same error as the "GenKernel" image?

julienchastang commented 1 month ago

Yes, it does, but I think that is a red herring. I believe you cannot refer to image IDs in cluster.tfvars, only image names e.g.,

image = "Ubuntu2204Minimal"

Any image ID provided (e.g., bd850f29-e7a9-40c9-9fd0-45d3257dec82, 0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69), as well as Ubuntu2204MinimalGenKernel, will give that error. Basically, the error is a result of not being able to find the image.

julienchastang commented 1 month ago

BTW, turning Terraform debugging to verbose yields:

X-Openstack-Request-Id: req-ea15e2b5-b3a4-41b2-9896-544629bcc310: timestamp=2024-08-02T16:25:08.115Z
2024-08-02T16:25:08.115Z [INFO]  plugin.terraform-provider-openstack_v1.54.1: 2024/08/02 16:25:08 [DEBUG] OpenStack Response Body: {
  "first": "/v2/images?name=0fa9f3b4-d29f-4f68-a8a7-16bf44ffae69\u0026sort_dir=asc\u0026sort_key=name\u0026status=active",
  "images": [],
  "schema": "/v2/schemas/images"
}: timestamp=2024-08-02T16:25:08.115Z
2024-08-02T16:25:08.115Z [ERROR] plugin.terraform-provider-openstack_v1.54.1: Response contains error diagnostic: diagnostic_summary="Your query returned no results. Please change your search criteria and try again." tf_rpc=ReadDataSource
 diagnostic_severity=ERROR tf_data_source_type=openstack_images_image_v2 tf_proto_version=5.4 @caller=github.com/hashicorp/terraform-plugin-go@v0.19.0/tfprotov5/internal/diag/diagnostics.go:58 @module=sdk.proto diagnostic_detail= tf_provider_addr=registry.terraform.io/terraform-provider-openstack/openstack tf_req_id=7f00d46b-bd55-cb99-e79b-6f3a08693a99 timestamp=2024-08-02T16:25:08.115Z

As far as I understand, TF searches for images by name, not ID. Coupled with the fact that the images do not appear when doing an openstack image list, that's the problem, I believe.

zonca commented 1 month ago

ok @julienchastang so let's wait for the Jetstream team to answer my support ticket.

If this is urgent, you can create the image yourself:

julianpistorius commented 1 month ago

Hello @zonca & @julienchastang!

I think you figured the image visibility issue out between yourselves. OpenStack will not list community-visible images belonging to other projects unless you explicitly tell it to.

This is as designed:

You can test this by running the following two commands:

$ openstack image list --debug
...
Image client initialized using OpenStack SDK: <openstack.image.v2._proxy.Proxy object at 0x76099041a0a0>
REQ: curl -g -i -X GET https://js2.jetstream-cloud.org:9292/v2/images -H "Accept: application/json" ...
...
$ openstack image list --debug --community
...
Image client initialized using OpenStack SDK: <openstack.image.v2._proxy.Proxy object at 0x79928b5e40d0>
REQ: curl -g -i -X GET "https://js2.jetstream-cloud.org:9292/v2/images?visibility=community" -H "Accept: application/json" ...
...

The first command will return:

The second command will return all community-visible images.

This is a security feature of OpenStack. For a concrete example: Anybody in the Jetstream2 community can create an image called Featured-Ubuntu22 and set its visibility to 'community'. Imagine what would happen if OpenStack listed this image in the default image list. At best it would cause confusion.

We had to figure this out in Exosphere as well: https://gitlab.com/exosphere/exosphere/-/issues/800#note_1073477839

@zonca you can explicitly share the image with @julienchastang using the steps documented here: https://docs.jetstream-cloud.org/ui/cli/snapshot-image/#sharing-an-image

@julienchastang you'll have to accept the shared image in turn.

I hope this helps. I'll respond in the ticket as well.

zonca commented 1 month ago

thanks @julianpistorius, I'll make a pull request to the docs to clarify this.

@julienchastang please send me your project id so I can share the image with you.

zonca commented 1 month ago

@julianpistorius from my understanding from the docs, in https://wiki.openstack.org/wiki/Glance-v2-community-image-visibility-design Openstack allows all users to boot community images. Jetstream instead is more restrictive and makes community images the same as shared images. Right?

zonca commented 1 month ago

Started working on this, but need to understand better how it actually works: https://gitlab.com/jetstream-cloud/docs/-/merge_requests/101

julianpistorius commented 1 month ago

@julienchastang I think booting from community images should work. I'll have to confirm though.

julienchastang commented 1 month ago

@zonca to answer your earlier question it is EES220002.

zonca commented 1 month ago

@julienchastang ok, shared image 671f03f1-4d86-4363-b03c-f5d54818693a with you

julienchastang commented 1 month ago

Hi @zonca. I am not finding the image:

[openstack@e246e035f3a5 ~]$ openstack image show 671f03f1-4d86-4363-b03c-f5d54818693a
No Image found for 671f03f1-4d86-4363-b03c-f5d54818693a
[openstack@e246e035f3a5 ~]$ openstack image list --community | grep 671f03f1-4d86-4363-b03c-f5d54818693a
[openstack@e246e035f3a5 ~]$ openstack image list --shared
+--------------------------------------+-----------------+--------+
| ID                                   | Name            | Status |
+--------------------------------------+-----------------+--------+
| 8d5a9a1d-42bf-41eb-a7cb-87cf8a472766 | Container-Linux | active |
| 7c3a973c-e9b1-460c-b370-320969ca527d | centos7.iso     | active |
| 5ff1e12c-b9b3-4dbe-a32f-903fae7f55b1 | rocky8.iso      | active |
+--------------------------------------+-----------------+--------+

I think the command to share the image is:

openstack image set --project <project_id> <image_id>

Again, here is our project info:

$ openstack project list
+----------------------------------+-----------+
| ID                               | Name      |
+----------------------------------+-----------+
| 5d01fe81db3f46bfbb354843e5846084 | EES220002 |
+----------------------------------+-----------+

Thanks again.

zonca commented 1 month ago

I ran openstack image add project Ubuntu2204MinimalGenKernel EES220002 following the JS2 docs.

In fact:

openstack image member list Ubuntu2204MinimalGenKernel
+--------------------------------------+-----------+---------+
| Image ID                             | Member ID | Status  |
+--------------------------------------+-----------+---------+
| 671f03f1-4d86-4363-b03c-f5d54818693a | EES220002 | pending |
+--------------------------------------+-----------+---------+

so you should run openstack image set --accept <image UUID or NAME>

From https://docs.jetstream-cloud.org/ui/cli/snapshot-image/

julienchastang commented 1 month ago

$ openstack image set --accept  671f03f1-4d86-4363-b03c-f5d54818693a
No Image found for 671f03f1-4d86-4363-b03c-f5d54818693a
$ openstack image set --accept  Ubuntu2204MinimalGenKernel
No Image found for Ubuntu2204MinimalGenKernel

zonca commented 1 month ago

@julianpistorius it seems we can't share images

julianpistorius commented 1 month ago

This is puzzling. I can both launch an instance from a community image, as well as share and accept images.

Launching an instance from a community-visible image

In the first OpenStack project which owns the private image, make it community-visible:

$ source project1-openrc.sh
$ IMAGE_ID=5b30084f-070b-4a03-95a5-19494fd180b1
$ openstack image set --community $IMAGE_ID
$ openstack image show --fit-width $IMAGE_ID
# Confirm it's set to community visible

In the second project, which is not a member of the image:

$ source project2-openrc.sh
$ IMAGE_ID=5b30084f-070b-4a03-95a5-19494fd180b1
$ openstack image show --fit-width $IMAGE_ID
# Confirm I can see image
$ openstack server create --os-compute-api-version 2.37 --flavor m3.small --image $IMAGE_ID --network auto_allocated_network --key-name=my-ssh-key --security-group default community-jammy-server
# ... launches a server

I did find that I had to use the image ID when creating the server, and using the image name did not work.

Sharing an image

In first project, which owns the private image:

$ IMAGE_ID=b1119fd1-e735-41d0-9766-ec8d33e9be65
$ openstack image show $IMAGE_ID
# Confirm it's private

In second project, which you want to share the image with:

$ IMAGE_ID=b1119fd1-e735-41d0-9766-ec8d33e9be65
$ openstack image show --fit-width $IMAGE_ID
No Image found for b1119fd1-e735-41d0-9766-ec8d33e9be65

Back in the first project:

$ openstack image set --shared $IMAGE_ID
$ OTHER_PROJECT_ID=<uuid-of-second-project>
$ openstack image add project $IMAGE_ID $OTHER_PROJECT_ID
+------------+--------------------------------------+
| Field      | Value                                |
+------------+--------------------------------------+
| created_at | 2024-08-08T21:46:18Z                 |
| image_id   | b1119fd1-e735-41d0-9766-ec8d33e9be65 |
| member_id  | <uuid-of-second-project>             |
| schema     | /v2/schemas/member                   |
| status     | pending                              |
| updated_at | 2024-08-08T21:46:18Z                 |
+------------+--------------------------------------+
$ openstack image member list --fit-width $IMAGE_ID
+--------------------------------------+---------------------------+---------+
| Image ID                             | Member ID                 | Status  |
+--------------------------------------+---------------------------+---------+
| b1119fd1-e735-41d0-9766-ec8d33e9be65 | <uuid-of-second-project>  | pending |
+--------------------------------------+---------------------------+---------+

In second project:

$ openstack image show --fit-width $IMAGE_ID
# Now I can see the image
$ openstack image set --accept $IMAGE_ID

Back in first project:

$ openstack image member list --fit-width $IMAGE_ID
+--------------------------------------+---------------------------+----------+
| Image ID                             | Member ID                 | Status   |
+--------------------------------------+---------------------------+----------+
| b1119fd1-e735-41d0-9766-ec8d33e9be65 | <uuid-of-second-project>  | accepted |
+--------------------------------------+---------------------------+----------+

Maybe we need to set up a screen-sharing session to debug this?

zonca commented 1 month ago

@julianpistorius I confirm I can see your community image and I can launch an instance. can you please try to launch an instance with my community shared image?

IMAGE_ID=f8211779-dcff-4827-a899-f5b8ad738f8f

zonca commented 1 month ago

also @julianpistorius, about uuid-of-second-project, the docs say "Where project is the AAA000000 number of the allocation you want to share it with." https://docs.jetstream-cloud.org/ui/cli/snapshot-image/#sharing-an-image. Are you using the actual ID instead?

julianpistorius commented 1 month ago

also @julianpistorius, about uuid-of-second-project, the docs say "Where project is the AAA000000 number of the allocation you want to share it with." https://docs.jetstream-cloud.org/ui/cli/snapshot-image/#sharing-an-image. Are you using the actual ID instead?

I am using the UUID, yes.

julianpistorius commented 1 month ago

IMAGE_ID=f8211779-dcff-4827-a899-f5b8ad738f8f

Yes, I was able to launch an instance from this image.

$ IMAGE_ID=f8211779-dcff-4827-a899-f5b8ad738f8f
$ openstack image show --fit-width $IMAGE_ID
# Shows the image details, name is 'Ubuntu2204MinimalGenKernelImg' 
$ openstack server create --os-compute-api-version 2.37 --flavor m3.small --image $IMAGE_ID --network auto_allocated_network --key-name=my-ssh-key --security-group default community-Ubuntu2204MinimalGenKernelImg
# Launches...
$ SERVER_ID=<server-uuid>
$ openstack server show --fit-width $SERVER_ID
# Shows server detail...
$ openstack console log show $SERVER_ID
# Booted and showing login prompt...

zonca commented 1 month ago

ok thanks, @julienchastang or @ana-v-espinoza can you try repeating what Julian did, then try again with Terraform and paste the errors here?

I am worried that the problem is that Terraform wants image names, but only Image IDs work for shared images.

julienchastang commented 1 month ago

I am able to successfully launch that image via the openstack CLI (similar to Julian's session). However, terraform yields

Warning: Deprecated Resource

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 677, in resource "openstack_compute_floatingip_associate_v2" "bastion":
 677: resource "openstack_compute_floatingip_associate_v2" "bastion" {

use openstack_networking_floatingip_associate_v2 resource instead

(and 5 more similar warnings elsewhere)

Error: Your query returned no results. Please change your search criteria and try again.

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 1, in data "openstack_images_image_v2" "vm_image":
   1: data "openstack_images_image_v2" "vm_image" {

Error: Your query returned no results. Please change your search criteria and try again.

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 7, in data "openstack_images_image_v2" "gfs_image":
   7: data "openstack_images_image_v2" "gfs_image" {

Error: Your query returned no results. Please change your search criteria and try again.

  on ../../contrib/terraform/openstack/modules/compute/main.tf line 13, in data "openstack_images_image_v2" "image_master":
  13: data "openstack_images_image_v2" "image_master" {

julianpistorius commented 1 month ago

@julienchastang how are you trying to identify the image? Can you paste your Terraform here?

I'm not a Terraform expert, but could you use something like:

# Data source for the image using image ID
data "openstack_images_image_v2" "vm_image" {
  visibility = "community"
  name       = "Ubuntu2204MinimalGenKernelImg"
  properties = {
    id = "f8211779-dcff-4827-a899-f5b8ad738f8f"
  }
}

I haven't tried it, but the docs are suggestive: https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs/data-sources/images_image_v2

julienchastang commented 1 month ago

Thanks @zonca. I ended up snapshotting my own image as described earlier. That should work for now. Thanks again.

zonca commented 1 month ago

@julienchastang I think I found a solution: there are different keywords, image_uuid and image_master_uuid

https://github.com/zonca/jetstream_kubespray/blob/6ba36564107c34a6261f2ae44d05aa344277e10e/contrib/terraform/openstack/modules/compute/variables.tf#L204-L218

If you have the opportunity to test this, good; otherwise I'll test it later and add some comments in the tutorial files to notify of this possibility.

zonca commented 3 weeks ago

I tested and it worked fine, I updated the default cluster.tfvars for future reference
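
As a rough illustration of the name-versus-ID distinction that this thread settled on, the sketch below prints a cluster.tfvars line using `image_uuid` when the value looks like a UUID and `image` otherwise. The helper names `is_uuid` and `tfvars_image_line` are hypothetical, and the exact layout of cluster.tfvars is an assumption; only the keyword names `image` and `image_uuid` come from the discussion above.

```shell
#!/bin/sh
# Return 0 when the argument has the 8-4-4-4-12 lowercase hex UUID shape.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Print a tfvars assignment: image_uuid for IDs, image for names.
# Assumption: both keywords take a quoted string value.
tfvars_image_line() {
  if is_uuid "$1"; then
    printf 'image_uuid = "%s"\n' "$1"
  else
    printf 'image = "%s"\n' "$1"
  fi
}
```

Calling `tfvars_image_line f8211779-dcff-4827-a899-f5b8ad738f8f` yields an `image_uuid` line, which is the form to try for community images that Terraform cannot find by name.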

zonca commented 3 weeks ago

Ubuntu Minimal NFS and shared images