warewulf / warewulf3

Warewulf is a scalable systems management suite originally developed to manage large high-performance Linux clusters.

Did warewulf really work? #235

everett-moon closed this issue 4 years ago

everett-moon commented 4 years ago

I tried to deploy OpenHPC 1.3.8 with the official recipe.sh script, and it failed as expected!

Sorry, but I really need some instructions.

jmstover commented 4 years ago

On the provisioner, what's the output of:

wwsh node list

Pretty sure with the OpenHPC setup, the provisioner is the head node.

Also, what are you running on? CentOS 7?

everett-moon commented 4 years ago

@jmstover, yes, I am running CentOS 7.6.

[root@sms home]# wwsh node list
NAME                GROUPS              IPADDR              HWADDR
================================================================================
c1                  UNDEF               192.168.149.253     08:00:27:c5:6f:8b
c2                  UNDEF               192.168.149.252     08:00:27:c5:6f:8c

[root@sms home]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:0a:74:96 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.172/24 brd 192.168.10.255 scope global noprefixroute dynamic enp0s3
       valid_lft 1320sec preferred_lft 1320sec
    inet6 fe80::92c3:1a9f:36e8:c912/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:50:6a:7f brd ff:ff:ff:ff:ff:ff
    inet 192.168.149.254/24 brd 192.168.149.255 scope global enp0s8
       valid_lft forever preferred_lft forever
[root@sms home]# wwsh file print network
#### network ##################################################################
network         : ID               = 8
network         : NAME             = network
network         : PATH             = /etc/sysconfig/network
network         : ORIGIN           = /tmp/network.3850
network         : FORMAT           = data
network         : CHECKSUM         = a67c18537091f5398e5ed526df022aae
network         : INTERPRETER      = UNDEF
network         : SIZE             = 18
network         : MODE             = 0644
network         : UID              = 0
network         : GID              = 0
[root@sms home]# cat /tmp/network.3850
GATEWAYDEV=enp0s3

enp0s3 is the public device, and enp0s8 is the internal private network device. Was I wrong to set the variable eth_provision to enp0s3?

I have also noticed that dhcpd.service complains "DHCPDISCOVER from 80:00:27:c5:6f:8b via enp0s8: network 192.168.149.0/24: no free leases". Should I add a range manually?
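
With ISC dhcpd, "no free leases" usually means the request did not match any `host` declaration's `hardware ethernet` address and the subnet has no `range` of dynamic addresses to fall back on. The Warewulf-generated subnet block below has no `range`, so only MACs registered in Warewulf get an address; the usual fix is to make sure each node's MAC is registered correctly rather than adding a range. A minimal sketch of that check, assuming dhcpd log lines shaped like the ones quoted in this thread; `registered_macs` and `unmatched_macs` are hypothetical helper names, not Warewulf commands:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): find MACs that dhcpd refused with
# "no free leases" and that have no "hardware ethernet" entry in dhcpd.conf.
# Assumes ISC dhcpd log lines like the ones quoted in this thread.

# Print the lowercase MACs declared in "hardware ethernet" statements of $1.
registered_macs() {
    sed -n 's/^[[:space:]]*hardware ethernet[[:space:]]*\([0-9A-Fa-f:]*\);.*/\1/p' "$1" \
        | tr 'A-F' 'a-f' | LC_ALL=C sort -u
}

# Read dhcpd log lines on stdin; print each refused MAC that is
# not registered in the dhcpd.conf given as $1.
unmatched_macs() {
    sed -n 's/.*DHCPDISCOVER from \([0-9A-Fa-f:]*\) .*no free leases.*/\1/p' \
        | tr 'A-F' 'a-f' | LC_ALL=C sort -u \
        | LC_ALL=C comm -23 - <(registered_macs "$1")
}
```

For example, `journalctl -u dhcpd | unmatched_macs /etc/dhcp/dhcpd.conf` prints every refused MAC that Warewulf has no host entry for. Any address it prints belongs to a machine that is either not registered or registered with a different HWADDR.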

[root@sms home]# cat /etc/dhcp/dhcpd.conf
# DHCPD Configuration written by Warewulf. Do not edit this file, rather
# edit the template: /etc/warewulf/dhcpd-template.conf

allow booting;
allow bootp;
ddns-update-style interim;
authoritative;

option space ipxe;

# Tell iPXE to not wait for ProxyDHCP requests to speed up boot.
option ipxe.no-pxedhcp code 176 = unsigned integer 8;
option ipxe.no-pxedhcp 1;

option architecture-type   code 93  = unsigned integer 16;

if exists user-class and option user-class = "iPXE" {
    filename "http://192.168.149.254/WW/ipxe/cfg/${mac}";
} else {
    if option architecture-type = 00:0B {
        filename "/warewulf/ipxe/bin-arm64-efi/snp.efi";
    } elsif option architecture-type = 00:0A {
        filename "/warewulf/ipxe/bin-arm32-efi/placeholder.efi";
    } elsif option architecture-type = 00:09 {
        filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
    } elsif option architecture-type = 00:07 {
        filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
    } elsif option architecture-type = 00:06 {
        filename "/warewulf/ipxe/bin-i386-efi/snp.efi";
    } elsif option architecture-type = 00:00 {
        filename "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe";
    }
}

subnet 192.168.149.0 netmask 255.255.255.0 {
   not authoritative;
   # option interface-mtu 9000;
   option subnet-mask 255.255.255.0;
}

# Node entries will follow below

group {
   # Evaluating Warewulf node: c1 (DB ID:9)
   # Adding host entry for c1-enp0s3
   host c1-enp0s3 {
      option host-name c1;
      hardware ethernet 08:00:27:c5:6f:8b;
      fixed-address 192.168.149.253;
      next-server 192.168.149.254;
   }
   # Evaluating Warewulf node: c2 (DB ID:11)
   # Adding host entry for c2-enp0s3
   host c2-enp0s3 {
      option host-name c2;
      hardware ethernet 08:00:27:c5:6f:8c;
      fixed-address 192.168.149.252;
      next-server 192.168.149.254;
   }
}

everett-moon commented 4 years ago

How should I set up the local network before deploying OpenHPC? I'm quite new to DHCP.

everett-moon commented 4 years ago

I think I've found the problem. Sorry!

jmstover commented 4 years ago

Sounds good, but just for completeness here...

The configuration looks correct from a quick scan, so...

How should I set up the local network before deploying OpenHPC? I'm quite new to DHCP.

The way you have it set up looks correct: usually there is a public network device and a cluster network device. I usually set up DHCP to listen only on the cluster device. You do this by adding the device name to the end of the dhcpd startup command (in the systemd unit file, /etc/sysconfig/dhcpd, etc.).
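
A minimal sketch of the systemd route on CentOS 7, assuming the vendor unit runs the same dhcpd command line that `systemctl status dhcpd` shows later in this thread. A drop-in under /etc/systemd/system/dhcpd.service.d/ first clears `ExecStart` with an empty assignment and then redefines it with the interface appended. The function name `write_dhcpd_dropin` and the file name `interface.conf` are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: write a systemd drop-in so dhcpd only listens on one interface.
# The dhcpd command line is copied from `systemctl status dhcpd` output in
# this thread; the function and drop-in file names are hypothetical.

# Usage: write_dhcpd_dropin IFACE [DROPIN_DIR]
write_dhcpd_dropin() {
    local iface="$1"
    local dir="${2:-/etc/systemd/system/dhcpd.service.d}"
    mkdir -p "$dir"
    # The empty "ExecStart=" resets the vendor command before redefining it.
    cat > "$dir/interface.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid $iface
EOF
}
```

Run as root, `write_dhcpd_dropin enp0s8` followed by `systemctl daemon-reload` and `systemctl restart dhcpd` should leave dhcpd serving only the cluster network. A drop-in is used instead of editing /usr/lib/systemd/system/dhcpd.service so that a package update does not overwrite the change.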

Make sure dhcpd is up and running. That's something like:

systemctl enable dhcpd
systemctl start dhcpd

If you're running into weird booting issues, you can try running:

wwsh pxe update
wwsh dhcp update

That'll force a rebuild of the PXE and DHCP config files. You usually don't need to worry about that, as it should happen any time a node object is modified.
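
To check that a rebuild actually produced a host entry for every node, one could compare the HWADDR column of `wwsh node list` with the `hardware ethernet` lines in /etc/dhcp/dhcpd.conf. A minimal sketch, assuming the single-interface NAME/GROUPS/IPADDR/HWADDR layout shown earlier in this thread (nodes with several interfaces may be listed differently); `nodes_missing_from_dhcpd` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report nodes whose HWADDR from
# `wwsh node list` has no "hardware ethernet" entry in dhcpd.conf.
# Assumes the four-column NAME/GROUPS/IPADDR/HWADDR layout.

# Usage: wwsh node list | nodes_missing_from_dhcpd DHCPD_CONF
nodes_missing_from_dhcpd() {
    local conf="$1"
    local name mac
    # Skip the header and "=====" separator lines, keep NAME and HWADDR.
    awk 'NR > 2 && NF >= 4 { print $1, $4 }' | while read -r name mac; do
        # Case-insensitive match on the exact dhcpd.conf statement.
        if ! grep -qi "hardware ethernet ${mac};" "$conf"; then
            echo "$name $mac"
        fi
    done
}
```

An empty result from `wwsh node list | nodes_missing_from_dhcpd /etc/dhcp/dhcpd.conf` means every listed node has a host entry; a node reported with HWADDR `UNDEF` has no MAC registered at all.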

Re-open this if you're still having some issues.

everett-moon commented 4 years ago

Everything works in VirtualBox, but it does not work on my server machines, which boot via FlexBoot.

[root@localhost home]# systemctl status dhcpd
● dhcpd.service - DHCPv4 Server Daemon
   Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-04-08 10:38:43 CST; 3min 29s ago
     Docs: man:dhcpd(8)
           man:dhcpd.conf(5)
 Main PID: 73011 (dhcpd)
   Status: "Dispatching packets..."
   CGroup: /system.slice/dhcpd.service
           └─73011 /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid

Apr 08 10:41:15 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:41:23 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:41:31 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:41:40 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:41:48 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:41:57 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:41:58 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:41:59 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:42:02 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
Apr 08 10:42:06 sms dhcpd[73011]: DHCPDISCOVER from bc:a9:20:87:eb:ba via eno1: network 192.168.50.0/24: no free leases
everett-moon commented 4 years ago

The iPXE version is 1.0.0+. I am using a ConnectX-3.