xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

genesis base rpm file for x86_64 not updated after 2.14.5 version #7404

Closed abhishek-sa1 closed 10 months ago

abhishek-sa1 commented 10 months ago

Recent NIC drivers such as ice, which is required for the Intel E810, are not present in the existing x86_64 genesis base rpm file.

The last update to the x86_64 genesis base was for version 2.14.5.

I found a solution by updating the xcat-cmdline file to load the drivers, as shown below.

elif [[ ${ARCH} =~ x86_64 ]]; then
    # load all network driver modules listed in /lib/modules/<kernel>/modules.dep file
    KERVER=`uname -r`
    for line in `cat /lib/modules/$KERVER/modules.dep |grep -vE 'tunnel|ieee|ifb|bond|dummy|fjes|hv_netvsc|ntb_netdev|xen-netfront|hdlc_fr|dlci'| awk -F: '{print \$1}' | sed -e "s/\(.*\)\.ko.*/\1/"`; do
        if [[ $line =~ "kernel/drivers/net" ]]; then
            modprobe `basename $line`
        fi
    done
fi

Can I raise a PR for this?

Is there any plan to build an xCAT genesis base rpm file for x86_64 with the latest changes, or should we build the genesis base rpm file locally and carry it ourselves?

@samveen @gurevichmark

samveen commented 10 months ago
abhishek-sa1 commented 10 months ago

@samveen We need to load the other drivers with modprobe as well, because we have to support multiple NIC interfaces. That is why I added a loop over all drivers in the modules.dep file for the x86_64 architecture. A similar loop is already present for the ppc64 architecture. Drivers that are not required are excluded with the grep -vE option, as shown in the script.

Can I create a PR for this change, and would it be possible to build a new xCAT genesis base rpm file in the next release and include it in the tarball?

samveen commented 10 months ago

@abhishek-sa1 What I mean is that on the x86_64 platform, the command you run gives a lot more output than just network drivers:

cat /lib/modules/$KERVER/modules.dep |grep -vE 'tunnel|ieee|ifb|bond|dummy|fjes|hv_netvsc|ntb_netdev|xen-netfront|hdlc_fr|dlci'| awk -F: '{print $1}'

The exception list on x86_64 will need to be much longer than just these.

abhishek-sa1 commented 10 months ago

@samveen Without the modprobe for all drivers on x86_64, the custom genesis base image gets stuck with other Intel and Broadcom NIC cards, as shown below: [image]

Initially I had the same block as the one present for ppc64. For ppc64 we also load all drivers in the modules.dep file. Is there a reason we load all drivers on ppc64? https://github.com/xcat2/xcat-core/blob/bb7a4bbbc8bde7e6613558d8d039fe43d49d2079/xCAT-genesis-builder/xcat-cmdline.sh#L56

elif [[ ${ARCH} =~ x86_64 ]]; then
    # load all network driver modules listed in /lib/modules/<kernel>/modules.dep file
    KERVER=`uname -r`
    for line in `cat /lib/modules/$KERVER/modules.dep | awk -F: '{print \$1}' | sed -e "s/\(.*\)\.ko.*/\1/"`; do
        if [[ $line =~ "kernel/drivers/net" ]]; then
            modprobe `basename $line`
        fi
    done
fi

I added tunnel|ieee|ifb|bond|dummy to the exception list because of unwanted NICs that were showing up in the genesis image, as shown below. [image]

I added fjes|hv_netvsc|ntb_netdev|xen-netfront|hdlc_fr|dlci because of modprobe errors visible on the screen while booting the genesis image. [image]
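
As a rough illustration of how the exception list behaves, the sketch below runs the same grep -vE pattern over a handful of sample modules.dep lines. The sample paths and the names filter_net_modules and sample_dep are made up for the example; only the pattern itself comes from the script above.

```shell
#!/bin/bash
# Exception list copied from the script above.
EXCLUDE='tunnel|ieee|ifb|bond|dummy|fjes|hv_netvsc|ntb_netdev|xen-netfront|hdlc_fr|dlci'

# Hypothetical helper: reads modules.dep-style lines on stdin and prints the
# names of the network driver modules that survive the exception list.
filter_net_modules() {
    grep -vE "$EXCLUDE" \
        | awk -F: '{print $1}' \
        | grep 'kernel/drivers/net' \
        | sed -e 's/.*\/\(.*\)\.ko.*/\1/'
}

# Sample input (illustrative paths, not taken from a real system).
sample_dep() {
    cat <<'EOF'
kernel/drivers/net/ethernet/intel/ice/ice.ko.xz:
kernel/drivers/net/bonding/bonding.ko.xz:
kernel/drivers/net/dummy.ko.xz:
kernel/drivers/net/xen-netfront.ko.xz:
kernel/drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko.xz:
kernel/drivers/scsi/sg.ko.xz:
EOF
}

# Print the module names that would be passed to modprobe.
sample_dep | filter_net_modules
```

Note that the grep -vE runs before awk strips the dependency list, so a line is also dropped when one of its dependencies matches the pattern, not only when the module path itself does.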

samveen commented 10 months ago

@abhishek-sa1 Filtering for kernel/drivers/net and adding a line count at the end of the pipeline, as below:

cat /lib/modules/$KERVER/modules.dep |grep -vE 'tunnel|ieee|ifb|bond|dummy|fjes|hv_netvsc|ntb_netdev|xen-netfront|hdlc_fr|dlci'| awk -F: '{print $1}' |grep 'kernel/drivers/net'| wc -l

I get 236 modules on a CentOS 7 system, which means 236 modules would get loaded. I expect you will get a similar number. Not all of these are required (for example, WiFi or USB networking drivers).
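
To see where those modules come from, a breakdown by subdirectory under kernel/drivers/net can help. This is only a sketch: the names count_by_subdir and sample_dep are hypothetical, and the sample input is made up so the example can run anywhere. On a real system you would feed it /lib/modules/$(uname -r)/modules.dep instead.

```shell
#!/bin/bash
# Hypothetical helper: group module paths under kernel/drivers/net by their
# first subdirectory (ethernet, wireless, usb, ...) and count each group.
count_by_subdir() {
    awk -F: '{print $1}' \
        | grep '^kernel/drivers/net/' \
        | awk -F/ '{ group = (NF > 4) ? $4 : "(top level)"; count[group]++ }
                   END { for (g in count) print count[g], g }' \
        | sort -k1,1nr -k2
}

# Sample input (illustrative paths, not taken from a real system).
sample_dep() {
    cat <<'EOF'
kernel/drivers/net/ethernet/intel/ice/ice.ko.xz:
kernel/drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko.xz:
kernel/drivers/net/wireless/ath/ath9k/ath9k.ko.xz:
kernel/drivers/net/usb/r8152.ko.xz:
kernel/drivers/net/usb/asix.ko.xz:
kernel/drivers/net/usb/cdc_ether.ko.xz:
kernel/drivers/scsi/sg.ko.xz:
EOF
}

# Print one "count group" line per subdirectory, largest group first.
sample_dep | count_by_subdir
```

Groups such as wireless or usb that are irrelevant to the target hardware are then easy to spot and leave out.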

Instead of loading drivers needlessly, build a custom genesis image for your environment that loads only the modules your environment needs, at: https://github.com/xcat2/xcat-core/blob/bb7a4bbbc8bde7e6613558d8d039fe43d49d2079/xCAT-genesis-builder/xcat-cmdline.sh#L79

A simple modprobe ice should be enough. If not, add the other required modules to the modprobe list as well, instead of trying to load all modules, as below:

for mod in "ice" "and" "other" "modules"; do
    modprobe "$mod"
done

abhishek-sa1 commented 10 months ago

@samveen On the ppc64 architecture we load all 252 modules. Do we need all of those drivers on ppc64?

https://github.com/xcat2/xcat-core/blob/bb7a4bbbc8bde7e6613558d8d039fe43d49d2079/xCAT-genesis-builder/xcat-cmdline.sh#L53-L60

abhishek-sa1 commented 10 months ago

@samveen We are using RHEL8 to build the rpm, not CentOS7.

samveen commented 10 months ago

@abhishek-sa1 Did you run the test on RHEL8? What was the result?

abhishek-sa1 commented 10 months ago

@samveen I ran the buildrpm command to create the genesis base rpm for x86_64 on RHEL8, and it was successful. I removed the existing x86_64 genesis base rpm and installed the new one I created. I was able to boot the genesis image on RHEL8.6, and I could discover all nodes with the BMC discovery method using this genesis image.

samveen commented 10 months ago

@abhishek-sa1 Please run the following command on RHEL8 and give me the output:

cat /lib/modules/$(uname -r)/modules.dep |grep -vE 'tunnel|ieee|ifb|bond|dummy|fjes|hv_netvsc|ntb_netdev|xen-netfront|hdlc_fr|dlci'| awk -F: '{print $1}' |grep 'kernel/drivers/net'| wc -l

abhishek-sa1 commented 10 months ago

@samveen

[root@springcp ~]# cat /lib/modules/$(uname -r)/modules.dep |grep -vE 'tunnel|ieee|ifb|bond|dummy|fjes|hv_netvsc|ntb_netdev|xen-netfront|hdlc_fr|dlci'| awk -F: '{print $1}' |grep 'kernel/drivers/net'| wc -l
249
samveen commented 10 months ago

@abhishek-sa1 As you can see, loading 249 modules will be overkill, not to mention take up kernel memory needlessly.

That is why I proposed adding just the modules you require, as below, rather than trying to pick them from the kernel module dependency list:

for mod in "ice" "and" "other" "modules"; do
    modprobe "$mod"
done
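
A slightly more defensive variant of that loop might look like the sketch below. It is an illustration only, not what xcat-cmdline.sh does today: the function name load_required_modules and the DRY_RUN switch are hypothetical, and the module list is a placeholder in which only ice comes from this thread.

```shell
#!/bin/bash
# Assumption: this list is a placeholder; only "ice" is taken from the thread.
# Add whatever other modules your environment needs.
REQUIRED_MODULES="ice"

# Hypothetical helper: load each named module and warn about failures instead
# of stopping at the first one. With DRY_RUN set, only print what would run.
load_required_modules() {
    local mod failed=0
    for mod in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "would run: modprobe $mod"
        elif ! modprobe "$mod"; then
            echo "warning: could not load module $mod" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Dry run so the example is safe to execute anywhere; leave DRY_RUN unset
# inside the genesis image to actually load the modules.
DRY_RUN=1
load_required_modules $REQUIRED_MODULES
```

Keeping the list explicit means the set of loaded drivers stays small and reviewable, and a missing module shows up as a single warning rather than a stalled boot.
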
abhishek-sa1 commented 10 months ago

@samveen Thanks for the information.