Closed: hiyijian closed this issue 6 years ago
It seems that IPAM failed to allocate an IP address.
Yes. It is weird, since I set ipam to "fixipam", but it seems that the sriov CNI uses "calico-ipam" instead. Is there any relation between them?
root@root0-PR4768GW-238:/etc/cni/net.d# ls
10-calico.conf 10-rdmanet.conf bak calico-kubeconfig calico-tls
root@root0-PR4768GW-238:/etc/cni/net.d# cat 10-rdmanet.conf
{
  "name": "rdmanet",
  "type": "sriov",
  "master": "ib0",
  "pfOnly": false,
  "ipam": {
    "type": "fixipam",
    "subnet": "10.55.206.0/26",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ],
    "gateway": "10.55.206.1"
  }
}
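Since two network configs sit side by side in /etc/cni/net.d, it may help to list each one with its plugin type and IPAM type. The following is only a sketch: the helper name `list_cni_configs` is made up for illustration, and it assumes the configs are plain JSON `.conf` files like the one shown above. If the test script runs every config it finds in that directory (an assumption, not confirmed in this thread), that would explain why calico-ipam gets invoked.

```python
import json
from pathlib import Path


def list_cni_configs(conf_dir):
    """Return (filename, plugin type, IPAM type) for each *.conf file, sorted by name."""
    rows = []
    for path in sorted(Path(conf_dir).glob("*.conf")):
        try:
            conf = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            conf = None
        if not isinstance(conf, dict):
            # Unreadable or malformed files are reported instead of being skipped silently.
            rows.append((path.name, "<unreadable>", None))
            continue
        ipam = conf.get("ipam") or {}
        rows.append((path.name, conf.get("type"), ipam.get("type")))
    return rows


if __name__ == "__main__":
    for name, plugin, ipam in list_cni_configs("/etc/cni/net.d"):
        print(f"{name}: type={plugin} ipam={ipam}")
```

Run on the node, this prints one line per config, which makes it easy to spot a second config that brings in a different IPAM plugin.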
Maybe you should move 10-calico.conf. @hiyijian
When I stop the calico service and remove 10-calico.conf, I get a different error from the sriov CNI:
root@root0-PR4768GW-238:/usr/local/go/src/github.com/hustcat/sriov-cni/scripts# CNI_PATH=$CNI_PATH CNI_ARGS="IgnoreUnknown=1;IP=10.55.206.46;VF=1;MAC=66:d8:02:77:aa:aa" ./priv-net-run.sh ifconfig
contid=523d2e0f6b4f34a4
netnspath=/var/run/netns/523d2e0f6b4f34a4
rdmanet : error executing ADD: failed to open the virtfn1 dir of the device "ib0": lstat /sys/class/net/ib0/device/virtfn1/net: no such file or directory
Below is the content of /sys/class/net/ib0/device/virtfn1:
jianyi@root0-PR4768GW-238:~$ ls /sys/class/net/ib0/device/virtfn1
broken_parity_status d3cold_allowed enable local_cpus physfn resource2 subsystem_device
class device firmware_node modalias power resource2_wc subsystem_vendor
config dma_mask_bits irq msi_bus reset revision uevent
consistent_dma_mask_bits driver_override local_cpulist numa_node resource subsystem vendor
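The error above complains about a missing net entry under virtfn1, and the listing indeed has none. A rough sketch for checking every VF of a device the same way follows; the function name `vf_netdev_status` is hypothetical, and the interpretation that a missing net/ directory means no network interface was created for that VF is an assumption.

```python
from pathlib import Path


def vf_netdev_status(device_dir):
    """Map each virtfnN entry under device_dir to the interface names in its net/ directory."""
    status = {}
    for vf in Path(device_dir).glob("virtfn*"):
        net_dir = vf / "net"
        if net_dir.is_dir():
            status[vf.name] = sorted(p.name for p in net_dir.iterdir())
        else:
            # The VF is visible on the PCI bus but has no net/ directory.
            status[vf.name] = []
    return status


if __name__ == "__main__":
    result = vf_netdev_status("/sys/class/net/ib0/device")
    for vf, netdevs in sorted(result.items()):
        print(vf, netdevs or "no net/ directory")
```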
@hustcat According to the kernel messages, it seems that enabling the virtual function failed:
[ 7.052904] mlx4_core: device is working in RoCE mode: Roce V1
[ 7.052904] mlx4_core: UD QP Gid type is: V1
[ 8.727397] mlx4_core 0000:01:00.0: DMFS high rate steer mode is: default performance
[ 8.727603] mlx4_core 0000:01:00.0: Enabling SR-IOV with 63 VFs
[ 8.834138] pci 0000:01:00.1: [15b3:1004] type 00 class 0x028000
[ 8.840643] pci 0000:01:00.1: Max Payload Size set to 256 (was 128, max 512)
[ 8.843061] mlx4_core: Initializing 0000:01:00.1
[ 8.843112] mlx4_core 0000:01:00.1: enabling device (0000 -> 0002)
[ 8.843948] mlx4_core 0000:01:00.1: Skipping virtual function:1
[ 8.844517] pci 0000:01:00.2: [15b3:1004] type 00 class 0x028000
[ 8.851059] pci 0000:01:00.2: Max Payload Size set to 256 (was 128, max 512)
Could you please help?
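To pull the relevant lines out of a longer dmesg dump, something like the following could be used. It is a sketch only: the pattern is derived from the single "Skipping virtual function" line quoted above, so other kernel versions may word the message differently, and the helper name `skipped_vfs` is made up for illustration.

```python
import re

# Pattern taken from the log line quoted in this thread; treat it as an assumption.
SKIP_RE = re.compile(r"mlx4_core (\S+): Skipping virtual function:(\d+)")


def skipped_vfs(dmesg_text):
    """Return (PCI address, number) pairs for every 'Skipping virtual function' line."""
    return [(m.group(1), int(m.group(2))) for m in SKIP_RE.finditer(dmesg_text)]


if __name__ == "__main__":
    sample = "[ 8.843948] mlx4_core 0000:01:00.1: Skipping virtual function:1\n"
    print(skipped_vfs(sample))
```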
@hiyijian Can you give a more detailed log, starting from the point where the kernel loads the mlx4_core module?
OK, thanks. See dmesg.txt. Here is a more detailed document on the issue: sriov VF enable failed.docx
@hiyijian Can you show me the module config file, such as /etc/modprobe.d/mlx4_core.conf?
$ cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core num_vfs=63 port_type_array=1,1 probe_vf=1
@hiyijian Please replace it with the following and try again:
options mlx4_core port_type_array=2,2 num_vfs=0,4,0 probe_vf=0,4,0
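To make the difference between the current and the suggested options line explicit, a small comparison like the one below may be useful. This is a sketch under stated assumptions: the helper names are invented here, it only compares the raw strings, and it does not interpret what each mlx4_core parameter means.

```python
def parse_modprobe_options(line):
    """Parse 'options <module> key=value ...' into (module, {key: value})."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError(f"not an options line: {line!r}")
    params = {}
    for tok in tokens[2:]:
        key, _, value = tok.partition("=")
        params[key] = value
    return tokens[1], params


def changed_params(old_line, new_line):
    """Return {key: (old value, new value)} for parameters that differ."""
    _, old = parse_modprobe_options(old_line)
    _, new = parse_modprobe_options(new_line)
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    current = "options mlx4_core num_vfs=63 port_type_array=1,1 probe_vf=1"
    suggested = "options mlx4_core port_type_array=2,2 num_vfs=0,4,0 probe_vf=0,4,0"
    for key, (before, after) in changed_params(current, suggested).items():
        print(f"{key}: {before} -> {after}")
```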
Thanks @hustcat. I realized the problem is a bit more complex. A Mellanox engineer is already working on it with us. I will let you know when it is done.
Hi staff, I came across the following error when starting the network. K8s + Calico are already running on my cluster. Any ideas? Much appreciated!