linux-rdma / opensm


subnet_prefix not honored with recent kernels #36

Open a-denis opened 4 months ago

a-denis commented 4 months ago

Hi,

I noticed that with recent kernels, the subnet_prefix option has no effect. In my opensm.conf, I have the following line: subnet_prefix 0xfe80000000000006 (with the 6 at the end)

When I boot with the old Linux 4.19, it behaves as expected:

 ibv_devinfo -v | grep GID
            GID[  0]:       fe80:0000:0000:0006:b859:9f03:00db:f884

On the same machine, without touching anything else, if I boot with kernel 6.1 or 6.8, it is wrong:

ibv_devinfo -v | grep GID
            GID[  0]:       fe80:0000:0000:0000:b859:9f03:00db:f884

even though in the log, it seems to have been taken into account:

OpenSM 3.3.23
 Reading Cached Option File: /etc/opensm/opensm.conf
 Loading Cached Option:subnet_prefix = 0xfe80000000000006
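To make the mismatch explicit, here is a minimal sketch that builds the GID one would expect from the configured prefix and compares it with the two `ibv_devinfo` lines above. It assumes the usual layout of a 128-bit GID (upper 64 bits subnet prefix, lower 64 bits port GUID); the helper names `expected_gid` and `parse_gid_line` are made up for illustration and are not part of any tool.

```python
import re

# Matches lines such as "GID[  0]:   fe80:0000:0000:0006:b859:9f03:00db:f884"
GID_LINE = re.compile(r"GID\[\s*(\d+)\]:\s*([0-9a-fA-F:]+)")


def expected_gid(subnet_prefix: int, port_guid: int) -> str:
    """Format a GID, assuming prefix in the upper 64 bits and GUID in the lower 64."""
    value = (subnet_prefix << 64) | port_guid
    hexstr = f"{value:032x}"
    return ":".join(hexstr[i:i + 4] for i in range(0, 32, 4))


def parse_gid_line(line: str):
    """Return (index, prefix, guid) for one GID line, or None if it does not parse."""
    match = GID_LINE.search(line)
    if match is None:
        return None
    groups = match.group(2).lower().split(":")
    if len(groups) != 8:
        return None
    value = int("".join(g.zfill(4) for g in groups), 16)
    return int(match.group(1)), value >> 64, value & 0xFFFFFFFFFFFFFFFF


configured = 0xfe80000000000006  # value from opensm.conf in this report
observed = {
    "4.19": "GID[  0]:       fe80:0000:0000:0006:b859:9f03:00db:f884",
    "6.1/6.8": "GID[  0]:       fe80:0000:0000:0000:b859:9f03:00db:f884",
}

for kernel, line in observed.items():
    index, prefix, guid = parse_gid_line(line)
    status = "ok" if prefix == configured else "MISMATCH"
    print(f"kernel {kernel}: GID[{index}] prefix=0x{prefix:016x} "
          f"(wanted {expected_gid(configured, guid)}) {status}")
```

Only the upper half of the GID differs between the two kernels; the port GUID part is identical.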

I do not know which kernel version broke it.

Thank you.

vladko1974 commented 4 months ago

According to the spec, the link-local subnet prefix can only be 0xFE80::/64. See section 4.1.1, GID USAGE AND PROPERTIES, Vol 1 Release 1.7.

A site-local subnet prefix (0xFEC0::6) should be used for the described use case.
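As a rough illustration of the distinction, the sketch below classifies a 64-bit subnet prefix by its leading bits. It assumes the IPv6-style convention that link-local prefixes start with the bits 1111111010 and site-local prefixes with 1111111011; the function name and the labels it returns are hypothetical, and the exact rules should be checked against the spec section cited above.

```python
def classify_subnet_prefix(prefix: int) -> str:
    """Classify a 64-bit subnet prefix by its top 10 bits (assumed convention)."""
    if not 0 <= prefix < (1 << 64):
        raise ValueError("subnet prefix must fit in 64 bits")
    top10 = prefix >> 54
    if top10 == 0b1111111010:
        # fe80::/10 range; only the all-zero remainder is the default prefix
        if prefix == 0xFE80000000000000:
            return "link-local (default)"
        return "link-local range, non-default"
    if top10 == 0b1111111011:
        # fec0::/10 range
        return "site-local"
    return "other"


for value in (0xfe80000000000000, 0xfe80000000000006, 0xfec0000000000006):
    print(f"0x{value:016x}: {classify_subnet_prefix(value)}")
```

Under that assumption, 0xfe80000000000006 falls in the link-local range but is not the default prefix, while 0xfec0000000000006 is site-local.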

(Sent by email in reply to Alexandre DENIS's message of Tuesday, May 21, 2024, 4:13 PM.)


a-denis commented 4 months ago

I just tried with subnet_prefix 0xfec0000000000006, but the issue is the same: the GID prefix is still 0xfe80000000000000.

vladko1974 commented 4 months ago

Did you restart the subnet manager after changing the subnet prefix in opensm.conf?

(Sent by email in reply to Alexandre DENIS's message of Tuesday, May 21, 2024, 4:48 PM.)


a-denis commented 3 months ago

Yes, I made the change on both hosts and restarted opensm on both. subnet_prefix still has no effect with kernel 6.1 (Debian package).

I forgot to mention: there is no switch involved. Nodes are connected back to back.

a-denis commented 3 months ago

Some more details: on another pair of machines with older ConnectX-3 boards, subnet_prefix works even with kernel 6.1. The machines where it does not work use ConnectX-4 boards. This might be of some interest.
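For comparing the two kinds of boards side by side, a small sketch like the following could list the prefix of GID index 0 for every RDMA port on a host. It assumes the kernel exposes GIDs under `/sys/class/infiniband/<device>/ports/<port>/gids/0`; the function name `read_gid0_prefixes` is hypothetical, and the device names one would see (for example `mlx4_0` for ConnectX-3 and `mlx5_0` for ConnectX-4) are an assumption, not something taken from this report.

```python
from pathlib import Path


def read_gid0_prefixes(sysfs_root: str = "/sys/class/infiniband") -> dict:
    """Map (device, port) to the first four 16-bit groups of GID index 0."""
    results = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return results  # no RDMA devices visible on this host
    for gid_file in sorted(root.glob("*/ports/*/gids/0")):
        # Path layout: <root>/<device>/ports/<port>/gids/0
        device = gid_file.parts[-5]
        port = gid_file.parts[-3]
        groups = gid_file.read_text().strip().split(":")
        if len(groups) != 8:
            continue  # skip anything that is not a fully expanded GID
        results[(device, port)] = ":".join(groups[:4])
    return results


for (device, port), prefix in read_gid0_prefixes().items():
    print(f"{device} port {port}: subnet prefix {prefix}")
```

Running it on both pairs of machines would show directly whether the prefix differs by device type under the same kernel.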

vladko1974 commented 3 months ago

The issue seems to have nothing to do with opensm. Please raise it on the relevant kernel forum.

(Sent by email in reply to Alexandre DENIS's message of Thursday, June 20, 2024, 6:17 PM.)
