Open a-denis opened 4 months ago
According to the spec, the only supported link-local subnet prefix is 0xFE80::/64. See section 4.1.1, "GID Usage and Properties", Vol 1 Release 1.7.
A site-local subnet prefix (0xFEC0::6) should be used in the described use case.
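To make the distinction concrete, here is a minimal Python sketch that classifies a 64-bit subnet prefix. It assumes the usual reading of the GID prefix layout: link-local is exactly 0xFE80000000000000, and site-local has the top ten bits 1111111011 with a subnet ID in the low bits. The helper name `classify_prefix` is hypothetical and is not part of opensm.

```python
LINK_LOCAL_PREFIX = 0xFE80000000000000  # assumed default link-local prefix (FE80::/64)
SITE_LOCAL_TOP10 = 0x3FB                # bit pattern 1111111011 (FEC0::/10)


def classify_prefix(prefix: int) -> str:
    """Classify a 64-bit GID subnet prefix (illustrative helper, not an opensm API)."""
    if not 0 <= prefix < 1 << 64:
        raise ValueError("subnet prefix must fit in 64 bits")
    if prefix == LINK_LOCAL_PREFIX:
        return "link-local"
    # Only the top ten bits are checked here; this is a simplification.
    if prefix >> 54 == SITE_LOCAL_TOP10:
        return "site-local"
    return "other"


for value in (0xFE80000000000000, 0xFE80000000000006, 0xFEC0000000000006):
    print(f"{value:#018x}: {classify_prefix(value)}")
```

Under these assumptions, 0xfe80000000000006 is neither the link-local prefix nor a site-local one, while 0xfec0000000000006 is site-local.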
From: Alexandre DENIS
Sent: Tuesday, May 21, 2024 4:13 PM
To: linux-rdma/opensm
Subject: [linux-rdma/opensm] subnet_prefix not honored with recent kernels (Issue #36)
Hi,
I noticed that with recent kernels, subnet_prefix has no effect. In my opensm.conf, I have the following line: subnet_prefix 0xfe80000000000006 (with the 6 at the end).
When I boot with the old Linux 4.19, it behaves as expected:
ibv_devinfo -v | grep GID
GID[ 0]: fe80:0000:0000:0006:b859:9f03:00db:f884
On the same machine, without touching anything else, if I boot with kernel 6.1 or 6.8, it is wrong:
ibv_devinfo -v | grep GID
GID[ 0]: fe80:0000:0000:0000:b859:9f03:00db:f884
even though in the log, it seems to have been taken into account:
OpenSM 3.3.23
Reading Cached Option File: /etc/opensm/opensm.conf
Loading Cached Option:subnet_prefix = 0xfe80000000000006
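The mismatch between the configured prefix and the reported GIDs can be checked mechanically. Below is a minimal Python sketch, assuming the GID is printed as eight colon-separated 16-bit groups as in the ibv_devinfo output quoted here. The upper 64 bits of a GID are the subnet prefix and the lower 64 bits are the port GUID. The helper name `gid_subnet_prefix` is hypothetical.

```python
def gid_subnet_prefix(gid: str) -> int:
    """Return the upper 64 bits (subnet prefix) of a GID string."""
    groups = gid.strip().split(":")
    if len(groups) != 8:
        raise ValueError(f"expected 8 groups, got {len(groups)}")
    value = 0
    for group in groups:
        value = (value << 16) | int(group, 16)
    return value >> 64  # drop the lower 64 bits (port GUID)


configured = 0xFE80000000000006  # value from opensm.conf in this report
observed = {
    "Linux 4.19": "fe80:0000:0000:0006:b859:9f03:00db:f884",
    "Linux 6.1/6.8": "fe80:0000:0000:0000:b859:9f03:00db:f884",
}
for kernel, gid in observed.items():
    prefix = gid_subnet_prefix(gid)
    status = "matches" if prefix == configured else "differs from"
    print(f"{kernel}: prefix {prefix:#018x} {status} subnet_prefix")
```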
I do not know which kernel version broke it.
Thank you.
I just tried with subnet_prefix 0xfec0000000000006, but the issue is the same, it's still 0xfe80000000000000.
Did you restart the subnet manager after changing the subnet prefix in opensm.conf?
Yes, I made the change on both hosts and restarted opensm on both. subnet_prefix still has no effect with kernel 6.1 (Debian package).
I forgot to mention: there is no switch involved. Nodes are connected back to back.
Some more details: on another pair of machines with old ConnectX-3 boards, subnet_prefix works even with kernel 6.1. The machines where it does not work use ConnectX-4 boards. This might be of some interest.
The issue seems to have nothing to do with opensm. Please refer to the relevant kernel forum.