linux-rdma / opensm


Fix libvendor/osm_vendor_ibumad.c so clang -Werror does not complain #14

Closed hnrose closed 4 years ago

hnrose commented 5 years ago

Clang doesn't like taking pointers to packed struct members, even if they are aligned.

Pointed-out-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>

Signed-off-by: Hal Rosenstock <hal@mellanox.com>

hnrose commented 5 years ago

@nmorey Is libibumad from rdma-core being picked up?

osm_vendor_ibumad.c:746:56: error: incompatible pointer types passing '__be64 *'
      (aka 'unsigned long long *') to parameter of type 'uint64_t *'
      (aka 'unsigned long *') [-Werror,-Wincompatible-pointer-types]
...if ((r = umad_get_ca_portguids(p_vend->ca_names[ca], &portguids[0],
                                                        ^~~~~
/usr/include/infiniband/umad.h:166:59: note: passing argument to parameter
      'portguids' here
int umad_get_ca_portguids(const char *ca_name, uint64_t *portguids, int max);
                                                         ^

This declaration of umad_get_ca_portguids looks to be from umad.h prior to rdma-core.
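Why clang rejects the call even though both types are 64 bits wide can be shown in a few lines. Going by the aka text in the log, the pre-rdma-core header expects `uint64_t *` (here `unsigned long *`) while the caller passes `__be64 *` (here `unsigned long long *`). The sketch below uses the underlying types directly, with hypothetical macro names `INT_KIND` and `PTR_KIND`, and assumes a C11 compiler such as gcc or clang:

```c
#include <assert.h>

/* Classify an expression's type at compile time. unsigned long and
 * unsigned long long get separate associations because they are
 * distinct types, even on targets where both are 64 bits wide. */
enum int_kind { KIND_ULONG, KIND_ULLONG, KIND_OTHER };

#define INT_KIND(x) _Generic((x),            \
        unsigned long: KIND_ULONG,           \
        unsigned long long: KIND_ULLONG,     \
        default: KIND_OTHER)

/* The same distinction applies to pointers, which is what the
 * -Wincompatible-pointer-types diagnostic is about. */
#define PTR_KIND(p) _Generic((p),            \
        unsigned long *: KIND_ULONG,         \
        unsigned long long *: KIND_ULLONG,   \
        default: KIND_OTHER)

static_assert(INT_KIND(0UL) != INT_KIND(0ULL),
              "unsigned long and unsigned long long are distinct types");
```

Because the pointer types differ, passing one where the other is expected is a constraint violation that clang reports under -Wincompatible-pointer-types, and -Werror turns that into a build failure. Writing `&portguids[0]` instead of `portguids` yields the same `unsigned long long *`, so it cannot change the outcome.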

I also see clang complaining about a packed member alignment issue with p_mad->trans_id.

nmorey commented 5 years ago

@hnrose I'm guessing this is picking up a Debian version and not necessarily the latest.

hnrose commented 5 years ago

@nmorey libibumad appears to be picked up from the following: http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/universe amd64 libibumad3 amd64 1.3.10.2-1 [16.7 kB], which is the old (pre-rdma-core) one. Any idea how to make it pick up libibumad from a released rdma-core?

The incompatible pointer type is only a perceived one: both types are 64-bit unsigned integers, but clang treats them as distinct types.

The main issue is clang not liking the alignment of ib_mad_t, specifically when accessing the transaction ID.
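A common way to keep a packed struct but avoid clang's -Waddress-of-packed-member complaint is to never take the member's address: read the field by value, and if an API needs a pointer, hand it a pointer to an aligned local copy. A minimal sketch, using a hypothetical `struct hyp_mad_hdr` laid out like the 24-byte IBA common MAD header (it is not OpenSM's `ib_mad_t`, and byte order is ignored):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed header modeled on the common MAD header. */
struct hyp_mad_hdr {
    uint8_t  base_ver;
    uint8_t  mgmt_class;
    uint8_t  class_ver;
    uint8_t  method;
    uint16_t status;
    uint16_t class_spec;
    uint64_t trans_id;
    uint16_t attr_id;
    uint16_t resv;
    uint32_t attr_mod;
} __attribute__((packed));

static_assert(sizeof(struct hyp_mad_hdr) == 24, "common MAD header is 24 bytes");

/* Read by value: the compiler emits whatever access the packed layout
 * needs, and no pointer to the member escapes. */
static uint64_t get_trans_id(const struct hyp_mad_hdr *mad)
{
    return mad->trans_id;
}

/* Hypothetical pointer-taking consumer, standing in for any API that
 * wants a uint64_t *. */
static int tid_is_odd(const uint64_t *tid)
{
    return (int)(*tid & 1);
}

/* Instead of tid_is_odd(&mad->trans_id), which clang flags with
 * -Waddress-of-packed-member, copy into an aligned local first. */
static int mad_tid_is_odd(const struct hyp_mad_hdr *mad)
{
    uint64_t tid = mad->trans_id;
    return tid_is_odd(&tid);
}
```

This keeps the packed layout untouched, at the cost of a copy at each pointer-taking call site.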

nmorey commented 5 years ago

@hnrose: I checked, and xenial is just using a very, very old version of everything... This can be solved either by using a container within travis to build on a newer release, or by doing a pre-build step that installs a specific rdma-core release.

I'll look into that

nmorey commented 5 years ago

@jgunthorpe Any idea how to deal with this? Could we publish the Xenial packages with the releases so they can be used by other GitHub projects? I'd rather avoid pulling in all the cbuild stuff from rdma-core just for that.

Building rdma-core debian packages locally then installing them works but it's not very clean...

jgunthorpe commented 5 years ago

Best is to just not use travis, it is horrible for this kind of stuff :( I've been slowly working to replace travis for rdma-core, but haven't got it yet

Also, this patch looks kind of bonkers, foo and &foo[0] are the same thing... Not sure what 'packed' has to do with this

The reason you can't take the address of a packed member is because it is not aligned, it is simply an error and you shouldn't ever do it - it will crash at runtime on ARM. If the member is actually aligned then don't use packed, but use the proper attribute aligned to tell the compiler what is happening and it won't complain.
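The non-packed alternative can be checked at compile time. The common MAD header happens to be naturally aligned already (every field sits at an offset that is a multiple of its own size), so a sketch of it without packed only needs static_assert checks to show that dropping the attribute moved nothing. The struct name is hypothetical, and the sketch assumes the buffer holding the MAD is itself suitably aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical non-packed version of the 24-byte common MAD header.
 * No field needs padding in front of it, so the layout is unchanged. */
struct hyp_mad_hdr_nat {
    uint8_t  base_ver;
    uint8_t  mgmt_class;
    uint8_t  class_ver;
    uint8_t  method;
    uint16_t status;
    uint16_t class_spec;
    uint64_t trans_id;
    uint16_t attr_id;
    uint16_t resv;
    uint32_t attr_mod;
};

/* Compile-time checks that dropping packed did not move anything. */
static_assert(offsetof(struct hyp_mad_hdr_nat, status) == 4, "status");
static_assert(offsetof(struct hyp_mad_hdr_nat, trans_id) == 8, "trans_id");
static_assert(offsetof(struct hyp_mad_hdr_nat, attr_mod) == 20, "attr_mod");
static_assert(sizeof(struct hyp_mad_hdr_nat) == 24, "size");

/* Taking the member's address is now well defined and warning-free. */
static const uint64_t *trans_id_ptr(const struct hyp_mad_hdr_nat *mad)
{
    return &mad->trans_id;
}
```

If any of these checks fails on a given target, the build stops instead of silently producing a different wire layout.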

nmorey commented 5 years ago

Agreed that travis sucks. But it's easy enough to set up a minimal validation set.

But yes. The PACKED attribute should probably be dropped on most of these structs

hnrose commented 5 years ago

[jgunthorpe wrote:] Also, this patch looks kind of bonkers, foo and &foo[0] are the same thing... Not sure what 'packed' has to do with this

Yes, I know they're the same thing; the change from foo -> &foo[0] was just a test to see if clang would stop complaining about the incompatible pointer type.

[jgunthorpe wrote:] The reason you can't take the address of a packed member is because it is not aligned, it is simply an error and you shouldn't ever do it - it will crash at runtime on ARM. If the member is actually aligned then don't use packed, but use the proper attribute aligned to tell the compiler what is happening and it won't complain.

The structure packing in OpenSM has been there for a long time, and one needs to be very careful about undoing it. This is a bigger effort which should be done, and I'll enter an issue for it.

I think I'm going to drop this specific patch.

jgunthorpe commented 5 years ago

Generally all MAD structures are aligned to 4 bytes, so what we did for srp_daemon/etc is to increase the alignment and use pahole & static_assert to validate the struct layout didn't change.

nmorey commented 5 years ago

I did a quick check with pahole. A lot of structs just change size because they become 4- or 8-byte aligned, which should not be an issue. But some get internal padding between fields, so we'll have to deal with those carefully.
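The internal padding appears when a 64-bit field sits at an offset that is a multiple of 4 but not of 8. A minimal sketch with a hypothetical record (a 32-bit field followed by a GUID at byte offset 4) shows the GUID moving when packed is dropped, assuming a target where uint64_t is 8-byte aligned, such as x86-64 or AArch64 Linux:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: a 32-bit field, then a 64-bit GUID at
 * byte offset 4, for 12 bytes in total. */
struct rec_packed {
    uint32_t flags;
    uint64_t guid;
} __attribute__((packed));

/* The same fields without packed. */
struct rec_natural {
    uint32_t flags;
    uint64_t guid;
};

static_assert(offsetof(struct rec_packed, guid) == 4, "wire offset");
static_assert(sizeof(struct rec_packed) == 12, "wire size");

/* Returns how many bytes the GUID moved when packed was dropped:
 * 4 where uint64_t is 8-byte aligned, 0 where it is only 4-byte
 * aligned (e.g. i386). */
static size_t guid_shift(void)
{
    return offsetof(struct rec_natural, guid) - offsetof(struct rec_packed, guid);
}
```

A nonzero shift is exactly the kind of layout change a pahole comparison would flag.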

hnrose commented 5 years ago

@nmorey Most MAD attributes in the IBA spec were specified to follow natural alignment, but a small yet significant number do not. AFAIR NodeInfo is one of those because its GUIDs are not 64-bit aligned. There are others I've run across over time. Do you have a list of the ones pahole found?

I am tracking relevant comments for this in issue #15 - Remove structure packing where not needed
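NodeInfo illustrates the problem: per the IBA spec it is 40 bytes, and SystemImageGUID, NodeGUID and PortGUID start at byte offsets 4, 12 and 20. One way to describe it without packed, sketched here under the assumption of gcc or clang, is a hypothetical 64-bit typedef whose alignment is lowered to 4 (both compilers allow aligned() on a typedef to decrease alignment), with static_assert checks standing in for a pahole comparison. Field names are illustrative, LocalPortNum and VendorID are folded into one 32-bit field, and byte order is not handled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit type with its alignment lowered to 4 bytes. */
typedef uint64_t u64_a4 __attribute__((aligned(4)));

/* Sketch of the 40-byte NodeInfo attribute without packed. */
struct node_info_sketch {
    uint8_t  base_version;
    uint8_t  class_version;
    uint8_t  node_type;
    uint8_t  num_ports;
    u64_a4   sys_guid;            /* byte offset 4: not 8-aligned */
    u64_a4   node_guid;           /* byte offset 12 */
    u64_a4   port_guid;           /* byte offset 20 */
    uint16_t partition_cap;
    uint16_t device_id;
    uint32_t revision;
    uint32_t port_num_vendor_id;  /* LocalPortNum (8 bits) + VendorID (24 bits) */
};

static_assert(offsetof(struct node_info_sketch, sys_guid) == 4, "sys_guid");
static_assert(offsetof(struct node_info_sketch, node_guid) == 12, "node_guid");
static_assert(offsetof(struct node_info_sketch, port_guid) == 20, "port_guid");
static_assert(offsetof(struct node_info_sketch, partition_cap) == 28, "partition_cap");
static_assert(offsetof(struct node_info_sketch, revision) == 32, "revision");
static_assert(sizeof(struct node_info_sketch) == 40, "NodeInfo is 40 bytes");
static_assert(_Alignof(struct node_info_sketch) == 4, "4-byte aligned");
```

Since the GUID members are not packed, taking their address yields a `u64_a4 *` whose 4-byte alignment the compiler knows about, so clang has nothing to warn about.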

jgunthorpe commented 5 years ago

Generally the MADs have a natural alignment of 4 bytes and 64 bit values are only aligned to 4 bytes, not 8. This is why we ended up defining umad_gid as aligned(4) so it was compatible with MAD structures that have 4 byte GID alignment.

When we did srp_daemon it took some fussing with attributes and other adjustments to make the structs have the same layout with a higher alignment than packed. But pahole is reliable and if it says the struct has the same layout, then it does.
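The same typedef approach can be applied to a GID. The sketch below is hypothetical and is not rdma-core's actual `umad_gid` definition. It only shows how a union containing 64-bit halves can be given 4-byte alignment, so that it can follow a 32-bit field at byte offset 4 without padding, with static_assert checks taking the place of a pahole run:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit type with its alignment lowered to 4 bytes
 * (gcc and clang allow aligned() on a typedef to decrease it). */
typedef uint64_t u64_a4 __attribute__((aligned(4)));

/* Hypothetical 128-bit GID: 64-bit halves, but only 4-byte aligned. */
union gid_a4 {
    uint8_t raw[16];
    struct {
        u64_a4 subnet_prefix;
        u64_a4 interface_id;
    } global;
};

/* Hypothetical record with a GID at byte offset 4. */
struct gid_record_sketch {
    uint32_t lid_and_flags;
    union gid_a4 gid;
};

static_assert(_Alignof(union gid_a4) == 4, "GID is 4-byte aligned");
static_assert(sizeof(union gid_a4) == 16, "GID is 16 bytes");
static_assert(offsetof(struct gid_record_sketch, gid) == 4, "GID at offset 4");
static_assert(sizeof(struct gid_record_sketch) == 20, "no tail padding");
```

With plain `uint64_t` halves, the union would be 8-byte aligned on x86-64 and the GID would be pushed to offset 8, changing the wire layout.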