Closed jsquyres closed 9 years ago
The "util/arch.h" include is missing.
On Thu, Feb 19, 2015 at 8:52 AM, Jeff Squyres notifications@github.com wrote:
As reported by @siegmargross https://github.com/siegmargross in http://www.open-mpi.org/community/lists/users/2015/02/26343.php, --enable-heterogeneous builds on master are broken.
Here's the first error I run into on Linux (Siegmar first runs into problems in pml/cm):
CC btl_openib_put.lo btl_openib_put.c: In function 'mca_btl_openib_put': btl_openib_put.c:87:27: error: dereferencing pointer to incomplete type if ((ep->endpoint_proc->proc_opal->proc_arch & OPAL_ARCH_ISBIGENDIAN) ^ btl_openib_put.c:87:52: error: 'OPAL_ARCH_ISBIGENDIAN' undeclared (first use in this function) if ((ep->endpoint_proc->proc_opal->proc_arch & OPAL_ARCH_ISBIGENDIAN) ^ btl_openib_put.c:87:52: note: each undeclared identifier is reported only once for each function it appears in
— Reply to this email directly or view it on GitHub https://github.com/open-mpi/ompi/issues/403.
I'm quite sure Mellanox doesn't care about openib any more, but I'll ping @jladd-mlnx and @miked-mellanox just to make sure they're aware.
@hjelmn @hppritcha I think you guys are shepherding the openib BTL these days -- can you have a look?
i fix the issue related to btl/openib in open-mpi/ompi@f43b5b46ee745e27ead0e0cbe95fff2a2f5c1c9f
there is an other issue with pml/ob1 i am working on FWIW, here is where i am :
diff --git a/ompi/mca/pml/ob1/pml_ob1.c b/ompi/mca/pml/ob1/pml_ob1.c
index c7eaef5..701b46c 100644
--- a/ompi/mca/pml/ob1/pml_ob1.c
+++ b/ompi/mca/pml/ob1/pml_ob1.c
@@ -16,7 +16,9 @@
* Copyright (c) 2011 Sandia National Laboratories. All rights reserved.
* Copyright (c) 2011-2015 Los Alamos National Security, LLC. All rights
* reserved.
- * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2015 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -661,7 +663,8 @@ int mca_pml_ob1_send_fin( ompi_proc_t* proc,
mca_pml_ob1_fin_hdr_prepare ((mca_pml_ob1_fin_hdr_t *) fin->des_segments->seg_addr.pval,
0, hdr_frag.lval, status ? status : (int64_t) rdma_size);
- ob1_hdr_hton(hdr, MCA_PML_OB1_HDR_TYPE_FIN, proc);
+ /* FIXME it compiles ... but does it work ? */
+ ob1_hdr_hton((mca_pml_ob1_fin_hdr_t *)fin, MCA_PML_OB1_HDR_TYPE_FIN, proc);
/* queue request */
rc = mca_bml_base_send( bml_btl, fin, MCA_PML_OB1_HDR_TYPE_FIN );
diff --git a/ompi/mca/pml/ob1/pml_ob1_hdr.h b/ompi/mca/pml/ob1/pml_ob1_hdr.h
index e53f4af..9d7d90f 100644
--- a/ompi/mca/pml/ob1/pml_ob1_hdr.h
+++ b/ompi/mca/pml/ob1/pml_ob1_hdr.h
@@ -13,6 +13,8 @@
* Copyright (c) 2009 IBM Corporation. All rights reserved.
* Copyright (c) 2012-2014 Los Alamos National Security, LLC. All rights
* reserved.
+ * Copyright (c) 2015 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -198,17 +200,21 @@ static inline void mca_pml_ob1_rget_hdr_prepare (mca_pml_ob1_rget_hdr_t *hdr, ui
memcpy (hdr + 1, local_handle, local_handle_size);
}
+/* FIXME
+ * need to convert (h).hdr_frag
+ * can we simply assume this is a uint64_t ?
+ */
#define MCA_PML_OB1_RGET_HDR_NTOH(h) \
do { \
MCA_PML_OB1_RNDV_HDR_NTOH((h).hdr_rndv); \
- (h).hdr_seg_cnt = ntohl((h).hdr_seg_cnt); \
+ (h).hdr_frag.lval = ntoh64((h).hdr_frag.lval); \
(h).hdr_src_ptr = ntoh64((h).hdr_src_ptr); \
} while (0)
#define MCA_PML_OB1_RGET_HDR_HTON(h) \
do { \
MCA_PML_OB1_RNDV_HDR_HTON((h).hdr_rndv); \
- (h).hdr_seg_cnt = htonl((h).hdr_seg_cnt); \
+ (h).hdr_frag.lval = hton64((h).hdr_frag.lval); \
(h).hdr_src_ptr = hton64((h).hdr_src_ptr); \
} while (0)
@@ -351,10 +357,16 @@ static inline void mca_pml_ob1_rdma_hdr_prepare (mca_pml_ob1_rdma_hdr_t *hdr, ui
memcpy (hdr + 1, local_handle, local_handle_size);
}
+/* FIXME
+ * need to convert (h).hdr_req, (h).hdr_frag and (h).hdr_recv_req
+ * can we simply assume these are uint64_t ?
+ */
#define MCA_PML_OB1_RDMA_HDR_NTOH(h) \
do { \
MCA_PML_OB1_COMMON_HDR_NTOH((h).hdr_common); \
- (h).hdr_seg_cnt = ntohl((h).hdr_seg_cnt); \
+ (h).hdr_req.lval = ntoh64((h).hdr_req.lval); \
+ (h).hdr_frag.lval = ntoh64((h).hdr_frag.lval); \
+ (h).hdr_recv_req.lval = ntoh64((h).hdr_recv_req.lval); \
(h).hdr_rdma_offset = ntoh64((h).hdr_rdma_offset); \
(h).hdr_dst_ptr = ntoh64((h).hdr_dst_ptr); \
(h).hdr_dst_size = ntoh64((h).hdr_dst_size); \
@@ -363,7 +375,9 @@ static inline void mca_pml_ob1_rdma_hdr_prepare (mca_pml_ob1_rdma_hdr_t *hdr, ui
#define MCA_PML_OB1_RDMA_HDR_HTON(h) \
do { \
MCA_PML_OB1_COMMON_HDR_HTON((h).hdr_common); \
- (h).hdr_seg_cnt = htonl((h).hdr_seg_cnt); \
+ (h).hdr_req.lval = hton64((h).hdr_req.lval); \
+ (h).hdr_frag.lval = hton64((h).hdr_frag.lval); \
+ (h).hdr_recv_req.lval = hton64((h).hdr_recv_req.lval); \
(h).hdr_rdma_offset = hton64((h).hdr_rdma_offset); \
(h).hdr_dst_ptr = hton64((h).hdr_dst_ptr); \
(h).hdr_dst_size = hton64((h).hdr_dst_size); \
@ggouaillardet Looks good to me so far
@ggouaillardet https://github.com/ggouaillardet The change in pml_ob1.c seems wrong to me. The header that is prepared is ((mca_pml_ob1_fin_hdr_t ) fin->des_segments->seg_addr.pval) and not (mca_pml_ob1_fin_hdr_t )fin. "fin" is a mca_btl_base_descriptor_t which has nothing to do with a header.
On Wed, Mar 4, 2015 at 12:09 PM, Nathan Hjelm notifications@github.com wrote:
@ggouaillardet https://github.com/ggouaillardet Looks good to me so far
— Reply to this email directly or view it on GitHub https://github.com/open-mpi/ompi/issues/403#issuecomment-77199995.
@bosilca Ah, yes. Good catch. That line is clearly wrong.
@ggouaillardet You are byte-swapping some stuff that shouldn't be byte-swapped. I have a fix ready to go that I can commit.
ob1 fixed in d251fa15253a0270b54bfeeb8258e695e38bd65b
Can you please also remote it from the comment?
Thanks, George.
On Apr 20, 2015, at 11:27 , Nathan Hjelm notifications@github.com wrote:
ob1 fixed in d251fa1 https://github.com/open-mpi/ompi/commit/d251fa15253a0270b54bfeeb8258e695e38bd65b — Reply to this email directly or view it on GitHub https://github.com/open-mpi/ompi/issues/403#issuecomment-94483576.
@siegmargross Can you give master a whirl and see if the problem is fixed for you now? Thanks.
This issue is now likely a (at least effectively) a duplicate of #540. So we might as well close one of them -- I'm arbitrarily choosing to close the older one, because --enable-heterogeneous
builds again on the master; we just need a confirmation that it actually works (which can be done on #540).
As reported by @siegmargross in http://www.open-mpi.org/community/lists/users/2015/02/26343.php,
--enable-heterogeneous
builds on master are broken.Here's the first error I run into on Linux (Siegmar first runs into problems in pml/cm):