Closed mpichbot closed 7 years ago
Originally by blocksom on 2013-05-08 09:26:53 -0500
I pushed the commit to the mpich-ibm/ticket-1842
branch.
Originally by balaji on 2013-05-08 10:05:16 -0500
Why are you trying to restrict the maximum message size for the SMP collectives? Can you give some background on this?
Originally by archerc on 2013-05-08 11:07:47 -0500
At large message sizes on IB networks bandwidth of the SMP collectives is quite bad, sometimes 2x worse than non-smp. The cutoffs were put in place to obtain the best latency and bandwidth. This cutoff is tunable by environment variable and controlled by params.yml, so the defaults can be adjusted to maximum count if no cutoffs are desired for your underlying network.
Originally by balaji on 2013-05-09 21:56:47 -0500
I don't want to change the default value directly that way. But we do want to allow devices/channels customize the values as needed (that's one of the primary goals of MPICH). In fact, we should do this for all MPI-level parameters consistently, not just the subset provided in your patch.
I'd like to propose the following --
if (environment "MPIR_PARAM_FOOBAR" exists)
MPIR_PARAM_FOOBAR = atoi(ENVIRONMENT VARIABLE);
else {
#if defined MPID_PARAM_FOOBAR
MPIR_PARAM_FOOBAR = MPID_PARAM_FOOBAR;
#else
MPIR_PARAM_FOOBAR = <value>;
#endif /* MPID_PARAM_FOOBAR */
}
This will allow the device to override the default value as needed.
Does this approach make sense?
I'm on travel this week, but if we agree that this approach will work for everyone, I can try to hack this up this weekend.
Originally by blocksom on 2013-05-15 12:52:32 -0500
I rebased branch mpich-ibm/ticket-1842
to mpich/master
.
Originally by balaji on 2013-05-15 18:48:09 -0500
I didn't implement the first item above, since it didn't seem necessary after appropriate naming and segregation of different categories of parameters. The remaining patches have been pushed into mpich-ibm/ticket-1842-alt
. Please review and let me know, so I can push it into mpich/master
.
Originally by archerc on 2013-05-17 12:50:11 -0500
I have reviewed and agree with this change.
Originally by balaji on 2013-05-17 16:20:52 -0500
These changes have been pushed into mpich/master
.
Relevant commits: [cf31e37a], [cbcde2f4], [e279b1ae], [60a89bbd], [4456bbec], [430514fd], [c6084635], [fa950cb5].
Originally by archerc on 2013-05-20 10:50:59 -0500
In build testing of this patch, I discovered a chunk of glue code to the pami extension was not forward ported. Can we get this patched in for pamid?
Thanks
From 5e77b35d298b5136d93b2f4785c4a73379dfb6cc Mon Sep 17 00:00:00 2001
From: Charles Archer <archerc@us.ibm.com>
Date: Mon, 20 May 2013 06:56:58 -0400
Subject: [PATCH] PAMID device code for local task detection
---
src/mpid/pamid/src/mpidi_util.c | 51 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/src/mpid/pamid/src/mpidi_util.c b/src/mpid/pamid/src/mpidi_util.c
index ee8eda9..2bd89db 100644
--- a/src/mpid/pamid/src/mpidi_util.c
+++ b/src/mpid/pamid/src/mpidi_util.c
@@ -1908,3 +1908,54 @@ void MPIDI_collsel_pami_tune_cleanup()
}
+#undef FUNCNAME
+#define FUNCNAME MPID_Get_node_id
+#undef FCNAME
+#define FCNAME MPIDI_QUOTE(FUNCNAME)
+static int _g_max_node_id = -1;
+int MPID_Get_node_id(MPID_Comm *comm, int rank, MPID_Node_id_t *id_p)
+{
+ int mpi_errno = MPI_SUCCESS;
+ uint32_t node_id;
+ uint32_t offset;
+ uint32_t max_nodes;
+ if(!PAMIX_Extensions.is_local_task.node_info)
+ MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**notimpl");
+
+ pami_result_t rc = PAMIX_Extensions.is_local_task.node_info(comm->vcr[rank]->taskid,
+ &node_id,&offset,&max_nodes);
+ if(rc != PAMI_SUCCESS) MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**notimpl");
+ *id_p = node_id;
+
+ fn_fail:
+ return mpi_errno;
+}
+
+#undef FUNCNAME
+#define FUNCNAME MPID_Get_max_node_id
+#undef FCNAME
+#define FCNAME MPIDI_QUOTE(FUNCNAME)
+int MPID_Get_max_node_id(MPID_Comm *comm, MPID_Node_id_t *max_id_p)
+{
+ int mpi_errno = MPI_SUCCESS;
+ uint32_t node_id;
+ uint32_t offset;
+ uint32_t max_nodes;
+ if(_g_max_node_id == -1)
+ {
+ if(!PAMIX_Extensions.is_local_task.node_info)
+ MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**notimpl");
+
+ pami_result_t rc = PAMIX_Extensions.is_local_task.node_info(comm->vcr[0]->taskid,
+ &node_id,&offset,&max_nodes);
+ if(rc != PAMI_SUCCESS) MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**notimpl");
+ *max_id_p = max_nodes;
+ _g_max_node_id = max_nodes;
+ }
+ else
+ *max_id_p = _g_max_node_id;
+
+ fn_fail:
+ return mpi_errno;
+}
+#undef FUNCNAME
--
1.8.1-rc1
Originally by balaji on 2013-05-20 21:59:54 -0500
Pushed to mpich/master
in [c3a8a817].
Originally by blocksom on 2013-05-21 09:04:35 -0500
I pushed a couple bgq compile fixes to mpich-ibm/ticket-1842-alt2
. Without these commits the bgq compile of mpich/master
fails.
Originally by blocksom on 2013-05-21 09:28:36 -0500
I deleted the branches mpich-ibm/ticket-1842
and mpich-ibm/ticket-1842-alt
.
Originally by balaji on 2013-05-21 09:48:38 -0500
Pushed to mpich/master
.
Relevant commits: [438a49fe], [f44e3d4a], [038aa7cd].
Originally by blocksom on 2013-05-21 09:58:20 -0500
Deleted branch mpich-ibm/ticket-1842-alt2
.
Originally by blocksom on 2013-05-08 09:26:06 -0500
This ticket is to discuss a commit that touches files in the
src/mpi/coll
directory - in addition to severalpamid
changes.