Closed sebhtml closed 11 years ago
This fails:
rm -rf 89 ; mpiexec -n 16 ./Ray -route-messages -connection-type debruijn -test-network-only -o 89
0 -> 15 0 15 4 0 8 12 14 15
//// On Rank # 0
[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=10
[setSourceInBuffer] buffer=0x2140060 source=0
[setDestinationInBuffer] buffer=0x2140060 destination=15
[routeOutcomingMessages] relayed message, trueSource=0 trueDestination=15 to intermediateSource 8
[Communication] 1218 microseconds, SEND Source: 0 Destination: 8 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
//// On Rank # 8
[Communication] 2348 microseconds, RECEIVE Source: 0 Destination: 8 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 0 Current= 8 Next= 12
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=8 RelayDestination=12
[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 2464 microseconds, SEND Source: 8 Destination: 12 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
//// On Rank # 12
[Communication] 120002 microseconds, RECEIVE Source: 8 Destination: 12 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 8 Current= 12 Next= 14
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=12 RelayDestination=14
[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 120143 microseconds, SEND Source: 12 Destination: 14 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
//// On Rank # 14
[Communication] 961 microseconds, RECEIVE Source: 12 Destination: 14 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 12 Current= 14 Next= 15
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=14 RelayDestination=15
[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 1075 microseconds, SEND Source: 14 Destination: 15 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
//// On Rank # 15
[Communication] 89752 microseconds, RECEIVE Source: 14 Destination: 15 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
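The route lines in this report (for example `0 15 4 0 8 12 14 15`) appear to read as source, destination, hop count, then the path. The hops in the trace are consistent with a binary de Bruijn graph on 16 vertices in which one hop drops the lowest bit and inserts a new highest bit. The sketch below reproduces the route lines under that assumption; `debruijn_route` and `route_line` are hypothetical names for illustration, not functions from Ray.

```python
def debruijn_route(source, destination, digits=4, base=2):
    """Return the list of ranks visited from source to destination.

    Assumption (inferred from the logs, not taken from Ray's code): a vertex
    is a `digits`-digit number in `base`, and one hop drops the lowest digit
    and inserts a new highest digit.
    """
    # Fewest hops such that the high digits kept from the source already
    # equal the low digits of the destination.
    for hops in range(digits + 1):
        keep = base ** (digits - hops)
        if source // base ** hops == destination % keep:
            break

    path = [source]
    current = source
    # High `hops` digits of the destination, inserted lowest first.
    remaining = destination // base ** (digits - hops)
    for _ in range(hops):
        digit = remaining % base
        remaining //= base
        current = current // base + digit * base ** (digits - 1)
        path.append(current)
    return path


def route_line(source, destination):
    """Format a route as: source destination hops path..."""
    path = debruijn_route(source, destination)
    fields = [source, destination, len(path) - 1] + path
    return " ".join(str(field) for field in fields)


if __name__ == "__main__":
    for src, dst in [(0, 15), (0, 5), (15, 0), (15, 7)]:
        print(route_line(src, dst))
```

Comparing such a computed path with the `Previous= Current= Next=` fields of each trace shows at which rank a relay stops.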
0 5 4 0 8 4 10 5
[NetworkTest.plugin] Rank 0 sends RAY_MPI_TAG_TEST_NETWORK_MESSAGE to 5
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=109
[setSourceInBuffer] buffer=0x2170ac0 source=0
[setDestinationInBuffer] buffer=0x2170ac0 destination=5
[routeOutcomingMessages] relayed message, trueSource=0 trueDestination=5 to intermediateSource 8
[Communication] 20254 microseconds, SEND Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0
[Communication] 26638 microseconds, RECEIVE Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 5 Previous= 0 Current= 8 Next= 4
[relayMessage] TrueSource=0 TrueDestination=5 RelaySource=8 RelayDestination=4
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 26774 microseconds, SEND Source: 8 Destination: 4 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0
[Communication] 26873 microseconds, RECEIVE Source: 8 Destination: 4 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 5 Previous= 8 Current= 4 Next= 10
[relayMessage] TrueSource=0 TrueDestination=5 RelaySource=4 RelayDestination=10
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 26989 microseconds, SEND Source: 4 Destination: 10 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0
Rank 10 seems to receive something, but it does not relay the message.
0 15 4 0 8 12 14 15
[NetworkTest.plugin] Rank 0 sends RAY_MPI_TAG_TEST_NETWORK_MESSAGE to 15
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=109
[setSourceInBuffer] buffer=0x27ccbb0 source=0
[setDestinationInBuffer] buffer=0x27ccbb0 destination=15
[routeOutcomingMessages] relayed message, trueSource=0 trueDestination=15 to intermediateSource 8
[Communication] 10521 microseconds, SEND Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[Communication] 41196 microseconds, RECEIVE Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 0 Current= 8 Next= 12
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=8 RelayDestination=12
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 41314 microseconds, SEND Source: 8 Destination: 12 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[Communication] 23775 microseconds, RECEIVE Source: 8 Destination: 12 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 8 Current= 12 Next= 14
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=12 RelayDestination=14
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 23903 microseconds, SEND Source: 12 Destination: 14 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[Communication] 25313 microseconds, RECEIVE Source: 12 Destination: 14 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 12 Current= 14 Next= 15
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=14 RelayDestination=15
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 25438 microseconds, SEND Source: 14 Destination: 15 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[Communication] 24726 microseconds, RECEIVE Source: 14 Destination: 15 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has reached destination, must strip routing information
[routeIncomingMessages] real tag= 109
[NetworkTest.plugin] Rank 15 receives RAY_MPI_TAG_TEST_NETWORK_MESSAGE from 0
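In these logs a tag value of 109 becomes 16493 once the message is routed, 10 becomes 16394, and 111 becomes 16495. Each differs from the original by 16384, which is 2^14, so the routing tag looks like a single flag bit set on top of the real tag. The sketch below assumes exactly that; the constant and function names are hypothetical, and Ray's actual encoding may differ (for instance, it could add the offset rather than set a bit).

```python
# Assumption inferred from the logs: 16493 - 109 == 16384 == 1 << 14.
ROUTING_BIT = 1 << 14


def add_routing_tag(tag):
    """Mark a plain tag as routed by setting the assumed routing bit."""
    return tag | ROUTING_BIT


def has_routing_tag(value):
    """Return True if the assumed routing bit is set."""
    return bool(value & ROUTING_BIT)


def strip_routing_tag(value):
    """Recover the real tag at the final destination."""
    return value & ~ROUTING_BIT


if __name__ == "__main__":
    routed = add_routing_tag(109)
    print(routed, has_routing_tag(routed), strip_routing_tag(routed))
```

Under this reading, a relay rank sees the bit already set ("Message has already a routing tag.") and forwards the value unchanged, while the final rank strips it to get `real tag= 109`.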
15 0 4 15 7 3 1 0
[NetworkTest.plugin] Rank 15 sends RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY to 0
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY value=111
[setSourceInBuffer] buffer=0x1e91470 source=15
[setDestinationInBuffer] buffer=0x1e91470 destination=0
[routeOutcomingMessages] relayed message, trueSource=15 trueDestination=0 to intermediateSource 7
[Communication] 24942 microseconds, SEND Source: 15 Destination: 7 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY RealTag: 16495 Count: 1 Overlay: 15
[Communication] 26443 microseconds, RECEIVE Source: 15 Destination: 7 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY RealTag: 16495 Count: 1 Overlay: 15
(no relay events after that)
15 7 1 15 7
Even when a message carries a routing tag, routing seems to be skipped whenever direct communication between the two ranks is allowed.
The problem is that not all processes enable their routing subsystem (16 files, but only 9 "Enabled message routing" lines):
[boiseb01@ls30 RayKmerSearchDevel]$ ls 89.1.|wc -l
16
[boiseb01@ls30 RayKmerSearchDevel]$ grep "Enabled message routing" 89.1.|wc -l
9
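The two counts above show that some files lack the marker, but not which ones. A small helper along these lines could list them, assuming each rank writes its own output file and logs the marker at most once. The function name is hypothetical and the caller supplies the file paths.

```python
from pathlib import Path


def files_missing_marker(paths, marker="Enabled message routing"):
    """Return the paths whose contents never mention the marker string."""
    missing = []
    for path in paths:
        # errors="replace" keeps the scan going if a log has stray bytes.
        text = Path(path).read_text(errors="replace")
        if marker not in text:
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    import sys

    # Usage: pass the per-rank output files as command-line arguments.
    for name in files_missing_marker(sys.argv[1:]):
        print(name)
```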
8a96f13bb1e7956e342df7c4d7de8b7f0373d5b0
I think it has been broken since mini-ranks were merged.