sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

fix -route-messages #99

Closed sebhtml closed 11 years ago

sebhtml commented 11 years ago

I think it has been broken since mini-ranks were merged.

sebhtml commented 11 years ago

This fails:

rm -rf 89 ; mpiexec -n 16 ./Ray -route-messages -connection-type debruijn -test-network-only -o 89

sebhtml commented 11 years ago

0 -> 15 0 15 4 0 8 12 14 15

//// On Rank # 0

[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=10
[setSourceInBuffer] buffer=0x2140060 source=0
[setDestinationInBuffer] buffer=0x2140060 destination=15
[routeOutcomingMessages] relayed message, trueSource=0 trueDestination=15 to intermediateSource 8
[Communication] 1218 microseconds, SEND Source: 0 Destination: 8 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15

//// On Rank # 8

[Communication] 2348 microseconds, RECEIVE Source: 0 Destination: 8 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 0 Current= 8 Next= 12
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=8 RelayDestination=12
[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 2464 microseconds, SEND Source: 8 Destination: 12 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15

//// On Rank # 12

[Communication] 120002 microseconds, RECEIVE Source: 8 Destination: 12 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 8 Current= 12 Next= 14
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=12 RelayDestination=14
[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 120143 microseconds, SEND Source: 12 Destination: 14 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15

//// On Rank # 14

[Communication] 961 microseconds, RECEIVE Source: 12 Destination: 14 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 12 Current= 14 Next= 15
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=14 RelayDestination=15
[routeOutcomingMessages] tag= RAY_MPI_TAG_SET_WORD_SIZE value=16394
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 1075 microseconds, SEND Source: 14 Destination: 15 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15

//// On Rank # 15

[Communication] 89752 microseconds, RECEIVE Source: 14 Destination: 15 Tag: RAY_MPI_TAG_SET_WORD_SIZE Count: 3 Overlay: 15
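In the trace above, RAY_MPI_TAG_SET_WORD_SIZE is logged with value=10 on rank 0 and value=16394 on every relay. The difference is 16384, which is 2^14, so the routing tag looks like a single flag bit set on top of the real tag. The sketch below illustrates that reading only. The constant and the three function names are hypothetical and are not taken from Ray's source, and the real code could just as well add an offset instead of setting a bit.

```python
# Assumption: the routing tag is bit 14 of the MPI tag (16394 - 10 == 16384).
ROUTING_FLAG = 1 << 14


def add_routing_tag(tag: int) -> int:
    """Mark a real tag as routed by setting the assumed flag bit."""
    return tag | ROUTING_FLAG


def has_routing_tag(tag: int) -> bool:
    """Tell whether a tag already carries the assumed routing flag."""
    return bool(tag & ROUTING_FLAG)


def strip_routing_tag(tag: int) -> int:
    """Recover the real tag once the message has reached its destination."""
    return tag & ~ROUTING_FLAG


if __name__ == "__main__":
    routed = add_routing_tag(10)
    print(routed, has_routing_tag(routed), strip_routing_tag(routed))
```

The same 16384 offset holds for the other tag values logged in this thread.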

sebhtml commented 11 years ago

0 5 4 0 8 4 10 5

[NetworkTest.plugin] Rank 0 sends RAY_MPI_TAG_TEST_NETWORK_MESSAGE to 5

[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=109
[setSourceInBuffer] buffer=0x2170ac0 source=0
[setDestinationInBuffer] buffer=0x2170ac0 destination=5
[routeOutcomingMessages] relayed message, trueSource=0 trueDestination=5 to intermediateSource 8
[Communication] 20254 microseconds, SEND Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0

[Communication] 26638 microseconds, RECEIVE Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 5 Previous= 0 Current= 8 Next= 4
[relayMessage] TrueSource=0 TrueDestination=5 RelaySource=8 RelayDestination=4
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 26774 microseconds, SEND Source: 8 Destination: 4 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0

[Communication] 26873 microseconds, RECEIVE Source: 8 Destination: 4 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 5 Previous= 8 Current= 4 Next= 10
[relayMessage] TrueSource=0 TrueDestination=5 RelaySource=4 RelayDestination=10
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 26989 microseconds, SEND Source: 4 Destination: 10 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE Count: 501 Overlay: 0

Rank 10 seems to receive something, but it does not relay the message.

sebhtml commented 11 years ago

0 15 4 0 8 12 14 15

[NetworkTest.plugin] Rank 0 sends RAY_MPI_TAG_TEST_NETWORK_MESSAGE to 15
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=109
[setSourceInBuffer] buffer=0x27ccbb0 source=0
[setDestinationInBuffer] buffer=0x27ccbb0 destination=15
[routeOutcomingMessages] relayed message, trueSource=0 trueDestination=15 to intermediateSource 8
[Communication] 10521 microseconds, SEND Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15

[Communication] 41196 microseconds, RECEIVE Source: 0 Destination: 8 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 0 Current= 8 Next= 12
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=8 RelayDestination=12
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 41314 microseconds, SEND Source: 8 Destination: 12 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15

[Communication] 23775 microseconds, RECEIVE Source: 8 Destination: 12 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 8 Current= 12 Next= 14
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=12 RelayDestination=14
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 23903 microseconds, SEND Source: 12 Destination: 14 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15

[Communication] 25313 microseconds, RECEIVE Source: 12 Destination: 14 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has been sent to the next one, trueSource=0 trueDestination= 15 Previous= 12 Current= 14 Next= 15
[relayMessage] TrueSource=0 TrueDestination=15 RelaySource=14 RelayDestination=15
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeOutcomingMessages] Message has already a routing tag.
[Communication] 25438 microseconds, SEND Source: 14 Destination: 15 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15

[Communication] 24726 microseconds, RECEIVE Source: 14 Destination: 15 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE RealTag: 16493 Count: 501 Overlay: 15
[routeIncomingMessages] inbox.size= 1
[routeIncomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE value=16493
[routeIncomingMessages] message has reached destination, must strip routing information
[routeIncomingMessages] real tag= 109
[NetworkTest.plugin] Rank 15 receives RAY_MPI_TAG_TEST_NETWORK_MESSAGE from 0

15 0 4 15 7 3 1 0

[NetworkTest.plugin] Rank 15 sends RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY to 0
[routeOutcomingMessages] tag= RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY value=111
[setSourceInBuffer] buffer=0x1e91470 source=15
[setDestinationInBuffer] buffer=0x1e91470 destination=0
[routeOutcomingMessages] relayed message, trueSource=15 trueDestination=0 to intermediateSource 7
[Communication] 24942 microseconds, SEND Source: 15 Destination: 7 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY RealTag: 16495 Count: 1 Overlay: 15

[Communication] 26443 microseconds, RECEIVE Source: 15 Destination: 7 Tag: RAY_MPI_TAG_TEST_NETWORK_MESSAGE_REPLY RealTag: 16495 Count: 1 Overlay: 15

(no relay events after that)

15 7 1 15 7

Even when a message carries a routing tag, it seems that routing is skipped whenever direct communication between the two ranks is allowed.
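The route lines in this thread (source, destination, hop count, then the path) all fit a binary de Bruijn graph on 16 ranks. In that graph a rank is shifted right by one bit and a bit of the destination enters at the top. The sketch below reproduces the four logged routes under that assumption. The function name and the `digits` parameter are hypothetical, and this is not Ray's routing code.

```python
from __future__ import annotations


def de_bruijn_route(source: int, destination: int, digits: int = 4) -> list[int]:
    """Shortest route in a binary de Bruijn graph with 2**digits vertices.

    Assumed edge rule (inferred from the logged routes): shift the current
    vertex right by one bit and insert one new bit at the top.
    """
    if source == destination:
        return [source]
    # Smallest hop count k such that the low (digits - k) bits of the
    # destination already equal the source shifted right by k bits.
    hops = next(
        k for k in range(1, digits + 1)
        if destination & ((1 << (digits - k)) - 1) == source >> k
    )
    route = [source]
    for i in range(1, hops + 1):
        kept = source >> i  # source bits that survive i shifts
        incoming = (destination >> (digits - hops)) & ((1 << i) - 1)
        route.append(kept | (incoming << (digits - i)))
    return route


def format_route(route: list[int]) -> str:
    """Format as: source destination hops path..."""
    fields = [route[0], route[-1], len(route) - 1, *route]
    return " ".join(str(field) for field in fields)


if __name__ == "__main__":
    for src, dst in [(0, 15), (0, 5), (15, 0), (15, 7)]:
        print(format_route(de_bruijn_route(src, dst)))
```

Under this rule, 15 to 7 is a single hop, so that message is sent straight to its destination while still carrying routing information.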

sebhtml commented 11 years ago

The problem is that not all processes enable their routing subsystem; only 9 of 16 report enabling it:

[boiseb01@ls30 RayKmerSearchDevel]$ ls 89.1.|wc -l
16
[boiseb01@ls30 RayKmerSearchDevel]$ grep "Enabled message routing" 89.1.|wc -l
9
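The check above gives only a count. To see which ranks are affected, a small helper can list the output files that never print the marker line. This is a sketch under assumptions: the function name is hypothetical, and the caller supplies the file paths, because the exact file naming is not shown in this thread.

```python
from pathlib import Path
from typing import Iterable, List

# Marker string taken from the grep command above.
MARKER = "Enabled message routing"


def files_without_routing(paths: Iterable[str]) -> List[str]:
    """Return the paths whose contents never mention the routing marker."""
    missing = []
    for path in paths:
        text = Path(path).read_text(errors="replace")
        if MARKER not in text:
            missing.append(str(path))
    return sorted(missing)


if __name__ == "__main__":
    import sys

    # Usage: python check_routing.py <per-rank output files...>
    for name in files_without_routing(sys.argv[1:]):
        print(name)
```

For the run above, this would be expected to list 7 files, assuming each of the 9 matching lines comes from a different file.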

sebhtml commented 11 years ago

8a96f13bb1e7956e342df7c4d7de8b7f0373d5b0