robfitzgerald closed 7 years ago
@mkalan
fixing logging is hard. what i've already done is insane. instead of defining new types, i am sending single values or arrays of built-in types one at a time from a function handleLogs(). all processes are keeping a std::list
i dealt with all of the compile errors, ran it (template.cpp), and got a seg fault. i tried stubbing it out to find the problem. master seems to receive a complete log from process 1, finish, and move on to receive from another process. my guess is that something i'm using to receive values isn't copying by value, so the reference goes out of scope.
work in progress here.
[robert.fitzgerald@heracles src]$ srun --mpi=pmi2 -n6 -w "node[3,4]" /home/robert.fitzgerald/csc5593/project/csci5593-project/src/a.out myTest 100 500
runTest called with TestConfig:
testName: myTest, iterations: 100, messages: 500.
for process 3 of 6 on node node4
runTest called with TestConfig:
testName: myTest, iterations: 100, messages: 500.
for process 0 of 6 on node node3
runTest called with TestConfig:
testName: myTest, iterations: 100, messages: 500.
for process 4 of 6 on node node4
runTest called with TestConfig:
testName: myTest, iterations: 100, messages: 500.
for process 5 of 6 on node node4
runTest called with TestConfig:
testName: myTest, iterations: 100, messages: 500.
for process 1 of 6 on node node3
runTest called with TestConfig:
testName: myTest, iterations: 100, messages: 500.
for process 2 of 6 on node node3
srun: error: node3: task 0: Segmentation fault (core dumped)
@mkalan
i'm back to this. i am not sending or receiving values correctly.
master log 0: myTest node3 0 98765 0.049000 demonstrationprocess 5 send name
process 5 send node
master log 1: myTest node3 1 98765 2181870753490199573974407650595485384721160836668215214891039728797850014325211914223045085066158316062607765912352152648743422216901014694519754890573534962191294039052030139321391794119467394653459835293570249601325430567428593023945157902336.000000 demonstrsrun: error: node3: task 0: Segmentation fault (core dumped)
it looks like the timeDelta value is coming through as garbage. you can see the message comes next and it looks correctly formed: the "demonstr" in "demonstrsrun:" is the start of "demonstration", cut off by the srun error.
done!
batch send logs instead of what we are currently doing