Closed: tianchaijz closed this issue 5 years ago
I'm not sure how to interpret the output above. Are the 9 lines from 9 different contexts? Are you giving each one a different graph to work on (by varying the arguments to run_solver())? It's also puzzling why the trim took 0 ms in each case, and why there are always 46 edges left. Where does your code call run_solver()?
Hi John,
Thanks for your reply.
I just created a simple function named find_solution:
CALL_CONVENTION int find_solution(SolverCtx* ctx,
                                  char* header,
                                  int header_length,
                                  SolverSolutions *solutions)
{
    if (ctx == NULL || !ctx->trimmer.initsuccess) {
        print_log("Error initialising trimmer. Aborting.\n");
        print_log("Reason: %s\n", LAST_ERROR_REASON);
        return -__LINE__;
    }
    ctx->setheadernonce(header, header_length, 0);
    u32 nsols = ctx->solve();
    for (unsigned s = 0; s < nsols; s++) {
        print_log("Solution");
        u32* prf = &ctx->sols[s * PROOFSIZE];
        if (solutions != NULL) {
            solutions->edge_bits = EDGEBITS;
            solutions->num_sols++;
            for (u32 i = 0; i < PROOFSIZE; i++)
                solutions->sols[s].proof[i] = (u64) prf[i];
        }
        int pow_rc = verify(prf, &ctx->trimmer.sipkeys);
        if (pow_rc != POW_OK) {
            return -__LINE__;
        }
    }
    return nsols;
}
I give each context a different header and call the solver this way:
var solverSols SolverSolutions
n := C.find_solution(ctx, header, headerLen, (*C.SolverSolutions)(unsafe.Pointer(&solverSols)))
if n > 0 {
	sols := make([]uint32, 42)
	for i := 0; i < int(n); i++ {
		for j := 0; j < 42; j++ {
			sols[j] = uint32(solverSols.sols[i].proof[j])
		}
	}
} else if n < 0 {
	return fmt.Errorf("cgo return code: %d", int(n))
}
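A side note on the calling code above: as written, every solution is copied into the same sols slice, so only the last proof survives the loop. A minimal sketch that keeps each proof separately follows. The solution and solverSolutions types and the collectProofs helper are hypothetical stand-ins for the cgo types, not part of the solver.

```go
package main

// proofSize mirrors PROOFSIZE (42 in the snippet above).
const proofSize = 42

// solution and solverSolutions are hypothetical Go mirrors of the C structs;
// a real layout would have to match the solver's header exactly.
type solution struct {
	proof [proofSize]uint64
}

type solverSolutions struct {
	sols []solution
}

// collectProofs copies the first n proofs into freshly allocated slices so
// that later solutions do not overwrite earlier ones.
func collectProofs(s *solverSolutions, n int) [][]uint32 {
	if n < 0 {
		n = 0 // a negative return code carries no solutions
	}
	if n > len(s.sols) {
		n = len(s.sols) // never read past what the struct can hold
	}
	proofs := make([][]uint32, 0, n)
	for i := 0; i < n; i++ {
		p := make([]uint32, proofSize)
		for j := range p {
			p[j] = uint32(s.sols[i].proof[j])
		}
		proofs = append(proofs, p)
	}
	return proofs
}
```

With this shape, the caller would pass the return value of find_solution as n and receive one slice per solution.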
The full log context:
GPU1 finding solution
GPU4 finding solution
GPU2 finding solution
GPU0 finding solution
GPU1 finding solution
GPU4 finding solution
GPU2 finding solution
GPU0 finding solution
eted in 63 + 46 ms
Seeding completed in 63 + 46 ms
Seeding completed in 63 + 46 ms
Seeding completed in 63 + 46 ms
Seeding completed in 63 + 46 ms
2-cycle found
8-cycle found
54-cycle found
findcycles edges 63077 time 56 ms total 390 ms
12-cycle found
4-cycle found
20-cycle found
4-cycle found
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RgAAACQPgXA=
6-cycle found
100-cycle found
406-cycle found
594-cycle found
340-cycle found
findcycles edges 71378 time 49 ms total 386 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RwAAACQPgXA=
38-cycle found
10-cycle found
66-cycle found
12-cycle found
2466-cycle found
70-cycle found
findcycles edges 74681 time 58 ms total 383 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=QwAAACQPgXA=
232-cycle found
306-cycle found
findcycles edges 61895 time 53 ms total 378 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RAAAACQPgXA=
20-cycle found
74-cycle found
28-cycle found
332-cycle found
findcycles edges 63438 time 51 ms total 379 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SAAAACQPgXA=
38-cycle found
findcycles edges 64257 time 74 ms total 400 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=RQAAACQPgXA=
findcycles edges 77 time 2 ms total 106 ms
findcycles edges 77 time 2 ms total 137 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=TAAAACQPgXA=
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=TgAAACQPgXA=
findcycles edges 77 time 1 ms total 189 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SwAAACQPgXA=
findcycles edges 77 time 1 ms total 280 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SQAAACQPgXA=
findcycles edges 77 time 0 ms total 365 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=TQAAACQPgXA=
findcycles edges 77 time 3 ms total 432 ms
findcycles edges 77 time 4 ms total 319 ms
findcycles edges 46 time 4 ms total 234 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=SgAAACQPgXA=
findcycles edges 77 time 4 ms total 319 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=UQAAACQPgXA=
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=VAAAACQPgXA=
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=UgAAACQPgXA=
findcycles edges 46 time 2 ms total 2 ms
findcycles edges 46 time 2 ms total 2 ms
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=VwAAACQPgXA=
setheader: sKrQQF/FfgmOD8PO0OeVJSNq0ssZM7xJ/ANzAPWt9sQ=UAAAACQPgXA=
The earlier runs with edges in the tens of thousands look fine, but the ones with fewer than 100 edges are wrong and spent 0 ms on trimming. Perhaps you can add more diagnostic output to the trimming routine to figure out why they skipped nearly all the work. For instance, SeedA should compute siphashes for 2^29 nodes, which necessarily takes a lot of time. I think that with all data encapsulated in the SolverCtx, the solver should be thread safe.
Thanks John!
I'll continue debugging and will let you know once this problem is solved!
It seems that the edgetrimmer *dt points to an invalid memory area.
651 int solve() {
(gdb) n
653 auto time0 = std::chrono::high_resolution_clock::now();
(gdb) n
655 trimmer.abort = false;
(gdb) n
656 u32 nedges = trimmer.trim();
(gdb) s
edgetrimmer::trim (this=0x7f3510000c00) at mean.cu:416
416 u32 trim() {
(gdb) n
417 cudaMemcpy(dt, this, sizeof(edgetrimmer), cudaMemcpyHostToDevice);
(gdb) p *dt
Cannot access memory at address 0x7f34cf400000
(gdb) p *this
$1 = {tp = {expand = 0, ntrims = 176, genA = {blocks = 4096, tpb = 256}, genB = {blocks = 4096,
tpb = 128}, trim = {blocks = 4096, tpb = 512}, tail = {blocks = 4096, tpb = 1024}, recover = {
blocks = 1024, tpb = 1024}}, dt = 0x7f34cf400000, sizeA = 4462739456, sizeB = 2852126720,
indexesSize = 16384, bufferA = 0x7f331a600000, bufferB = 0x7f3424600000,
bufferAB = 0x7f33c4600000, indexesE = 0x7f34cec00000, indexesE2 = 0x7f34cec04000, hostA = {72644,
0 <repeats 4095 times>}, uvnodes = 0x7f34cf000000, sipkeys = {k0 = 3288269178733044074,
k1 = 2872931202106657597, k2 = 11375450941595734051, k3 = 5158531013351406122},
dipkeys = 0x7f34cf000200, abort = false, initsuccess = true}
(gdb) n
420 checkCudaErrors(cudaEventCreate(&startall)); checkCudaErrors(cudaEventCreate(&stopall));
(gdb) p *dt
Cannot access memory at address 0x7f34cf400000
I'm continuing to debug ...
Dear tianchaijz,
It seems that the edgetrimmer *dt points to an invalid memory area.
651 int solve() {
(gdb) n
653 auto time0 = std::chrono::high_resolution_clock::now();
(gdb)
655 trimmer.abort = false;
(gdb)
656 u32 nedges = trimmer.trim();
(gdb) s
edgetrimmer::trim (this=0x7f94c4000c00) at mean.cu:416
416 u32 trim() {
(gdb) n
417 cudaMemcpy(dt, this, sizeof(edgetrimmer), cudaMemcpyHostToDevice);
(gdb) p sizeof(edgetrimmer)
$4 = 16536
(gdb) p *dt
Cannot access memory at address 0x7f9479800000
I'm continuing to debug ...
I noticed my cuckoo/mean.cu had a redundant cudaMemcpy. I removed that one in my latest commit. I don't see how that could be responsible for the behaviour you saw though...
regards, -John
Hi John,
Still not solved, which is really weird. I'll try a newer NVIDIA driver ...
Hi John,
This problem is solved.
The cause was that a goroutine may be scheduled onto different OS threads over time. I added runtime.LockOSThread(), and my program has now been running for hours with everything looking fine.
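For readers wondering why pinning helps: one plausible explanation, which nobody in this thread confirmed, is that the CUDA runtime keeps its current device (the one chosen with cudaSetDevice) per host thread. If a solver context selects its GPU once and the Go scheduler later moves the goroutine to another OS thread, subsequent CUDA calls may target a different device, where the context's device pointers are not valid. The sketch below shows the pinning pattern only; solveFunc and runPinned are hypothetical names, and solve stands in for the cgo call to find_solution.

```go
package main

import (
	"runtime"
	"sync"
)

// solveFunc is a hypothetical stand-in for the cgo call into the solver.
// It receives a device index and returns the number of solutions found.
type solveFunc func(device int) int

// runPinned starts one goroutine per device and locks each goroutine to its
// OS thread before the first solver call, so every call for a given device
// is made from the same thread. It returns the per-device solution totals.
func runPinned(devices, rounds int, solve solveFunc) []int {
	totals := make([]int, devices)
	var wg sync.WaitGroup
	for d := 0; d < devices; d++ {
		wg.Add(1)
		go func(d int) {
			defer wg.Done()
			// Pin this goroutine to its current OS thread. The goroutine
			// never unlocks, so the Go runtime terminates the thread when
			// the goroutine exits instead of reusing it.
			runtime.LockOSThread()
			for r := 0; r < rounds; r++ {
				// Each goroutine writes only its own slot, so no lock is needed.
				totals[d] += solve(d)
			}
		}(d)
	}
	wg.Wait()
	return totals
}
```

For example, runPinned(4, 100, solve) would run 100 rounds on each of four devices, each from its own pinned thread. In a real program the solver context would presumably also be created inside the pinned goroutine, after the LockOSThread call.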
I still don't understand how running on different OS threads explains the faulty behaviour. But I'm happy to hear you found a solution!
Thank you again, John!
Hi John:
I built mean.cu as a shared library and call it through cgo. I create one goroutine per graphics card and bind one solver context to each goroutine. A single solver context works as expected, but multiple contexts do not: after running for a while, the solver contexts stop working, and the log shows:
It seems that the trimming phase is not working. Any ideas? Thanks!