resulting correlation function seems to be empty

kostrzewa commented 7 years ago

@solomonik Thanks a lot for getting me started with CTF at the time. Only now have I had the time to take another look at the problem that I want to solve using your framework.

I have now implemented a sequence of contractions, getting rid of all uses of std::vector and thus using CTF tensors for all indices. This is now representative of the kind of quantity that we would like to compute in lattice QCD or other discretized Euclidean (or otherwise) field theories.

https://github.com/kostrzewa/nyom/blob/master/benchmarks/bench_ctf/bench_ctf.cpp

Unfortunately, I've hit a bit of a problem again in that I get what appears to be an empty result vector after all the dust has settled. If you happen to have some time, I would very much appreciate it if you could take a look to see if you spot anything obviously wrong.

Thanks a lot in advance!

kostrzewa commented 7 years ago

The other question that I wanted to ask concerns low-rank tensors with small index ranges when the code is highly parallel (hundreds or thousands of MPI tasks, potentially). As far as I can tell, my 2x2 "tau" matrix is split over four tasks (no matter how many tasks there are in total, as long as it's more than 4). When contracting this with another tensor of much higher rank (of which two indices have range 2, thus being contractable with the tau matrix), is the small tensor duplicated over all tasks to perform the contraction?

solomonik commented 7 years ago

Hi Bartosz,

It looks like the resulting vector is zero because S2_m2 was not computed (see diff below, not sure if it's what was actually intended). Also, in some places it seems like you should be using += rather than = (the latter will overwrite the previous elements of the tensor rather than accumulate).
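To illustrate the difference between = and +=, here is a minimal sketch in plain C++. It is not CTF code: `Vec`, `assign` and `accumulate` are hypothetical stand-ins that only mimic the overwrite versus accumulate behaviour described above on a flat array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tensor: a flat vector of doubles.
using Vec = std::vector<double>;

// Mimics `A = expr`: the previous contents of `a` are discarded.
void assign(Vec& a, const Vec& expr) {
  assert(a.size() == expr.size());
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = expr[i];
}

// Mimics `A += expr`: the result is added to what `a` already holds.
void accumulate(Vec& a, const Vec& expr) {
  assert(a.size() == expr.size());
  for (std::size_t i = 0; i < a.size(); ++i) a[i] += expr[i];
}
```

Calling `assign` twice on the same target leaves only the second result in it, whereas `assign` followed by `accumulate` keeps the sum of both. Two consecutive = statements with the same left-hand side therefore make the first one redundant.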

Regarding performance, I think the running time of this code will be dominated by S_tx*Phi_tx for sufficiently large problem sizes. Indeed, even small matrices are distributed by default over all processors, but the library should be smart enough to replicate them as necessary and not communicate the bigger tensors during a contraction. It might transpose the bigger tensors locally in some cases, though, so the performance of a contraction with a small tensor will not necessarily be near peak (but scalability should be good). Regardless, if you have a larger contraction like S_tx*Phi_tx, it should dominate the execution time anyway and be executed more efficiently within CTF.

diff --git a/benchmarks/bench_ctf/bench_ctf.cpp b/benchmarks/bench_ctf/bench_ctf.cpp
index 371d9f6..a1f4a39 100644
--- a/benchmarks/bench_ctf/bench_ctf.cpp
+++ b/benchmarks/bench_ctf/bench_ctf.cpp
@@ -159,8 +159,8 @@ int main(int argc, char * argv) {
   S1_m1["txyzfgjkab"] = Tshift_m1["tT"]*S1["Txyzfgjkab"];
   S1_m1["txyzfgjkab"] = Tshift_m1["tT"]*S2["Txyzfgjkab"];

-  S1_m2["txyzfgjkab"] = Tshift_m2["tT"]*S1["Txyzfgjkab"];
-  S1_m2["txyzfgjkab"] = Tshift_m2["tT"]*S2["Txyzfgjkab"];
+  S2_m2["txyzfgjkab"] = Tshift_m2["tT"]*S1["Txyzfgjkab"];
+  S2_m2["txyzfgjkab"] = Tshift_m2["tT"]*S2["Txyzfgjkab"];

   // now construct gamma^0
   g0.read_local(&npair, &indices, &pairs);

kostrzewa commented 7 years ago

@solomonik Thanks a lot and sorry for having disturbed you with yet another silly set of typos! This works great, I can't wait to present it to my working group on Monday. I will proceed with an implementation for a real LQCD computation in the coming two weeks!

kostrzewa commented 7 years ago

Regarding the accumulation, the intermediate tensors S_[t1,t2] are temporary objects with the final pair of tensors S_f[1,2] serving as final accumulators for the two objects that are eventually traced over on most indices.

solomonik commented 7 years ago

Great to hear this addressed the problem. Regarding +=, it still looks like S1_m2 and S2_m2 get set then reset, so I am guessing something is not quite correct there.

kostrzewa commented 7 years ago

Yes, this was another typo, the following is correct:

  S1_m1["txyzfgjkab"] = Tshift_m1["tT"]*S1["Txyzfgjkab"];
  S2_m1["txyzfgjkab"] = Tshift_m1["tT"]*S2["Txyzfgjkab"];

  S1_m2["txyzfgjkab"] = Tshift_m2["tT"]*S1["Txyzfgjkab"];
  S2_m2["txyzfgjkab"] = Tshift_m2["tT"]*S2["Txyzfgjkab"];
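Each of these four lines is a contraction over the time index T. As a rough sketch of what one such line computes, here is the same operation written with explicit loops in plain C++ (not CTF). The names `Matrix`, `make_shift` and `apply_shift` are hypothetical, the remaining indices "xyzfgjkab" are flattened into a single index r, and the shift convention (entry (t, T) is 1 when T == (t + shift) mod nt) is an assumption, not taken from the benchmark code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Assumed convention (not from the thread): entry (t, T) is 1 when
// T == (t + shift) mod nt, so row t of the result selects time slice t + shift.
Matrix make_shift(std::size_t nt, std::size_t shift) {
  Matrix m(nt, std::vector<double>(nt, 0.0));
  for (std::size_t t = 0; t < nt; ++t) m[t][(t + shift) % nt] = 1.0;
  return m;
}

// out[t][r] = sum_T shift[t][T] * in[T][r], where r stands for all the
// remaining indices flattened into one.
Matrix apply_shift(const Matrix& shift, const Matrix& in) {
  assert(!in.empty() && !shift.empty());
  assert(shift[0].size() == in.size());
  const std::size_t nt = shift.size();
  const std::size_t nr = in[0].size();
  Matrix out(nt, std::vector<double>(nr, 0.0));
  for (std::size_t t = 0; t < nt; ++t)
    for (std::size_t T = 0; T < in.size(); ++T)
      for (std::size_t r = 0; r < nr; ++r)
        out[t][r] += shift[t][T] * in[T][r];
  return out;
}
```

With periodic boundaries, a backward shift by one corresponds to `make_shift(nt, nt - 1)` under this convention. Each output tensor is written exactly once here, which matches the corrected lines above, where every left-hand side appears only once.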
