pomerol-ed / pomerol2triqs

Quick and dirty TRIQS wrapper around the Pomerol exact diagonalization library
GNU General Public License v3.0

MPI parallelization over two-particle Green's function components? #5

Closed. HugoStrand closed this issue 6 years ago.

HugoStrand commented 6 years ago

Dear Igor,

I am not sure whether to post this question at pomerol2triqs or pomerol. I am calculating two-particle Green's functions and it is very slow. I think I am already using some MPI parallelism, since pomerol says it is distributing work over "processors".

My question is whether I can somehow convince pomerol to compute the different two-particle Green's function elements in parallel? (I have 4x4x4x4 = 256 elements to calculate in total, so that would be a nice speedup if done in parallel.)

Is it possible to do this out of the box with pomerol? If not, what would I have to do to accomplish it?

Cheers, Hugo

j-otsuki commented 6 years ago

Dear Hugo,

Let me suggest one thing. In the latest commit of the pomerol library, truncation of irrelevant blocks was implemented. It yields a considerable speedup of two-particle calculations for multiorbital models.

This function is not yet exposed in pomerol2triqs. Anyway, please try it and see if performance improves.

How to use it: Just call the method

rho.truncateBlocks(1e-15);

after the DensityMatrix computation is done. See tutorial/example2site.cpp in pomerol.
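
For context, here is a minimal sketch of where that call might sit in a pomerol C++ driver, assuming the StatesClassification S, the Hamiltonian H and the inverse temperature beta are set up as in tutorial/example2site.cpp; the wrapper function below is illustrative only, not part of pomerol or pomerol2triqs.

```cpp
#include <pomerol.h>

// Illustrative helper: compute the density matrix and truncate
// blocks with negligible statistical weight before the (expensive)
// two-particle Green's function computation.
void compute_truncated_density_matrix(const Pomerol::StatesClassification& S,
                                      const Pomerol::Hamiltonian& H,
                                      Pomerol::RealType beta)
{
    Pomerol::DensityMatrix rho(S, H, beta);
    rho.prepare();
    rho.compute();
    // Drop blocks whose weight falls below the tolerance; subsequent
    // two-particle objects built from rho then skip these blocks.
    rho.truncateBlocks(1e-15);
}
```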

Just for your information, an explanation of the idea is given in https://github.com/aeantipov/pomerol/pull/16

I hope it helps you.

Best, Junya

krivenko commented 6 years ago

Dear Hugo,

> My question is whether I can somehow convince pomerol to compute the different two-particle Green's function elements in parallel? (I have 4x4x4x4 = 256 elements to calculate in total, so that would be a nice speedup if done in parallel.)

pomerol2triqs computes the elements of the two-particle Green's function one by one. For each given combination of GF indices, pomerol performs internal parallelization over the contributions to that particular element.

MPI parallelization over index combinations would require a serious change to... well, the 'architecture' of pomerol2triqs, probably something involving splitting of MPI communicators. Unfortunately, I do not have the resources to introduce a feature of this scale into a code that has never been of production quality.
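
For illustration only, the kind of communicator splitting meant here might look like the sketch below. It is not part of pomerol2triqs; the group count n_groups and the round-robin assignment of index combinations are purely hypothetical choices.

```cpp
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Split the world communicator into n_groups sub-communicators.
    // Each group would handle a subset of the 4^4 = 256 index combinations,
    // while the solver's internal parallelization runs within the group.
    int n_groups = 4;                   // illustrative choice
    int color = world_rank % n_groups;  // group id of this rank
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &group_comm);

    int n_flavors = 4;
    int combo = 0;
    for (int i = 0; i < n_flavors; ++i)
      for (int j = 0; j < n_flavors; ++j)
        for (int k = 0; k < n_flavors; ++k)
          for (int l = 0; l < n_flavors; ++l, ++combo) {
            if (combo % n_groups != color) continue; // round-robin assignment
            // ... compute the (i,j,k,l) element using group_comm ...
          }

    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
```

The hard part, and the architectural change mentioned above, would be passing each sub-communicator down into pomerol's two-particle computation instead of letting it use the world communicator.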

What I can try to do to improve performance is add support for Junya's truncateBlocks(). This looks rather trivial.

HugoStrand commented 6 years ago

Dear Junya and Igor,

Thank you both for the explanations and suggestions. Junya, I have tested adding this cutoff in pomerol2triqs and it works, but I think my system is too small (four spinful fermions) for this to give much speedup.

Igor, I am not asking for this to be implemented, so please do not worry about it. I was just naively asking whether it was supported or not.

Cheers, Hugo

krivenko commented 6 years ago

Let me reopen this issue and close it when support for truncateBlocks() is added.

j-otsuki commented 6 years ago

Dear Hugo and Igor,

Hugo, thank you for trying the cutoff option. I understand that your system is small and the cutoff did not give much speedup.

Igor, thank you for considering support for the cutoff option. It would be useful in future applications to multi-orbital systems.

Best, Junya