Closed krl52 closed 2 years ago
Your job is writing the wavefunctions at every electronic step, which can be a huge overhead. I would recommend removing "dump Electronic State". Everything else seems fine to me. Let me know whether the run with 10 processes is faster without it.
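To illustrate the suggestion above, here is a minimal Python sketch that removes State from any "dump Electronic" command in an input file's text. The helper name `strip_per_step_state_dump` and the sample input are hypothetical (the sample is not the attached Cu100.in), and the sketch assumes one command per line with `#` starting a comment.

```python
def strip_per_step_state_dump(input_text: str) -> str:
    """Remove 'State' from every 'dump Electronic ...' command.

    Assumes one command per line and '#' as the comment character.
    A 'dump Electronic' line with nothing left to dump is dropped entirely.
    """
    kept = []
    for line in input_text.splitlines():
        # Ignore anything after a comment character when parsing the command.
        tokens = line.split("#", 1)[0].split()
        if tokens[:2] == ["dump", "Electronic"]:
            remaining = [t for t in tokens[2:] if t != "State"]
            if remaining:
                # Keep the other per-step outputs, just not the wavefunctions.
                kept.append("dump Electronic " + " ".join(remaining))
            continue
        kept.append(line)
    return "\n".join(kept)


# Hypothetical input text, for illustration only.
sample = "elec-cutoff 20\ndump Electronic State\ndump End State\n"
print(strip_per_step_state_dump(sample))
```

In this sketch the final "dump End State" line is left alone, so the state is still written once at the end of the run.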
I wouldn't read too much into the time for the tests; that seems fine. There could be quite significant per-core differences between machines: each laptop core could be quite a bit faster than a single core of an older many-core chip.
Best, Shankar
I am running a geometry optimization for a 2x2 Cu(100) supercell. Initially I ran the calculation on 28 processes with one thread each, and the total time was about 2 hours. There are 10 reduced k-points, so I re-ran the calculation on 10 processes with 2 threads each, using mpirun -n 10 --bind-to none jdftx -c 2 ... The total time is now about 1 hour. I also checked that the build tests pass in parallel. They take about 1200 s in total on an instance with 4 cores, which is a bit higher than the time suggested on the JDFTx website. Does this sound reasonable, or are there any further suggestions for reducing computation time?
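The process/thread choice described above can be sketched as a small heuristic: pick the largest process count that divides the number of reduced k-points evenly, then give each process the leftover cores as threads. The function name `suggest_layout` is hypothetical, and the 28-core figure in the example is an assumption based on the original 28-process run. This is a rule of thumb, not something prescribed by JDFTx.

```python
from typing import Tuple


def suggest_layout(n_kpoints: int, n_cores: int) -> Tuple[int, int]:
    """Return (processes, threads_per_process) for a k-point-parallel run.

    Heuristic (an assumption, not an official rule): use the largest
    process count that divides n_kpoints and does not exceed n_cores,
    then split the remaining cores evenly as threads.
    """
    if n_kpoints < 1 or n_cores < 1:
        raise ValueError("n_kpoints and n_cores must be positive")
    best = 1
    for n_procs in range(1, min(n_kpoints, n_cores) + 1):
        if n_kpoints % n_procs == 0:
            best = n_procs  # every process gets the same number of k-points
    return best, max(1, n_cores // best)


# 10 reduced k-points on an assumed 28-core machine.
processes, threads = suggest_layout(10, 28)
print(processes, threads)
```

For 10 k-points and 28 cores this gives 10 processes with 2 threads each, which corresponds to the -n 10 and -c 2 values in the command above.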
Edit: I attached the input and output files below: Cu100-2.out.txt, Cu100.in.txt, Cu100.out.txt