shankar1729 / jdftx

JDFTx: software for joint density functional theory
http://jdftx.org

Fixed band occupation #298

Closed: wamuriel closed this issue 6 months ago

wamuriel commented 9 months ago

Dear JDFTX developers,

I would like to know whether it is possible to fix the band occupations in the software, that is, to perform DeltaSCF-type calculations.

shankar1729 commented 9 months ago

Yes, with a couple of restrictions:

wamuriel commented 8 months ago

Dear Shankar,

Thanks for your answer. I would like to know whether setting the occupations in the way you indicate allows geometry optimization of excited states of defects, that is, whether the occupations remain fixed throughout the geometry optimization.

Kind regards

Wilver

shankar1729 commented 8 months ago

Yes, the occupations should remain the same through ionic steps.

In addition to the previous constraints, note also that you must not have elec-smearing set, otherwise the fillings would be updated every electronic step.

Best, Shankar

wamuriel commented 8 months ago

Thanks for your answer.

I have another question: is it possible to link JDFTx to a threaded BLAS library other than MKL? I would like to use the AMD-optimized multithreaded BLIS library, but CMakeLists.txt says that the -D ThreadedBLAS=yes option only affects MKL.

Best, Wilver

shankar1729 commented 8 months ago

Hi Wilver,

This is definitely possible, but requires modification to the code to deal with the specific library. The issue is that BLAS does not have a standard way of specifying threading, so we need to call a library-specific function, e.g. mkl_set_num_threads, to control how many threads are being used at a given time. Note that this needs to be dynamic, and not something set by an environment variable, because there are sections of the code where BLAS is called from multiple threads, and there BLAS should only use one thread. (This is much more efficient during per-band wavefunction processing.)

Consequently, support has been implemented for MKL to do this, and it would require a few lines of code for each additional library, given its specific thread-count control function. The last time I checked, OpenBLAS / BLIS did not support this neatly, and hence I did not add this support. (If not handled correctly with a threaded BLAS, this would lead to too many threads overall during the wavefunction processing.) If this has changed, and there is a well-defined thread-setting function, we'd appreciate a pull request.

Finally, note that with a single-threaded BLAS, JDFTx will still do quite a good job at parallelizing over threads. When ThreadedBLAS=no, it signals to JDFTx that it is free to split the BLAS calls over as many threads as allocated cores; it does not mean that the BLAS parts are not threaded. The performance difference for MKL between ThreadedBLAS=yes and no is typically ~ 10-20% for large problem sizes, when running with up to ~ 16 cores / process.
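The thread accounting described in this reply can be illustrated with a small model. This is only a sketch and not JDFTx code: the function names and the 16-core figure are hypothetical, chosen to show why a BLAS thread count fixed once (for example by an environment variable) oversubscribes the cores when BLAS is called from many threads at once, while a dynamically adjusted count does not.

```python
# Illustrative model only; none of these names come from JDFTx.

def total_threads(caller_threads: int, blas_threads_per_call: int) -> int:
    """Threads active at once when each caller thread makes a BLAS call."""
    return caller_threads * blas_threads_per_call


def dynamic_blas_threads(cores: int, caller_threads: int) -> int:
    """Dynamic policy: divide the cores among concurrent callers (at least 1)."""
    return max(1, cores // caller_threads)


cores = 16  # hypothetical cores per process

# Region A: one caller thread, so BLAS may use every core.
dynamic_a = total_threads(1, dynamic_blas_threads(cores, 1))

# Region B: per-band processing with one caller thread per core,
# so each BLAS call should be single-threaded.
dynamic_b = total_threads(cores, dynamic_blas_threads(cores, cores))

# Same region B, but with the BLAS thread count fixed at `cores` up front.
static_b = total_threads(cores, cores)

print(f"dynamic, single caller : {dynamic_a} threads")
print(f"dynamic, {cores} callers    : {dynamic_b} threads")
print(f"static,  {cores} callers    : {static_b} threads")
```

In this model the dynamic policy keeps the total at the number of cores in both regions, whereas the fixed setting multiplies the two thread counts in the per-band region.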

Best, Shankar

wamuriel commented 8 months ago

Dear Shankar,

Thanks for your reply.

I am performing some test calculations on a system that contains 128 atoms, and I have found that the step ---------- Allocating electronic variables ---------- is very slow: it takes more than 30 minutes. Additionally, my calculations use only the gamma point, but the output file reports nStates = 2. According to the manual it should be equal to 1, since nStates is the number of irreducible k-points.

This is the content of my input file:

##########################################
include coord.in
ion-species GBRV/$ID_pbe.uspp
elec-cutoff 20 100
coulomb-interaction Slab 001
coulomb-truncation-embed 0 0 0.0
kpoint 0.000000 0.00000000 0.00000000 1
spintype z-spin
elec-initial-magnetization 1.00000 yes
elec-smearing Fermi 0.001
electronic-SCF
lattice-minimize nIterations 10
dump End ElecDensity BandProjections
dump-name graphite.$VAR
#############################

I am attaching the relevant extract from the output file

Initialized 3 species with 128 total atoms.

Folded 1 k-points by 1x1x1 to 1 k-points.

---------- Setting up k-points, bands, fillings ----------
No reducable k-points.
Computing the number of bands and number of electrons
Calculating initial fillings.
nElectrons: 511.000000 nBands: 512 nStates: 2

----- Setting up reduced wavefunction bases (one per k-point) -----
average nbasis = 320431.000 , ideal nbasis = 320600.942

---------- Setting up ewald sum ----------
Optimum gaussian width for ewald sums = 7.495783 bohr.
Real space sum over 567 unit cells with max indices [ 4 4 3 ]
Reciprocal space sum over 10469 terms with max indices [ 9 9 14 ]

---------- Allocating electronic variables ----------
(this step is very slow)

Best,

Wilver

shankar1729 commented 8 months ago

Hi Wilver,

nStates is the number of irreducible k-points * spin, so you get 2 here because of the z-spin mode.

Are you using 1 or 2 MPI processes, with threads to cover the node(s) you are using? Most likely your slowdown is from an incorrect parallelization; post your job file or log file if needed.

Another possibility is that you are running out of memory: your parameters above will overall need ~ 2.4 GB * nStates ~ 4.8 GB per copy of wavefunctions total. You typically need to hold 5-6 copies of the wavefunctions for SCF/minimize. Consequently make sure your job has access to ~ 30 GB memory overall.
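The memory estimate above can be reproduced with a few lines of arithmetic from the numbers in the posted log. This is a rough sketch, and it assumes each wavefunction coefficient is stored as a double-precision complex number (16 bytes); that storage size is an assumption here and is not stated in the thread.

```python
# Values copied from the log excerpt posted in this thread.
n_basis = 320431   # "average nbasis"
n_bands = 512      # "nBands"
n_kpoints = 1      # irreducible k-points (gamma point only)
n_spin = 2         # spintype z-spin

# ASSUMPTION: 16 bytes per coefficient (double-precision complex).
BYTES_PER_COEFF = 16
GIB = 1024 ** 3

# nStates = irreducible k-points * spin channels.
n_states = n_kpoints * n_spin

# Size of one copy of the wavefunctions, per state and in total.
per_state_gib = n_basis * n_bands * BYTES_PER_COEFF / GIB
per_copy_gib = per_state_gib * n_states

print(f"nStates            : {n_states}")
print(f"per state          : {per_state_gib:.2f} GiB")
print(f"per copy (all states): {per_copy_gib:.2f} GiB")

# SCF/minimize typically holds 5-6 copies, per the reply above.
for copies in (5, 6):
    print(f"{copies} copies           : {copies * per_copy_gib:.1f} GiB")
```

Under that assumption the result is about 2.4 GiB per state and just under 30 GiB for six copies, consistent with the figures quoted in the reply.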

Best, Shankar