shankar1729 / jdftx

JDFTx: software for joint density functional theory
http://jdftx.org

JDFTx Crashes Ubuntu 22.04 System Unexpectedly #332

Closed AndrewTaehonKim closed 1 month ago

AndrewTaehonKim commented 1 month ago

Dear Shankar1729,

Thank you for creating such an intuitive and powerful program. I have run into an issue that I have been trying to debug for a couple of months to no avail. I hope that you may have some insight into how to approach the problem.

Background

I am running JDFTx on 8 different systems, one on Windows/WSL2 and the rest on single-boot Ubuntu 22.04. When I refer to a "crash," I specifically mean an issue that causes the computer to reboot, on both the Windows/WSL2 and Ubuntu systems. Different molecular systems have been giving me issues, and only on certain computers. Below are the specifications for two of the computers and the molecular system that caused the crash on each (input, ionpos, and lattice files). At the end, I also list some of the ways I have tried to debug this issue. I run these systems on the CPU/memory because I do not have a suitable GPU. I also used the same script to set up the same JDFTx version (v1.7.0) on each system, so the JDFTx setup should be identical on all of them. The simulations were run using the command jdftx -i [input] -o [output], as per the documentation. The input files themselves can be found here: https://drive.google.com/drive/folders/1BVHvVKwWKgM1ZLlDgQblFTaGsOtro3uk?usp=sharing

Ubuntu 22.04.4 LTS

OS Type: 64 bit
Memory: 64 GB
Processor: AMD Ryzen Threadripper Pro 3955WX 16-Cores x 32
Graphics: NV164
Disk Capacity: 1 TB

Input file:

include NiS2@Na2S.lattice
include NiS2@Na2S.ionpos
coords-type cartesian
coulomb-interaction Slab 001
coulomb-truncation-embed 0 0 6.363526172499999
ion-species GBRV/$ID_pbe.uspp
elec-cutoff 20 100
elec-smearing Fermi 0.01
kpoint-folding 1 1 1
electronic-SCF
ionic-minimize \
    nIterations 200 \
    dirUpdateScheme FletcherReeves \
    energyDiffThreshold 1e-6 \
    knormThreshold 1e-4  #Threshold on RMS cartesian force
dump-name NiS2@Na2S.$VAR
dump End State

Ionpos file:

Ionic positions in cartesian coordinates:

ion Ni 5.363526172499999 0.000000000000000 5.363526172499999 0
ion Ni 32.181157034999998 5.363526172499999 5.363526172499999 0
ion Ni 5.363526172499999 5.363526172499999 10.727052344999997 0
ion Ni 32.181157034999998 10.727052344999999 10.727052344999997 0
ion Ni 32.181157034999998 21.454104689999998 10.727052344999997 0
ion Ni 5.363526172499999 21.454104689999998 5.363526172499999 0
ion Ni 5.363526172499999 10.727052344999999 5.363526172499999 0
ion Ni 32.181157034999998 16.090578517499999 5.363526172499999 0
ion Ni 32.181157034999998 26.817630862500000 5.363526172499999 0
ion Ni 5.363526172499999 26.817630862500000 10.727052344999997 0
ion Ni 5.363526172499999 16.090578517499999 10.727052344999997 0
ion Ni 10.727052344999999 0.000000000000000 10.727052344999997 0
ion Ni 21.454104689999998 0.000000000000000 10.727052344999997 0
ion Ni 21.454104689999998 5.363526172499999 5.363526172499999 0
ion Ni 10.727052344999999 5.363526172499999 5.363526172499999 0
ion Ni 16.090578517499999 5.363526172499999 10.727052344999997 0
ion Ni 26.817630862500000 5.363526172499999 10.727052344999997 0
ion Ni 26.817630862500000 0.000000000000000 5.363526172499999 0
ion Ni 16.090578517499999 0.000000000000000 5.363526172499999 0
ion Ni 10.727052344999999 10.727052344999999 10.727052344999997 0
ion Ni 21.454104689999998 21.454104689999998 10.727052344999997 0
ion Ni 26.817630862500000 21.454104689999998 5.363526172499999 0
ion Ni 16.090578517499999 10.727052344999999 5.363526172499999 0
ion Ni 21.454104689999998 16.090578517499999 5.363526172499999 0
ion Ni 10.727052344999999 26.817630862500000 5.363526172499999 0
ion Ni 16.090578517499999 26.817630862500000 10.727052344999997 0
ion Ni 26.817630862500000 16.090578517499999 10.727052344999997 0
ion Ni 10.727052344999999 21.454104689999998 10.727052344999997 0
ion Ni 21.454104689999998 10.727052344999999 10.727052344999997 0
ion Ni 26.817630862500000 10.727052344999999 5.363526172499999 0
ion Ni 16.090578517499999 21.454104689999998 5.363526172499999 0
ion Ni 21.454104689999998 26.817630862500000 5.363526172499999 0
ion Ni 10.727052344999999 16.090578517499999 5.363526172499999 0
ion Ni 16.090578517499999 16.090578517499999 10.727052344999997 0
ion Ni 26.817630862500000 26.817630862500000 10.727052344999997 0
ion S 4.232239308000000 4.232239308000000 4.232258205299998 0
ion S 27.948917726999998 27.948917726999998 6.494794139699998 0
ion S 1.131286864500000 27.948917726999998 9.595784377799996 0
ion S 9.595765480499999 4.232239308000000 1.131267967200000 0
ion S 27.948917726999998 9.595765480499999 1.131267967200000 0
ion S 4.232239308000000 1.131286864500000 9.595784377799996 0
ion S 9.595765480499999 1.131286864500000 6.494794139699998 0
ion S 1.131286864500000 9.595765480499999 4.232258205299998 0
ion S 4.232239308000000 14.959291652999996 4.232258205299998 0
ion S 27.948917726999998 17.221865381999994 6.494794139699998 0
ion S 1.131286864500000 17.221865381999994 9.595784377799996 0
ion S 9.595765480499999 14.959291652999996 1.131267967200000 0
ion S 27.948917726999998 20.322817825500000 1.131267967200000 0
ion S 4.232239308000000 22.585391554499999 9.595784377799996 0
ion S 9.595765480499999 22.585391554499999 6.494794139699998 0
ion S 1.131286864500000 20.322817825500000 4.232258205299998 0
ion S 14.959291652999996 4.232258205300000 4.232239307999999 0
ion S 17.221865381999994 27.948898829699999 6.494813036999998 0
ion S 17.221865381999994 9.595784377799999 1.131286864500000 0
ion S 14.959291652999996 1.131267967200000 9.595765480499995 0
ion S 20.322817825500000 1.131267967200000 6.494813036999998 0
ion S 22.585391554499999 9.595784377799999 4.232239307999999 0
ion S 22.585391554499999 27.948898829699999 9.595765480499995 0
ion S 20.322817825500000 4.232258205300000 1.131286864500000 0
ion S 4.232239308000000 25.686343997999991 4.232258205299998 0
ion S 27.948917726999998 6.494813036999998 6.494794139699998 0
ion S 1.131286864500000 6.494813036999998 9.595784377799996 0
ion S 9.595765480499999 25.686343997999991 1.131267967200000 0
ion S 27.948917726999998 31.049870170499990 1.131267967200000 0
ion S 4.232239308000000 11.858339209499999 9.595784377799996 0
ion S 9.595765480499999 11.858339209499999 6.494794139699998 0
ion S 1.131286864500000 31.049870170499990 4.232258205299998 0
ion S 25.686343997999991 4.232258205300000 4.232239307999999 0
ion S 6.494813036999998 27.948898829699999 6.494813036999998 0
ion S 6.494813036999998 9.595784377799999 1.131286864500000 0
ion S 25.686343997999991 1.131267967200000 9.595765480499995 0
ion S 31.049870170499990 1.131267967200000 6.494813036999998 0
ion S 11.858339209499999 9.595784377799999 4.232239307999999 0
ion S 11.858339209499999 27.948898829699999 9.595765480499995 0
ion S 31.049870170499990 4.232258205300000 1.131286864500000 0
ion S 14.959291652999996 14.959291652999996 4.232258205299998 0
ion S 17.221865381999994 17.221865381999994 6.494794139699998 0
ion S 22.585391554499999 17.221865381999994 9.595784377799996 0
ion S 20.322817825500000 14.959291652999996 1.131267967200000 0
ion S 17.221865381999994 20.322817825500000 1.131267967200000 0
ion S 14.959291652999996 22.585391554499999 9.595784377799996 0
ion S 20.322817825500000 22.585391554499999 6.494794139699998 0
ion S 22.585391554499999 20.322817825500000 4.232258205299998 0
ion S 14.959291652999996 25.686343997999991 4.232258205299998 0
ion S 17.221865381999994 6.494813036999998 6.494794139699998 0
ion S 22.585391554499999 6.494813036999998 9.595784377799996 0
ion S 20.322817825500000 25.686343997999991 1.131267967200000 0
ion S 17.221865381999994 31.049870170499990 1.131267967200000 0
ion S 14.959291652999996 11.858339209499999 9.595784377799996 0
ion S 20.322817825500000 11.858339209499999 6.494794139699998 0
ion S 22.585391554499999 31.049870170499990 4.232258205299998 0
ion S 25.686343997999991 14.959291652999996 4.232258205299998 0
ion S 6.494813036999998 17.221865381999994 6.494794139699998 0
ion S 11.858339209499999 17.221865381999994 9.595784377799996 0
ion S 31.049870170499990 14.959291652999996 1.131267967200000 0
ion S 6.494813036999998 20.322817825500000 1.131267967200000 0
ion S 25.686343997999991 22.585391554499999 9.595784377799996 0
ion S 31.049870170499990 22.585391554499999 6.494794139699998 0
ion S 11.858339209499999 20.322817825500000 4.232258205299998 0
ion S 25.686343997999991 25.686343997999991 4.232258205299998 0
ion S 6.494813036999998 6.494813036999998 6.494794139699998 0
ion S 11.858339209499999 6.494813036999998 9.595784377799996 0
ion S 31.049870170499990 25.686343997999991 1.131267967200000 0
ion S 6.494813036999998 31.049870170499990 1.131267967200000 0
ion S 25.686343997999991 11.858339209499999 9.595784377799996 0
ion S 31.049870170499990 11.858339209499999 6.494794139699998 0
ion S 11.858339209499999 31.049870170499990 4.232258205299998 0
ion S 16.915387542003984 10.330294086208403 17.142885211696779 1
ion Na 12.933390730321465 12.892368092916593 17.103770631399833 1
ion Na 20.862082197358593 12.900506466136132 17.119481088656119 1

Lattice file:

lattice Triclinic 32.181088723 32.181088723 37.794599999999996 90 90 90

Windows/WSL2

OS Type: 64 bit
Memory: 16 GB
Processor: 12th Gen Intel(R) Core i3-12100F 3.3 GHz
Graphics: NVIDIA GT 1030
Disk Capacity: 500 GB

Input file

include Na2S2@TiO2.lattice
include Na2S2@TiO2.ionpos
coords-type cartesian

coulomb-interaction Slab 001
coulomb-truncation-embed 0 0 8.0

ion-species GBRV/$ID_pbe.uspp
elec-cutoff 20 100

elec-smearing Fermi 0.01
kpoint-folding 1 1 1
electronic-SCF

ionic-minimize \
    nIterations 200 \
    dirUpdateScheme FletcherReeves \
    energyDiffThreshold 1e-6 \
    knormThreshold 1e-4  #Threshold on RMS cartesian force

dump-name Na2S2@TiO2.$VAR
dump End State

Ionpos file

Ionic positions in cartesian coordinates:

ion Ti 0.0 0.0 2.0 0
ion O 2.6502896331 2.6502896331 2.0 0
ion Ti 4.3404452478 4.3404452478 4.795024053800001 0
ion O 6.0306197598 6.0306197598 2.0 0
ion O 1.6901745119999998 6.990734880899999 4.795024053800001 0
ion O 6.990734880899999 1.6901745119999998 4.795024053800001 0
ion Ti 0.0 8.680909392899999 2.0 0
ion O 2.6502896331 11.331199026 2.0 0
ion Ti 4.3404452478 13.0213546407 4.795024053800001 0
ion O 6.0306197598 14.7115291527 2.0 0
ion O 1.6901745119999998 15.6716442738 4.795024053800001 0
ion O 6.990734880899999 10.3710839049 4.795024053800001 0
ion Ti 0.0 17.361818785799997 2.0 0
ion O 2.6502896331 20.0121084189 2.0 0
ion Ti 4.3404452478 21.7022640336 4.795024053800001 0
ion O 6.0306197598 23.3924385456 2.0 0
ion O 1.6901745119999998 24.3525536667 4.795024053800001 0
ion O 6.990734880899999 19.0519932978 4.795024053800001 0
ion Ti 0.0 26.0427281787 2.0 0
ion O 2.6502896331 28.6930178118 2.0 0
ion Ti 4.3404641451 30.3831734265 4.795024053800001 0
ion O 6.0306197598 32.07334793849999 2.0 0
ion O 1.6901745119999998 33.033463059599995 4.795024053800001 0
ion O 6.990734880899999 27.7329026907 4.795024053800001 0
ion Ti 8.680909392899999 0.0 2.0 0
ion O 11.331199026 2.6502896331 2.0 0
ion Ti 13.0213546407 4.3404452478 4.795024053800001 0
ion O 14.7115291527 6.0306197598 2.0 0
ion O 10.3710839049 6.990734880899999 4.795024053800001 0
ion O 15.6716442738 1.6901745119999998 4.795024053800001 0
ion Ti 8.680909392899999 8.680909392899999 2.0 0
ion O 11.331199026 11.331199026 2.0 0
ion Ti 13.0213546407 13.0213546407 4.795024053800001 0
ion O 14.7115291527 14.7115291527 2.0 0
ion O 10.3710839049 15.6716442738 4.795024053800001 0
ion O 15.6716442738 10.3710839049 4.795024053800001 0
ion Ti 8.680909392899999 17.361818785799997 2.0 0
ion O 11.331199026 20.0121084189 2.0 0
ion Ti 13.021373538 21.7022640336 4.795024053800001 0
ion O 14.7115291527 23.3924385456 2.0 0
ion O 10.3710839049 24.3525536667 4.795024053800001 0
ion O 15.6716442738 19.0519932978 4.795024053800001 0
ion Ti 8.680909392899999 26.0427281787 2.0 0
ion O 11.331199026 28.6930178118 2.0 0
ion Ti 13.021373538 30.3831734265 4.795024053800001 0
ion O 14.7115291527 32.07334793849999 2.0 0
ion O 10.3710839049 33.033463059599995 4.795024053800001 0
ion O 15.6716442738 27.7329026907 4.795024053800001 0
ion Ti 17.361818785799997 0.0 2.0 0
ion O 20.0121084189 2.6502896331 2.0 0
ion Ti 21.7022640336 4.3404452478 4.795024053800001 0
ion O 23.3924385456 6.0306197598 2.0 0
ion O 19.0519932978 6.990734880899999 4.795024053800001 0
ion O 24.3525536667 1.6901745119999998 4.795024053800001 0
ion Ti 17.361818785799997 8.680909392899999 2.0 0
ion O 20.0121084189 11.331199026 2.0 0
ion Ti 21.7022640336 13.0213546407 4.795024053800001 0
ion O 23.3924385456 14.7115291527 2.0 0
ion O 19.0519932978 15.6716442738 4.795024053800001 0
ion O 24.3525536667 10.3710839049 4.795024053800001 0
ion Ti 17.361818785799997 17.361818785799997 2.0 0
ion O 20.0121084189 20.0121084189 2.0 0
ion Ti 21.7022640336 21.7022640336 4.795024053800001 0
ion O 23.3924385456 23.3924385456 2.0 0
ion O 19.0519932978 24.3525536667 4.795024053800001 0
ion O 24.3525536667 19.0519932978 4.795024053800001 0
ion Ti 17.361818785799997 26.0427281787 2.0 0
ion O 20.0121084189 28.6930178118 2.0 0
ion Ti 21.7022640336 30.3831734265 4.795024053800001 0
ion O 23.3924385456 32.07334793849999 2.0 0
ion O 19.0519932978 33.033463059599995 4.795024053800001 0
ion O 24.3525536667 27.7329026907 4.795024053800001 0
ion Ti 26.0427281787 0.0 2.0 0
ion O 28.6930178118 2.6502896331 2.0 0
ion Ti 30.3831734265 4.3404452478 4.795024053800001 0
ion O 32.07334793849999 6.0306197598 2.0 0
ion O 27.7329026907 6.990734880899999 4.795024053800001 0
ion O 33.033463059599995 1.6901745119999998 4.795024053800001 0
ion Ti 26.0427281787 8.680909392899999 2.0 0
ion O 28.6930178118 11.331199026 2.0 0
ion Ti 30.3831734265 13.0213546407 4.795024053800001 0
ion O 32.07334793849999 14.7115291527 2.0 0
ion O 27.7329026907 15.6716442738 4.795024053800001 0
ion O 33.033463059599995 10.3710839049 4.795024053800001 0
ion Ti 26.0427281787 17.361818785799997 2.0 0
ion O 28.6930178118 20.0121084189 2.0 0
ion Ti 30.3831734265 21.7022640336 4.795024053800001 0
ion O 32.07334793849999 23.3924385456 2.0 0
ion O 27.7329026907 24.3525536667 4.795024053800001 0
ion O 33.033463059599995 19.0519932978 4.795024053800001 0
ion Ti 26.0427281787 26.0427281787 2.0 0
ion O 28.6930178118 28.6930178118 2.0 0
ion Ti 30.3831734265 30.3831734265 4.795024053800001 0
ion O 32.07334793849999 32.07334793849999 2.0 0
ion O 27.7329026907 33.033463059599995 4.795024053800001 0
ion O 33.033463059599995 27.7329026907 4.795024053800001 0
ion S 18.7159993038 19.344410118 11.6637579659 1
ion S 15.8433451416 16.2869403618 11.5507332146 1
ion Na 20.301558363 15.033180096 10.099156012399998 1
ion Na 14.453278650899998 20.540042289 9.6720581351 1

Lattice file

lattice Tetragonal 34.723563862 37.794599999999996

Attempted Debugging History

I have tried the following to debug this issue:

Unusual and Other Things of Note

Please let me know if you require any more information or have any ideas on how to debug these computer crashes. Again, I am very grateful for your work on this software and am excited to keep using it. I hope there is a solution to these unfortunate crashes that kill weeks' worth of computing time.

Thank you very much!

shankar1729 commented 1 month ago

Hi Andrew,

Thanks for the detailed tests you posted here! Firstly, these are very large systems you are running for a single computer. In particular, I would recommend using WSL only for small tests / getting familiar with the code, and not for such large calculations.

I did a quick memory estimate for one of your systems (the one listed under Ubuntu), and it will need about 1.8 GB of memory per copy of wavefunctions, which will mean a working memory of 15 - 20 GB for SCF/minimize. So, if you are limiting the memory to below this level, that could be an issue. Likely, the WSL case ran the system out of memory.
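A rough way to reproduce a figure of this size is to count the plane waves inside the cutoff sphere and multiply by the number of bands. The sketch below is an assumption about how such an estimate can be made, not a description of how JDFTx accounts for memory internally. It uses the lattice lengths (in bohr) and the 20 Hartree cutoff from the NiS2@Na2S input; the band count of 660 is a hypothetical placeholder and should be replaced by the nBands value reported in the JDFTx output.

```python
import math

def estimate_wavefunction_bytes(volume_bohr3, ecut_hartree, n_bands, n_kpoints=1):
    """Rough size in bytes of one copy of the plane-wave wavefunctions."""
    # Plane waves with |G|^2 / 2 <= Ecut fill a sphere in reciprocal space:
    # N_basis ~ volume * (2 * Ecut)^(3/2) / (6 * pi^2)   (atomic units)
    n_basis = volume_bohr3 * (2.0 * ecut_hartree) ** 1.5 / (6.0 * math.pi ** 2)
    bytes_per_coefficient = 16  # one double-precision complex number
    return n_kpoints * n_bands * n_basis * bytes_per_coefficient

# Orthorhombic cell from the NiS2@Na2S lattice file (lengths in bohr).
volume = 32.181088723 * 32.181088723 * 37.794599999999996

# n_bands=660 is a hypothetical value chosen for illustration only.
size = estimate_wavefunction_bytes(volume, ecut_hartree=20.0, n_bands=660)
print(f"about {size / 1e9:.1f} GB per copy of wavefunctions")
```

Dividing the quoted 15 - 20 GB working memory by the per-copy size suggests on the order of ten wavefunction-sized arrays held at once. Adding adsorbate atoms changes the estimate mainly through the band count, since the cell volume and cutoff stay fixed.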

For the speed, it is of course not realistic to run such calculations over weeks! For your system with a single k-point, running without MPI, as you did, is correct. You should still be using several cores using threads. JDFTx should do this automatically, but you can control it with -c or the SLURM_CPUS_PER_TASK environment variable. Check the system load using top when JDFTx is running, and make sure the jdftx process has much more than 100% (of one core) load showing; if not, that indicates a thread placement issue.
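One quick check related to thread placement is whether the launching shell restricts which cores a process may run on. The sketch below is only an illustration under the assumption of a Linux system; it inspects the affinity mask of the current Python process, which a jdftx process started from the same shell would normally inherit. It does not query JDFTx itself.

```python
import os

def usable_cores():
    """Return (cores this process may run on, logical cores in the machine)."""
    total = os.cpu_count() or 1
    try:
        # Linux-only: the set of CPU ids the current process is allowed to use.
        allowed = len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without affinity support: assume no restriction.
        allowed = total
    return allowed, total

allowed, total = usable_cores()
print(f"process may use {allowed} of {total} logical cores")
if allowed < total:
    print("affinity mask is restricted; threads may be stacked on few cores")
```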

Also, for reference a calculation of the size you posted should take ~ 1 - 5 minutes / ionic step on an A100 GPU, and maybe a couple of hours on a good CPU node, definitely not weeks. The speed you are observing must be a combination of thread placement and perhaps non-optimal blas/fft libraries (what did you link to?).

Finally, do the remaining tests on the Ubuntu system; the WSL system has served its purpose in getting you through the tutorials :).

Best, Shankar

AndrewTaehonKim commented 1 month ago

Hello Shankar,

Thank you very much for your swift reply! I greatly appreciate your input and guidance through this.

I completely understand what you are saying about the WSL2 system. I will be avoiding using that in the future. I also understand that the lack of memory may have caused the shutdown for the WSL2 system. However, I am still curious why the computer crashes occurred on my pure Ubuntu 22.04 systems that have 32 and 64 GB of accessible memory. Do you have any guesses as to what may have caused the Ubuntu systems to crash irregularly and reboot? Again, the system in question had the following specs: OS Type: 64 bit. Memory: 64 GB. Processor: AMD Ryzen Threadripper Pro 3955WX 16-Cores x 32. Graphics: NV164. Disk Capacity: 1 TB.

Would you also be able to help me understand the expected computation time a bit more? I have checked the CPU usage using top for all 7 of my computers that are running Ubuntu 22.04, and they are all using basically the entire CPU (400% for a 4 core CPU computer and 1600% for the aforementioned 16 core CPU that is crashing). You said that the calculation should take a couple hours per ionic step on a good CPU node. Since I am performing a geometry optimization, which has multiple ionic steps, is it still unusual that my calculations are taking more than 3 weeks to finish?

I also need to perform geometry optimization calculations of this size in solvent using the LinearPCM. Do you have an estimate of how much memory that would require? I also need to test different adsorbents, so the Na2S molecule in the ionpos file will hopefully need to grow into Na2S6 or even Na2S8. By how much would this increase the estimated memory requirement? If possible, would you be able to share how you estimated the memory requirement for such calculations? I would like to make these estimates myself, to make sure I run these calculations on computer systems with enough memory.

Lastly, I am considering purchasing a GPU like a 3090Ti for one of my computer systems in order to speed up these simulations. Would you advise this? Are there specific minimum GPU specifications that I should consider for systems of approximately this size and other things that might bottleneck the calculation speed?

Thank you so very much for your guidance! You have been very helpful, and JDFTx has been such a great tool for my research!

shankar1729 commented 1 month ago

I tested your NiS2@Na2S input file on a 16-core machine in our cluster, and it takes about 10 minutes / SCF cycle, as opposed to 30 minutes / SCF cycle in the snippet you posted above. That sounds close enough, but your Ryzen 3955 should be faster than my many-years older dual-Xeon E5-2620v4 by 2-3x.

Most likely, this could be due to a poorly performing BLAS library. Check ldd /path/to/jdftx to see which BLAS library you have linked to. If it is a libblas.so from the system directories, you may want to get the AMD AOCL math libraries and link to them instead.

Best, Shankar

AndrewTaehonKim commented 1 month ago

Dear Shankar,

Thank you very much for your input. After running ldd /path/to/jdftx/ I had the following output:

linux-vdso.so.1 (0x00007ffdd6fcf000)
libjdftx.so => /home/andrewkim/jdftx_install/build/jdftx_install/build/libjdftx.so (0x00007049f0400000)
libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007049f0bad000)
libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007049f02c9000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007049f0000000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007049f0b8d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007049efc00000)
libgsl.so.27 => /lib/x86_64-linux-gnu/libgsl.so.27 (0x00007049ef800000)
libfftw3_threads.so.3 => /lib/x86_64-linux-gnu/libfftw3_threads.so.3 (0x00007049f0b81000)
libfftw3.so.3 => /lib/x86_64-linux-gnu/libfftw3.so.3 (0x00007049ef400000)
libgslcblas.so.0 => /lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007049f0287000)
liblapack.so.3 => /lib/x86_64-linux-gnu/liblapack.so.3 (0x00007049eec00000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007049eff19000)
libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007049efe66000)
libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007049efb43000)
libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007049efae7000)
/lib64/ld-linux-x86-64.so.2 (0x00007049f0c04000)
libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007049ec7b0000)
libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007049ec400000)
libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007049f0252000)
libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007049f0b78000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007049f0b5c000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007049efe3c000)
libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007049ef7b8000)

I do not see a reference to libblas.so. Also, there are instructions for using MKL in the JDFTx documentation, but I could not find any for AMD's AOCL. I also checked the AMD AOCL user guide and was not sure how to point cmake to AMD's BLAS packages.
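Since the BLAS provider does not always appear under the name libblas.so, it can help to scan the ldd output for any library whose name looks like a BLAS or LAPACK implementation. The following is a minimal sketch of that idea; the list of name hints is an illustrative assumption and is not exhaustive, and the sample lines are taken from the output above.

```python
import re

# Substrings that commonly appear in BLAS/LAPACK library names (illustrative list).
BLAS_HINTS = ("openblas", "mkl", "blis", "atlas", "libblas", "cblas", "lapack")

def blas_libraries(ldd_output):
    """Return the library names in ldd output that look like BLAS/LAPACK."""
    found = []
    for line in ldd_output.splitlines():
        # Lines of interest have the form "<name> => <path> (<address>)".
        match = re.match(r"\s*(\S+)\s+=>", line)
        if match and any(hint in match.group(1).lower() for hint in BLAS_HINTS):
            found.append(match.group(1))
    return found

sample = """\
libgslcblas.so.0 => /lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007049f0287000)
liblapack.so.3 => /lib/x86_64-linux-gnu/liblapack.so.3 (0x00007049eec00000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007049eff19000)
libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007049ec7b0000)
"""
print(blas_libraries(sample))
```

To run it against a real binary, the text can be captured with subprocess.run(["ldd", path], capture_output=True, text=True).stdout, where path is the location of the jdftx executable.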

Would you please guide me on how to implement the AOCL math library? Also, my other systems are running Intel® Xeon(R) CPU E3-1241 v3 @ 3.50GHz × 8 CPUs. Do you recommend I switch them to MKL as well?

Thank you very much!

shankar1729 commented 1 month ago

Yes, for the Intel machines, link to MKL.

You seem to be linked to OpenBLAS already, which is pretty good. (I forgot the recent Ubuntus swapped out the unoptimized BLAS for OpenBLAS.) Your issue may instead be that OpenBLAS is launching its own threads, conflicting with JDFTx's threading. Try setting the environment variable with export OPENBLAS_NUM_THREADS=1 and see if it improves your performance. I'd recommend doing these tests first with a single ionic step on a smaller system, so that you can compare the various versions in a few minutes/hours before running anything for days.

If you want to try AOCL, you just need to add the lib path of AOCL to the cmake library search paths.

Best, Shankar

AndrewTaehonKim commented 1 month ago

Hello Shankar,

Thank you for always replying so quickly! I am quite new to all of this, so I am not sure where to set the environment variable. I tried to set it in the input file as follows:

include Na2S-openblas.lattice
include Na2S-openblas.ionpos

set OPENBLAS_NUM_THREADS 1

coords-type cartesian

ion-species GBRV/$ID_pbe.uspp

coulomb-interaction Isolated
coulomb-truncation-embed 7.562274270750001 10.181668708079998 26.45622

elec-cutoff 20 100
kpoint-folding 1 1 1 
electronic-SCF

ionic-minimize \
    nIterations 200 \
    dirUpdateScheme FletcherReeves \
    energyDiffThreshold 1e-6 \
    knormThreshold 1e-4  #Threshold on RMS cartesian force

dump-name Na2S-openblas.$VAR
dump End State

but that did not result in any changes. Did I set the environment variable correctly? If not, please let me know how to set it.

Thank you!

shankar1729 commented 1 month ago

No, run export OPENBLAS_NUM_THREADS=1 in your bash shell before running jdftx, or in your job file for slurm etc. This is not being read by jdftx, but rather directly by libopenblas.so, so it needs to be set before jdftx starts.
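For runs started from a script rather than an interactive shell, the same effect can be obtained by placing the variable in the environment passed to the child process. This is a sketch under the assumption that jdftx is on the PATH; the helper names and the example file names are hypothetical, and only the jdftx -i and -o options mentioned earlier in this thread are used.

```python
import os
import subprocess

def openblas_single_thread_env(base_env=None):
    """Copy an environment and limit OpenBLAS to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    # Read by libopenblas when it loads, so it must be set before jdftx starts.
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env

def run_jdftx(input_file, output_file, executable="jdftx"):
    """Launch jdftx -i <input> -o <output> with the modified environment."""
    return subprocess.run(
        [executable, "-i", input_file, "-o", output_file],
        env=openblas_single_thread_env(),
        check=True,
    )

# Example call (file names are placeholders, not from the thread):
# run_jdftx("Na2S-openblas.in", "Na2S-openblas.out")
```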

AndrewTaehonKim commented 1 month ago

Thank you for this clarification.

I have run the following input file for a small molecule, Na2S,

# Ionic positions in cartesian coordinates:
ion S   7.562274270750001  10.181668708079998  26.456219999999998 1
ion Na   3.653322412229999  12.652545485250002  26.456219999999998 1
ion Na  11.469323171159999  12.655897866269999  26.456219999999998 1

and there was not much of a difference.

normal

SCF: Cycle:  0   Etot: -105.383921895121176   dEtot: +1.598e-01   |Residual|: 2.350e-01   |deigs|: 1.291e-02  t[s]:     34.23
SCF: Cycle:  1   Etot: -105.595317387871006   dEtot: -2.114e-01   |Residual|: 6.021e-02   |deigs|: 3.682e-02  t[s]:     37.81
SCF: Cycle:  2   Etot: -105.609035718083064   dEtot: -1.372e-02   |Residual|: 4.263e-02   |deigs|: 7.714e-03  t[s]:     41.03
SCF: Cycle:  3   Etot: -105.614269025497236   dEtot: -5.233e-03   |Residual|: 1.657e-02   |deigs|: 8.792e-03  t[s]:     44.42
SCF: Cycle:  4   Etot: -105.614070392822185   dEtot: +1.986e-04   |Residual|: 1.702e-02   |deigs|: 2.879e-03  t[s]:     47.87
SCF: Cycle:  5   Etot: -105.614817647032396   dEtot: -7.473e-04   |Residual|: 4.186e-03   |deigs|: 3.622e-03  t[s]:     51.44
SCF: Cycle:  6   Etot: -105.614848396399964   dEtot: -3.075e-05   |Residual|: 2.932e-03   |deigs|: 5.266e-04  t[s]:     54.91
SCF: Cycle:  7   Etot: -105.614903169999366   dEtot: -5.477e-05   |Residual|: 1.433e-03   |deigs|: 1.025e-03  t[s]:     58.38
SCF: Cycle:  8   Etot: -105.614911362182070   dEtot: -8.192e-06   |Residual|: 1.103e-03   |deigs|: 2.753e-04  t[s]:     61.42
SCF: Cycle:  9   Etot: -105.614922605888040   dEtot: -1.124e-05   |Residual|: 8.075e-04   |deigs|: 3.103e-04  t[s]:     64.93
...
Duration: 0-0:02:19.32

openblas = 1

Will mix electronic density at each iteration.
SCF: Cycle:  0   Etot: -105.383914215462681   dEtot: +1.598e-01   |Residual|: 2.350e-01   |deigs|: 1.291e-02  t[s]:     33.45
SCF: Cycle:  1   Etot: -105.595316649917947   dEtot: -2.114e-01   |Residual|: 6.021e-02   |deigs|: 3.682e-02  t[s]:     36.87
SCF: Cycle:  2   Etot: -105.609035496554611   dEtot: -1.372e-02   |Residual|: 4.263e-02   |deigs|: 7.714e-03  t[s]:     40.38
SCF: Cycle:  3   Etot: -105.614268998012165   dEtot: -5.234e-03   |Residual|: 1.657e-02   |deigs|: 8.793e-03  t[s]:     43.79
SCF: Cycle:  4   Etot: -105.614070350627387   dEtot: +1.986e-04   |Residual|: 1.702e-02   |deigs|: 2.880e-03  t[s]:     47.22
SCF: Cycle:  5   Etot: -105.614817651494988   dEtot: -7.473e-04   |Residual|: 4.186e-03   |deigs|: 3.622e-03  t[s]:     50.75
SCF: Cycle:  6   Etot: -105.614848400870997   dEtot: -3.075e-05   |Residual|: 2.932e-03   |deigs|: 5.266e-04  t[s]:     54.19
SCF: Cycle:  7   Etot: -105.614903170241234   dEtot: -5.477e-05   |Residual|: 1.433e-03   |deigs|: 1.025e-03  t[s]:     57.58
SCF: Cycle:  8   Etot: -105.614911362057626   dEtot: -8.192e-06   |Residual|: 1.103e-03   |deigs|: 2.752e-04  t[s]:     60.42
SCF: Cycle:  9   Etot: -105.614922605775220   dEtot: -1.124e-05   |Residual|: 8.075e-04   |deigs|: 3.102e-04  t[s]:     63.77
...
Duration: 0-0:02:16.35

Both took ~3 s per SCF cycle.
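The per-cycle time can be read off the t[s] column of the log instead of being estimated by eye. The sketch below averages the wall-time difference between consecutive SCF lines; the function name is hypothetical, the regular expression assumes the log format shown above, and the sample is the first three cycles of the "normal" run.

```python
import re

def seconds_per_scf_cycle(log_text):
    """Average wall time between consecutive 'SCF: Cycle:' lines."""
    # Capture the cumulative time at the end of each SCF line.
    pattern = r"^SCF: Cycle:.*t\[s\]:\s*([0-9.]+)"
    times = [float(t) for t in re.findall(pattern, log_text, re.MULTILINE)]
    if len(times) < 2:
        raise ValueError("need at least two SCF cycles")
    return (times[-1] - times[0]) / (len(times) - 1)

sample = """\
SCF: Cycle:  0   Etot: -105.383921895121176   dEtot: +1.598e-01   |Residual|: 2.350e-01   |deigs|: 1.291e-02  t[s]:     34.23
SCF: Cycle:  1   Etot: -105.595317387871006   dEtot: -2.114e-01   |Residual|: 6.021e-02   |deigs|: 3.682e-02  t[s]:     37.81
SCF: Cycle:  2   Etot: -105.609035718083064   dEtot: -1.372e-02   |Residual|: 4.263e-02   |deigs|: 7.714e-03  t[s]:     41.03
"""
print(f"{seconds_per_scf_cycle(sample):.2f} s per SCF cycle")
```

Running the same function on both logs gives a like-for-like comparison of the two OpenBLAS settings without including the setup time before the first cycle.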

So I also tried it on a larger but simple system, graphene:

# Ionic positions in cartesian coordinates:
ion C   28.01416292591644   0.00445338467469    0.4887003362507194  1
ion C   16.346425888740537  20.217875004139895  0.494441368511783   1
ion C   2.341491267388133   4.043706087769919   0.4942886500924679  1
ion C   23.34667648664185   0.004384458899791   0.4839013181266907  1
ion C   4.676245474304146   0.003825684533332   0.48713891651107133 1
ion C   11.67766870486726   20.219126759791724  0.483004299183591   1
ion C   25.67881105737885   4.044276294935732   0.49940867570530045 1
ion C   18.68048828431788   16.174907144025568  0.5118212981591874  1
ion C   4.67749213595881    8.082325094833877   0.4804342182734551  1
ion C   18.6793823896464    0.004419539788082   0.48342645537335827 1
ion C   9.343823638707176   0.003953947274407   0.48504785286126406 1
ion C   9.343132606087329   16.18003527330572   0.46223880862992317 1
ion C   23.345124811792477  8.086659640014878   0.5161105587543915  1
ion C   21.01310312038065   12.130537488137511  0.5222287367877918  1
ion C   7.01039656348227    12.130147336608232  0.4486696311442877  1
ion C   14.011721669833374  0.004487328277764   0.48699297674919606 1
ion C   14.013558296018106  16.17957442614987   0.49875782610164876 1
ion C   7.009818515150769   4.042499155409574   0.48384448349908027 1
ion C   21.01325800501044   4.045063281439558   0.499911963395391   1
ion C   16.350423015839485  12.13059917060026   0.5180351790283204  1
ion C   9.34228160229487    8.0818133681714 0.4755461128104308  1
ion C   16.34706235159582   4.043993563439029   0.49981734877733075 1
ion C   11.678225160842873  4.043864528802732   0.49125788887700494 1
ion C   11.67877257983628   12.1326565349028    0.4799438358607233  1
ion C   18.681222676379655  8.08616975301782    0.5173642518167796  1
ion C   7.009916898374378   20.21934386647767   0.47257906132880834 1
ion C   -6.992609659547584  20.216547172333016  0.49639938579395704 1
ion C   0.003266530903848   8.08351610749418    0.5122781746752096  1
ion C   14.0159558859541    8.083119896903865   0.5082239502304269  1
ion C   2.342093071666019   20.219019570074146  0.4803261462456945  1
ion C   -4.659880863172909  16.17456075714986   0.5103003200216811  1
ion C   2.33918658893772    12.132995157544904  0.4892992333638091  1
ion C   -2.329954496254302  12.131009916067642  0.5165899490796555  1
ion C   -2.326304055174534  20.217696998809753  0.49150403387388053 1
ion C   4.676726791076918   16.18084154081919   0.4661682642748559  1
ion C   0.006137637165399   16.17870367362059   0.4956596397275632  1
ion C   14.01234870964457   21.564438777735628  0.48960488411664116 1
ion C   2.342824768057115   1.350062948308334   0.48994159884327715 1
ion C   25.68051651973795   1.350702761820675   0.48890306980486287 1
ion C   16.348003904104942  17.524448264432507  0.5026780716257058  1
ion C   4.676210764116784   5.387823892778022   0.48726496815582365 1
ion C   21.013008310099927  1.350663380772684   0.4877047328091031  1
ion C   7.009950811972321   1.34925922541434    0.4842230025150407  1
ion C   11.67776264095304   17.525450596518706  0.4803628718510069  1
ion C   23.34632489423952   5.39223168016262    0.5051025746663278  1
ion C   18.681997813576384  13.480334719088798  0.5194839500821296  1
ion C   7.010361707405791   9.432882251512828   0.4589020664933088  1
ion C   16.345682566415046  1.350653197918856   0.488395943382983   1
ion C   11.67734310094677   1.350076750284254   0.48785342116083186 1
ion C   9.341532935239169   13.484022034967618  0.45604241662272926 1
ion C   21.013053215586027  9.43574275113414    0.5218525472738911  1
ion C   9.343921412688337   21.56496766188982   0.4777725350127149  1
ion C   -9.32553305058007   21.56371929426876   0.49099773971111205 1
ion C   0.006621766066881   5.389878389931321   0.5040408266753431  1
ion C   14.017461384638809  13.48349651525202   0.5073866496118313  1
ion C   9.343663251770447   5.388443811910577   0.48436130539376876 1
ion C   18.680116870408927  5.391612106124537   0.506293009054609   1
ion C   16.34930708655255   9.434553397568571   0.5162604154207564  1
ion C   11.678730991940606  9.429609444088063   0.49214234859062245 1
ion C   14.01322104108766   5.389351834858124   0.5018192673391333  1
ion C   4.676087364948579   21.564740743826302  0.47584154318974825 1
ion C   -6.992581671971104  17.522334606507364  0.5089818060645435  1
ion C   2.340099012219734   9.430388049872771   0.5003156677125276  1
ion C   -2.329065275086099  9.43506446429571    0.5160275969672803  1
ion C   -4.659725635779632  21.56378191296221   0.48933115704013375 1
ion C   7.009941513485121   17.526875139515802  0.4636818919046881  1
ion C   0.007471279885106   21.564584663929224  0.4868052205249356  1
ion C   -4.661048587855587  13.48034529581544   0.5177056404843583  1
ion C   4.678420581440577   13.483969691801144  0.4611911118779588  1
ion C   2.341877433067657   17.525550844977925  0.47981658154812123 1
ion C   -2.327622151484986  17.52385708460419   0.4990385276012752  1
ion C   0.001623704991355   13.483660458022491  0.506902314285373   1

normal

ElecMinimize: Iter:   0  F: -406.821628450990261  |grad|_K:  4.268e-04  alpha:  1.000e+00
    FillingsUpdate:  mu: -0.165374825  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 1
ElecMinimize: Iter:   1  F: -410.893588595434551  |grad|_K:  1.039e-04  alpha:  7.164e-01  linmin: -1.416e-04  t[s]:    477.95
    FillingsUpdate:  mu: -0.161054701  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 1.02
ElecMinimize: Iter:   2  F: -411.153343050156082  |grad|_K:  4.026e-05  alpha:  7.593e-01  linmin: -7.475e-06  t[s]:    605.76
    FillingsUpdate:  mu: -0.159192881  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 1.01
ElecMinimize: Iter:   3  F: -411.199348883130028  |grad|_K:  1.786e-05  alpha:  8.943e-01  linmin:  6.456e-06  t[s]:    731.45
    FillingsUpdate:  mu: -0.157426933  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 0.672
ElecMinimize: Iter:   4  F: -411.208970913284077  |grad|_K:  1.363e-05  alpha:  9.570e-01  linmin:  2.770e-05  t[s]:    857.42
    FillingsUpdate:  mu: -0.157065106  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 0.362
ElecMinimize: Iter:   5  F: -411.210568540733959  |grad|_K:  5.751e-06  alpha:  2.696e-01  linmin: -2.838e-05  t[s]:    982.75
    FillingsUpdate:  mu: -0.156823779  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 0.432
...
Duration: 0-0:44:36.71

openblas = 1

ElecMinimize: Iter:   0  F: -406.821628450924607  |grad|_K:  4.268e-04  alpha:  1.000e+00
    FillingsUpdate:  mu: -0.165374821  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 1
ElecMinimize: Iter:   1  F: -410.893588578729180  |grad|_K:  1.039e-04  alpha:  7.164e-01  linmin: -1.416e-04  t[s]:    469.94
    FillingsUpdate:  mu: -0.161054703  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 1.02
ElecMinimize: Iter:   2  F: -411.153343056237020  |grad|_K:  4.026e-05  alpha:  7.593e-01  linmin: -7.475e-06  t[s]:    594.90
    FillingsUpdate:  mu: -0.159192876  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 1.01
ElecMinimize: Iter:   3  F: -411.199348960236307  |grad|_K:  1.786e-05  alpha:  8.943e-01  linmin:  6.456e-06  t[s]:    719.95
    FillingsUpdate:  mu: -0.157426692  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 0.673
ElecMinimize: Iter:   4  F: -411.208972087260292  |grad|_K:  1.363e-05  alpha:  9.572e-01  linmin:  2.768e-05  t[s]:    845.52
    FillingsUpdate:  mu: -0.157065088  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 0.362
ElecMinimize: Iter:   5  F: -411.210568576992614  |grad|_K:  5.752e-06  alpha:  2.696e-01  linmin: -2.836e-05  t[s]:    970.98
    FillingsUpdate:  mu: -0.156823757  nElectrons: 288.000000
    SubspaceRotationAdjust: set factor to 0.432
...
Duration: 0-0:44:25.15

Both runs took ~125 s per SCF cycle.

Overall, I do not see a significant difference when running with export OPENBLAS_NUM_THREADS=1.

Do you have any other suggestions?

Also, I switched one of my Intel systems from OpenBLAS to MKL following the instructions in the JDFTx documentation. I did not see a significant difference (10 s faster in total) in the computational time for Na2S. Is that to be expected?

Thank you so very much for your help!

shankar1729 commented 1 month ago

Sounds good: in this case, it looks like the blas library is already configured to launch threads, and that openblas is not too far from MKL for performance on your machines. Unfortunately, that probably means you are already at, or close to, limiting performance for your machines.

Best, Shankar

AndrewTaehonKim commented 1 month ago

I see. Thank you.

I have been trying to switch to AMD's AOCL but am struggling to set it up. I modified the CMakeLists.txt file to include set(CMAKE_LIBRARY_PATH /opt/AMD/aocl/aocl-linux-aocc-4.2.0/aocc ${CMAKE_LIBRARY_PATH}) and recompiled using cmake, but running ldd path/to/jdftx showed no changes. I apologize that I am not familiar with cmake and makefile compilation. Would you be able to guide me through this?

shankar1729 commented 1 month ago

You don't need to modify the CMakeLists.txt; you can add a -D CMAKE_LIBRARY_PATH=/opt/... on the cmake command line. This will ensure that it gets set before all the libraries are searched.

Next, make sure you cleaned out the build directory, or at least deleted CMakeCache.txt, otherwise your old build settings may still be taking effect.
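
A minimal sketch of the two steps above (clear the cached configuration, then pass the library path on the cmake command line). The function name, both arguments, and the make -j4 parallelism are assumptions for illustration. Run it from inside the build directory and substitute your own paths:

```shell
# Sketch: reconfigure and rebuild against a different BLAS/LAPACK location.
# lib_dir and src_dir are placeholders (assumptions), not paths from this thread.
reconfigure_jdftx() {
    local lib_dir=$1 src_dir=$2
    # Remove the cached configuration so old library choices do not persist
    rm -f CMakeCache.txt
    # Set the search path on the command line, before any libraries are searched
    cmake -D CMAKE_LIBRARY_PATH="$lib_dir" "$src_dir" && make -j4
}
```

It would be called as reconfigure_jdftx <library directory> <JDFTx source directory>, where the library directory is the one that actually contains the lib*.so files.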

AndrewTaehonKim commented 1 month ago

Apologies, I accidentally closed this issue. I have reopened it. I have tried -D CMAKE_LIBRARY_PATH=/opt/... pointing to where my BLIS installation is, but nothing changes during the make process. I am not sure what I am doing wrong. Thank you.

shankar1729 commented 1 month ago

Looking back above, I think the path is incorrect: it should not be to aocc, but to the lib directory that contains various lib*.so files. So maybe something like /opt/AMD/aocl/lib (check the contents)?

shankar1729 commented 1 month ago

And if that doesn't work, you can also explicitly set CBLAS_LIBRARY and LAPACK_LIBRARY to the path to the libblis.so.
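
Whichever way the library is selected, ldd on the built binary shows what actually got linked. A small sketch of that check follows. The function name and the list of name patterns are assumptions, so extend the patterns for your installation:

```shell
# List the BLAS/LAPACK-like shared libraries a binary links against.
# The name patterns below are an assumption; adjust them as needed.
linked_math_libs() {
    local binary=$1
    ldd "$binary" | grep -Ei 'blis|flame|openblas|mkl|blas|lapack'
}
```

Calling linked_math_libs path/to/jdftx before and after reconfiguring makes it easy to see whether the build picked up the new library.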

AndrewTaehonKim commented 1 month ago

Thank you very much. You are correct that I should have pointed the path to the lib directory. You are so kind and helpful!

If possible, would you be able to help answer a couple more questions?

I need to perform geometry optimization calculations of this size in solvent using LinearPCM. Do you have an estimate of how much memory that would require?

I also need to test different adsorbents, so the Na2S molecule in the ionpos file will eventually need to grow into Na2S6 or even Na2S8. How much would this increase the estimated memory requirement?

If possible, would you be able to share how you estimated the memory requirement for such calculations? I would like to also make estimations to make sure I run these calculations on computer systems with enough memory.

Lastly, I am considering purchasing a GPU like a 3090Ti for one of my computer systems in order to speed up these simulations. Would you advise this? I am not sure if there would be enough memory on the GPU to perform calculations of this size. Are there specific minimum GPU specifications that I should consider for systems of approximately this size and other things that might bottleneck the calculation speed?

Thank you so very much for your guidance! And thank you for creating such a useful and detailed program!

shankar1729 commented 1 month ago

Solvation shouldn't increase the memory requirement or time much, once you are at this system size, where you are dominated by the quantum part.

Do a dry run (jdftx -ni <inputfile>) for your proposed calculations and look at the line containing nStates, nBands and the one containing nBasis. The total memory required for one copy of the wavefunctions is nStates * nBands * nBasis * 16 bytes (the 16 is for one complex number at double precision). You may need around 5 copies of the wavefunctions during a run, so use nStates * nBands * nBasis * 80 bytes as an estimate of total memory.
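
The arithmetic above can be sketched as a small shell helper. The function name is an assumption, and the numbers in the example call are hypothetical placeholders, not values from this calculation. Read the real nStates, nBands and nBasis from the dry-run output:

```shell
# Estimate wavefunction memory in bytes from the dry-run values.
# Arguments: nStates nBands nBasis [copies]; copies defaults to 5 as suggested above.
estimate_wfn_bytes() {
    local n_states=$1 n_bands=$2 n_basis=$3 copies=${4:-5}
    # 16 bytes per complex double-precision coefficient
    echo $(( n_states * n_bands * n_basis * 16 * copies ))
}

# Hypothetical values for illustration only
bytes=$(estimate_wfn_bytes 1 200 500000)
echo "Estimated memory: $(( bytes / 1024 / 1024 )) MiB"
```

Passing 1 as the fourth argument gives the size of a single copy of the wavefunctions.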

If you have multiple nodes / GPUs, you can split the nStates over them using MPI, then divide accordingly. But take note of the discreteness of nStates. So if your nStates = 1, then you can't split it over machines in JDFTx's k-point parallelization scheme.
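
To illustrate the discreteness point, here is a sketch of the per-process estimate. It assumes the states are spread as evenly as possible, so the most loaded process holds the ceiling of nStates divided by the process count. The function names are hypothetical:

```shell
# Largest number of states held by any one process when n_states is
# divided over n_procs processes (ceiling division; an assumption
# about how evenly the states are distributed).
max_states_per_process() {
    local n_states=$1 n_procs=$2
    echo $(( (n_states + n_procs - 1) / n_procs ))
}

# Per-process memory estimate in bytes, using the 80-byte rule of thumb
per_process_bytes() {
    local n_states=$1 n_bands=$2 n_basis=$3 n_procs=$4
    local local_states
    local_states=$(max_states_per_process "$n_states" "$n_procs")
    echo $(( local_states * n_bands * n_basis * 80 ))
}
```

With nStates = 1, max_states_per_process returns 1 for any process count, so adding processes does not reduce the per-process estimate.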

Finally, a 3090Ti has 24 GB, which can be decently useful. However the gaming GPUs have very low double-precision FLOPs compared to their server counterparts (which are an order of magnitude more expensive unfortunately). You may still benefit from the improved memory bandwidth compared to the CPUs, but it's not going to be as dramatic as the GPUs that are installed on supercomputers (like A100, H100 etc.). See this page for detailed comparison specs, and look for the memory bandwidth and FLOPs columns.

AndrewTaehonKim commented 1 month ago

Thank you very much. This was very helpful!