Cannot match semi-direct CCSD convergence in TCE CCSD

glowthrower commented 3 years ago

Describe the bug

I'm running a CCSD calculation with a large basis set (cc-pCV6Z) on a water molecule. With some non-default settings, I can get the calculation to converge, albeit slowly, using the semi-direct CCSD module. However, try as I might, I cannot achieve convergence using the TCE module. I'm wondering if this is a) me doing something wrong, b) an issue with the TCE code (not converging when it should), or c) an issue with the semi-direct code (converging when it shouldn't, i.e. reporting an unreliable result).

Describe settings used

I'm using NWChem 7.0.2, built against Intel Parallel Studio 2020 Update 2 (so with MKL and Intel MPI), as I've mentioned in some previous bug reports. I have slightly modified the code from the source tarball to take advantage of some hot-fixes from those bugs (thanks again!) but to the best of my knowledge, these changes should not touch either the TCE or CCSD modules.

Attach log files

Here is a log of a successfully-converging run with the semidirect CCSD module: semidirect.log

Here is a log of a non-converging run with TCE. The log ends abruptly about 1350 iterations in, after the batch job it was running in timed out. tce.log

To Reproduce

Execute the following input script, noting that the cc-pcv6z basis set is only available in the "old" libraries directory (not the new libraries.bse directory).

echo                                                                                                    
start h2o_cc_pcv6z                                                                                      

permanent_dir <path_to_permanent>
scratch_dir <path_to_scratch>                                                                    

memory total 5 gb global 4 gb                                                                           

geometry units angstrom noautoz noautosym                                                               
 O   0.0000000000   0.0000000000   0.0000000000                                                         
 H   0.7569520600   0.0000000000   0.5858836200                                                         
 H  -0.7569520600   0.0000000000   0.5858836200                                                         
end                                                                                                     

basis spherical                                                                                         
  H library cc-pv6z                                                                                     
  O library cc-pcv6z                                                                                    
end                                                                                                     

set lindep:n_dep 0                                                                                      

scf                                                                                                     
  thresh 1.000000e-08                                                                                   
  maxiter 50                                                                                            
  semidirect filesize 0                                                                                 
  tol2e 1.0e-12                                                                                         
end                                                                                                     

ccsd                                                                                                    
  thresh 1.0e-08                                                                                        
  maxiter 10000                                                                                         
  diisbas 20                                                                                            
end                                                                                                     
task ccsd energy                                                                                        

tce                                                                                                     
  ccsd                                                                                                  
  thresh 1.0e-08                                                                                        
  maxiter 10000                                                                                         
  diis 20                                                                                               
  2eorb                                                                                                 
  2emet 15                                                                                              
  idiskx 0                                                                                              
end                                                                                                     
task tce energy

Expected behavior

In a perfect world, I would hope that if a calculation can be coerced into convergence with one of the CCSD pathways available in NWChem, then this should also be possible with the other CCSD pathway. I am aware that there are a number of reasons why this may not be the case in practice... :wink:

Additional context

1) The underlying issue is almost certainly that some of the basis functions are close to being linearly dependent. If I remove the lindep:n_dep 0 line and allow NWChem to throw away some of the MOs, the calculation converges trivially in both semi-direct CCSD and in TCE. Unfortunately, the quality of the result thus obtained isn't good enough for the intended purpose (reproducing results from a paper to high accuracy). Even with set lindep:n_dep 1, the resulting change in SCF energy is too great to be acceptable. (Perhaps there are alternative, more sophisticated ways to discard specific bad functions rather than letting NWChem pick..?)

2) I suspect that the difference in behaviour here is related to the particular DIIS settings, and the implementations of DIIS used by the modules. The semi-direct CCSD does not converge with the default diisbas 5, hence the rather aggressive use of diisbas 20. However, the same setting for TCE doesn't produce convergence in the residuum, as seen in the output log -- although the correlation energy seems very stable and matches the output from the converged semi-direct calculation.

3) From looking into the code, it seems that the semi-direct solver performs a DIIS extrapolation at every iteration, using min(iter, diisbas) vectors, while TCE only does an extrapolation every n iterations, each time using diis previous vectors. Is there any way to achieve the former in TCE? (Looking at the relevant source, I guess probably not.)

4) As a comment, this calculation converges straightforwardly using the default settings in PySCF, which uses (I think...) an alternative variant of DIIS and a default setting of considering 6 iterations for DIIS. The result agrees with the semi-direct result here, to a cursory check.

Any comments would be welcome. @jeffhammond , you mentioned in another issue that you "have a number tricks for the absurdly hard cases" -- anything you can think of that might help here? (Or were you referring only to convergence of the original SCF?)

jeffhammond commented 3 years ago

@glowthrower Most of the tricks I have related to SCF convergence, because NWChem has a lot of knobs there, at least ones that help a lot in common cases.

I had a lot of TCE linear response convergence issues back in the day, but they were more related to CCSDTQ than 6Z, and I built some things to solve those faster. When TCE didn't converge with a huge basis set - benzene CCSD/aug-cc-pV5Z was a memorable one - I just gave up and published the next best thing.

Someone could implement the DIIS algorithm you mention but I don't know when that might happen. DOE has decided to put most of the funding into NWChemEx, so while NWChem is still actively maintained, there is less feature development than there was in the past, particularly in TCE.

edoapra commented 3 years ago

@glowthrower the root cause of your CCSD convergence problem is the numerical instability of the integrals. I have stumbled into plenty of examples similar to the one reported here when use of large basis sets and setting lindep:n_dep 0 causes the CCSD to diverge. The solution is to switch from the default integral package NWChem uses (Texas) either to the McMurchie-Davidson integrals or the Simint integrals. While McMurchie-Davidson integrals are built by default, Simint integrals require some specific settings and require a longer compilation time https://github.com/nwchemgit/nwchem/tree/master/src/NWints/simint. However, Simint integrals are much faster than McMurchie-Davidson integrals. I am attaching input and output files for these combinations. As you can seen, convergence is not so slow. noconv_mcm.nw.txt noconv_mcm.log noconv_simint.nw.txt noconv_simint.log

glowthrower commented 3 years ago

@jeffhammond Thanks for the info. BTW, the suggestion to try basis set projection that you made in the other thread helped me resolve an otherwise intractable SCF convergence issue in a different problem, so thanks also for that. :) I'd always considered basis set projection as a technique requiring too much attention to detail for my everyday use -- I do a lot of high-throughput calculations and can't manually check the logs for each of them -- but I found to my surprise that even projecting from an STO-2G solution(!) to an aug-cc-pCV5Z initial guess was enough to get something to converge that otherwise diverged catastrophically from the first iteration. So perhaps it's more forgiving than I thought and could be used in a high-throughput setting. Neat trick!

On a side note, I only just read about NWChemEx the other day, and it wasn't clear to me if there was anything approaching a beta version available to the public yet. Searching reveals a lot of marketing releases but not much else -- the github repo has a number of subprojects but nothing that looks like the main driver, so to speak. Do you maybe have any insider info?

@edoapra Wow, thank you very much for that! I'm particularly interested to see that my original result with the SD code was indeed wrong. The correlation energy disagrees to about 1.0e-06 with the results obtained with the alternative integral libraries, despite "converging" below a threshold of 1.0e-08. That's a bit disconcerting but still very good to know -- I would have extrapolated from and used that result otherwise... :)

Re: SIMINT. Getting that to compile took some work. Even though it seems that the library supports up to j-functions, I got build errors with SIMINT_MAXAM=7 and could only successfully build with SIMINT_MAXAM=6. Also, the run we've both tested here crashes during Schwarz screening unless each process has at least 1.4GB of local memory, which seems excessive given that there's only about 23 million distinct shell quartets. Please let me know if either of those isn't expected behaviour, and I'll file separate bug report(s).

But since it seems to improve both performance and stability, can you think of any reason not to use SIMINT as a default in all my calculations from now on? Maybe a better way of phrasing that question would be: are you aware of any cases when SIMINT produces incorrect results? From reading the relevant paper by the author(s) of the library, it seems to be formulated in terms of Cartesian functions rather than spherical harmonic functions -- is this a potential issue? (I have no idea if TEXAS handles spherical harmonic functions explicitly, or if it also works in Cartesians under the bonnet.)

Also, it seems like the actual integral code in SIMINT is generated and compiled as C, which suggests it might be worth using CC=icc and CXX=icpc to build everything rather than gcc and g++ respectively, despite the general recommendations of the documentation. Any feelings about whether this is a good idea and worth it?

jeffhammond commented 3 years ago

Yeah, in an ideal world, basis set projection should be automated and the default. In MPQC, they make it trivial with a guess basis option. We could wire it up in NWChem but it would be tricky since we'd need a basis set that supports all atoms to be a universal guess, and we'd have to add the logic for pseudopotentials as well. There was a time in my life when I had both the desire and the time to work on this, but those days have passed 😞

As for NWChemEx, I don't think they have made a public release. While I know more than many about the project, I am not an insider, and can't say what they plan to release or when.

glowthrower commented 3 years ago

Fair enough, thanks anyway. :smiley:

edoapra commented 3 years ago

Re: SIMINT. Getting that to compile took some work. Even though it seems that the library supports up to j-functions, I got build errors with SIMINT_MAXAM=7 and could only successfully build with SIMINT_MAXAM=6.

Thanks for reporting this. SIMINT_MAXAM=6 is likely to be the highest angular moment I have tried. I might know how to fix the SIMINT_MAXAM=7 problem.

Also, the run we've both tested here crashes during Schwarz screening unless each process has at least 1.4GB of local memory, which seems excessive given that there's only about 23 million distinct shell quartets. Please let me know if either of those isn't expected behaviour, and I'll file separate bug report(s).

Yes, I have seen myself that we could be extra generous in the Simint memory requirements. Could you please file a bug report with input and output file of the reproducer?

But since it seems to improve both performance and stability, can you think of any reason not to use SIMINT as a default in all my calculations from now on? Maybe a better way of phrasing that question would be: are you aware of any cases when SIMINT produces incorrect results? From reading the relevant paper by the author(s) of the library, it seems to be formulated in terms of Cartesian functions rather than spherical harmonic functions -- is this a potential issue? (I have no idea if TEXAS handles spherical harmonic functions explicitly, or if it also works in Cartesians under the bonnet.)

Both Simint and Texas do use Cartesians. NWChem does the transformation for cartesian to sphericals. I am not aware of SIMINT producing incorrect results ... quite the opposite! As I have already mentioned in this issue, I have already several cases when SImint is more numerically stable then Texas. In the case of correlated methods using the (aug)-c-pvXz basis sets, I would recommend using Simint.

Also, it seems like the actual integral code in SIMINT is generated and compiled as C, which suggests it might be worth using CC=icc and CXX=icpc to build everything rather than gcc and g++ respectively, despite the general recommendations of the documentation. Any feelings about whether this is a good idea and worth it?

Switching from gcc to icc should not compromise the numerical correctness of results (I have just realized I have used icc on some of my largest and most extensive Simint runs). Not quite sure about the impact on performance. Since the vectorized code is written is assembler, the compiler itself might not play a big role.

glowthrower commented 3 years ago

Thanks for reporting this. SIMINT_MAXAM=6 is likely to be the highest angular moment I have tried. I might know how to fix the SIMINT_MAXAM=7 problem.

That would be nice! I guess that j-functions are a bit of a specialised case for most uses, though. I just tried SIMINT_MAXAM=7 because I'm the kind of person who defaults to turning these kind of settings up to 11, as it were. :wink: Certainly, the code generation step took a very long time.

Yes, I have seen myself that we could be extra generous in the Simint memory requirements. Could you please file a bug report with input and output file of the reproducer?

Done, see here.

I am not aware of SIMINT producing incorrect results ... quite the opposite! As I have already mentioned in this issue, I have already several cases when SImint is more numerically stable then Texas.

Sure, clearly it helps here. I meant more if there were known failure cases in different settings to this one. But if not, that's awesome, I'll very happily take some extra performance and extra stability!

Since the vectorized code is written is assembler, the compiler itself might not play a big role.

Hmm, I think it might. The Simint generated code uses intrinsics rather than raw assembler, and there's still quite a lot that the compiler can do with those (reordering, loop transformations, etc.). But anyway, I've seen no problems with code compiled with icc so far, so I think I will just keep using it. :smiley:

glowthrower commented 3 years ago

Anyway, since the original issue has been very thoroughly resolved, I'll close this. Thanks again!

edoapra commented 3 years ago

Thanks for reporting this. SIMINT_MAXAM=6 is likely to be the highest angular moment I have tried. I might know how to fix the SIMINT_MAXAM=7 problem.

That would be nice! I guess that j-functions are a bit of a specialised case for most uses, though. I just tried SIMINT_MAXAM=7 because I'm the kind of person who defaults to turning these kind of settings up to 11, as it were. 😉 Certainly, the code generation step took a very long time.

The latest commits to https://github.com/edoapra/simint-generator/ fixes the SIMINT_MAXAM=7 and gets you a successful build. However, the next roadblock is the fact that NWChem itself does not support k functions (and a quick search for k functions in gaussian basis sets returned no hits for me)

glowthrower commented 3 years ago

Great, thanks!

It's been a long day/week and I might be making an off-by-one error, but wouldn't SIMINT_MAXAM=7 generate up to j-functions, not k-functions? Certainly I didn't see any quartets involving k falling out of the build generator yesterday...

edoapra commented 3 years ago

j-functions are not used, that's why Simint was crashing since some parts of it were assuming j orbitals. L=7 implies k-functions, instead https://en.wikipedia.org/wiki/Spectroscopic_notation#Atomic_and_molecular_orbitals

glowthrower commented 3 years ago

Ah. I forgot about that. That explains it then! Cheers. :wink:

nwchemgit / nwchem

Cannot match semi-direct CCSD convergence in TCE CCSD #367