jjgoings / McMurchie-Davidson

do a simple closed shell Hartree-Fock using McMurchie-Davidson to compute integrals

Combinations #18

Closed pwborthwick closed 3 years ago

pwborthwick commented 3 years ago

Hi Josh, just a small typo...

    if comb(nEle,nOrb) > 5000:
        print("Number determinants: ",comb(nEle,nOrb))
        sys.exit("FCI too expensive. Quitting.")

These arguments are the wrong way round (e.g. for water it's 'taking 10 things 14 at a time'), so it always returns 0 and the 'if' is always False. You've got it the right way round a couple of lines down! Hope all is well.
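
For illustration, a minimal sketch of the corrected guard with the arguments swapped back (the water numbers are just the example above, nEle = 10 electrons and nOrb = 14 spin orbitals):

    from math import comb
    import sys

    nEle, nOrb = 10, 14       # e.g. water: 10 electrons in 14 spin orbitals

    print(comb(nEle, nOrb))   # 0    -- reversed arguments, guard never fires
    print(comb(nOrb, nEle))   # 1001 -- 'nOrb choose nEle', the intended count

    if comb(nOrb, nEle) > 5000:
        print("Number determinants: ", comb(nOrb, nEle))
        sys.exit("FCI too expensive. Quitting.")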

jjgoings commented 3 years ago

Thanks for finding this! Fixed in 97ff2f69

pwborthwick commented 3 years ago

Just one other point. In the paper on using bit strings with FCI, because it's Fortran, they are in trouble if the number of spin orbitals exceeds the 64-bit word size. Hence all the Nint stuff. I'm pretty new to Python, but do you really need it in mmd?

    h = '0b1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111'
    print('Decimal representation...', int(h,2))
    Decimal representation... 1267650600228229401496703205375

That's a string of 100 '1's and Python seems to take it in its stride. I'm missing something, aren't I? Best wishes, Peter

jjgoings commented 3 years ago

@pwborthwick You are absolutely right, Python can handle arbitrary precision integers. So in this respect, we don't need Nint in mmd at all. A really interesting read can be found here. One big takeaway is that it gets a little weird using numpy, so you do have to be careful sometimes. To extend your example:

import numpy as np

h = np.array([1267650600228229401496703205375])
print("h: ",h,"; type: ",h.dtype) # object

i = np.array([2**63 - 1])
print("i: ",i,"; type: ",i.dtype) # int64

print("h + i: ",h + i,"; type: ",(h + i).dtype) # object
print("h + 1: ",h + 1) # no overflow ... object
print("i + 1: ",i + 1) # overflow! ... int64

yields

h:  [1267650600228229401496703205375] ; type:  object
i:  [9223372036854775807] ; type:  int64
h + i:  [1267650600237452773533557981182] ; type:  object
h + 1:  [1267650600228229401496703205376]
i + 1:  [-9223372036854775808]

So if we aren't careful we could get an overflow. NumPy tries to be intelligent with the data types, but it can't anticipate all our use cases. If it represents the integers as object there is no issue, but if it uses int64 then we have a problem, because then it is relying on C-style ints. Now that you've brought this to my attention, it might be a good idea to ensure that we are using unsigned 64-bit integers here, which I didn't think about too carefully. (The saving grace is that we should never encounter a case with more than 64 orbitals, since the code is too slow -- famous last words.)
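
As a rough illustration (the dets array and bit layout here are hypothetical, not mmd's actual representation), forcing an explicit unsigned dtype might look like:

    import numpy as np

    # sketch: pin the determinant bit strings to unsigned 64-bit ints explicitly,
    # rather than letting numpy pick a (signed) default dtype
    dets = np.array([0b1011, 0b0111], dtype=np.uint64)
    print(dets.dtype)                      # uint64

    # bitwise ops between uint64 values stay in uint64
    common = dets[0] & dets[1]             # doubly occupied orbitals
    print(bin(int(common)), common.dtype)  # 0b11 uint64

    # bit 63 is a usable orbital flag here, with no negative wrap-around
    top = np.uint64(1) << np.uint64(63)
    print(int(top) > 0)                    # True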

OK, so why do I include Nint? The reason is that I am using mmd to quickly prototype C++ code, and C++ can't handle arbitrary-precision integers natively. I know you already know this, but just to make the point:

int main() {
    auto h = 1267650600228229401496703205375;
    return 0;
}

won't compile, and dies with the compilation error

test.cpp:2:14: error: integer literal is too large to be represented in any
      integer type
    auto h = 1267650600228229401496703205375;
             ^
1 warning and 1 error generated.

So anticipating C++ needs within Python is more of a useful feature for me, as I develop in a different code base: iron out the C++(-ish) code first in Python. That, and I hate waiting to compile, so I find writing prototype code in Python first and then translating to C++ more efficient. It helps me think a bit more abstractly, as well as be a bit more ruthless in tracking down bugs.
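
Just to make the Nint connection concrete, here is a rough Python sketch (the helper name and layout are made up, not mmd's or the paper's actual code) of splitting one arbitrary-precision occupation integer into fixed 64-bit words, which is essentially what an Nint-style Fortran/C++ layout has to store:

    # hypothetical helper, not mmd's API: split an occupation bit string into
    # fixed 64-bit words, the way an Nint-style layout would store it
    def to_words(det, n_orb, word_bits=64):
        n_int = (n_orb - 1) // word_bits + 1       # number of words needed
        mask = (1 << word_bits) - 1
        return [(det >> (w * word_bits)) & mask for w in range(n_int)]

    det = int('1' * 100, 2)       # 100 occupied spin orbitals, as in Peter's example
    print(to_words(det, 100))     # two words: [2**64 - 1, 2**36 - 1]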

pwborthwick commented 3 years ago

Hi @jjgoings, thanks for the Python examples. It did cross my mind that you may be targeting a more general code base than just Python. I originally coded the FCI etc. using text bit strings because, as you say, these programs are for molecules where it wouldn't make much of a difference in speed.

I've only just come back to this following a diversion after reading your SCI paper. I refreshed my memory on Epstein-Nesbet, and while reading about PT I remembered a paper I saw on generating Hugenholtz diagrams in code... so I wrote a Python program that generates the diagrams and then automatically generates code (and executes it) to evaluate the diagrams. Anyway, bearing in mind your comments about text bit strings being too slow for generating SCI determinants on-the-fly, I was going to implement a bit-manipulation version, which is why I queried the 64-bit thing. I will still go ahead with a bit version of my code (but with care re int64 and objects) just to get some practice on using bits.

I must get back to biorthonormalizing eigenvectors at some point. I think even elementary row/column operations destroy the eigensolution nature of the matrix, so you have to biorthonormalize using some iterative process, and any method along the lines of your blog won't work. I've got transition dipoles, both electric and magnetic, in both length and velocity gauges, and oscillator/rotary strengths working in the Tamm-Dancoff Approximation as a start.

Thanks again for your explanations. Best wishes, Peter