quantumlib / OpenFermion-PySCF

OpenFermion plugin to interface with the electronic structure package PySCF.
Apache License 2.0
105 stars 44 forks source link

Update pyscf wrapper #27

Closed sunqm closed 6 years ago

sunqm commented 6 years ago
googlebot commented 6 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers
sunqm commented 6 years ago

I signed it!

googlebot commented 6 years ago

CLAs look good, thanks!

babbush commented 6 years ago

Thanks for a great pull request Qiming. You are expanding the functionality of OpenFermion-PySCF in a meaningful (and badly needed way). It seems that you are introducing a new MolecularData class that inherits from the OpenFermion MolecularData class, and you are adding certain methods and attributes to the class which can be used to store and load pyscf data.

I think a discussion about whether this is preferred design is in order (often we encourage discussion of such things in "issues" before opening a PR that makes such design choices - but it's perfectly fine to discuss on the PR) . One option would have been to add the ._pyscf_data attribute to the core class in OpenFermion. Then, you could write functions in the OpenFermion-PySCF module (as opposed to methods on the class) which take the core MolecularData as input and updates the attributes you are updating in all the methods decorated with @property in this PR. This would avoid the messiness of having two slightly different MolecularData classes floating around. For me, having two implementations of the same class feels dangerous and should be avoided. This is consistent with the "single-responsibility" principle which tries to avoid making any class overly complicated.

I would also be interested in hearing input from you @sunqm and also from others in OpenFermion such as @jarrodmcc @idk3 @ncrubin and @kevinsung . Perhaps there is an argument for overloading the class via inheritance. It does show off some sophisticated python design patterns ;-)

sunqm commented 6 years ago

I think the goal here in OpenFermionPySCF library would be to add a layer to access pyscf program. By extending the existing methods/functions/class of the core OpenFermion class, users should be able to run the existing OpenFermion code without changing anything, (maybe except a few import statement from OpenFermionPySCF library). The question of design is whether we want to make OpenFermionPySCF extension almost invisible to users in their code. One solution is to dump out the density matrices, wave functions etc. to some files as the OpenFermion core class did, and still use the existing core class (without implementing any extended class in OpenFermion-Pyscf code) . This just requires adding some lines in _run_pyscf function that computes the density matrices and dump to the required files. But I don't think this is a good solution since pyscf has the functionality to compute these quantities as needed on the fly. The name of the class is not an issue for me. I can change it to any names that make code clear.

._pyscf_data is not a beautiful solution. Before figuring out a better solution of this layer, I plan to use it for now in the extended MolecularData class. Adding ._pyscf_data in OpenFermion for sure simplifies many things for me. But I would say it does not sound like a good design. The magic variable ._pyscf_data in OpenFermion core class looks like an unnecessary couple to an external library. If you guys are ok with this magic variable in the core module, I would be happy to use that.

babbush commented 6 years ago

I'd still like to hear a couple other opinions on this design. I understand why you'd like to have a special class for the PySCF data. I see arguments for and against this. But one thing that I expect to be clear is that we should very likely change the class name if we do implement it in the fashion you're proposing. Otherwise, I worry about strange conflicts with the routines that save and load these classes.

ncrubin commented 6 years ago

I think @sunqm's solution is the typical way of defining an interface to a python object. Of course, if we really wanted to force a differentiation we could add a bunch of abstract base class requirements to MolecularData, but I think that is overkill. I am against adding properties to MolecularData with hidden dependencies to external libraries. Though the super user might be aware of the tight coupling, a new user would find this out the hard way.

Also, to @sunqm point, if a user is committed to using this object they are acknowledging they are using an interface to pyscf. I think this is a reasonable thing to think. I do agree with @babbush about changing the name of this object to something more descriptive and less prone to naming collisions. PySCFMolecularData?

jarrodmcc commented 6 years ago

I like this pull request a lot, as I think it gets closer to something that is an ideal, which is a data structure that just automatically computes in the background when you ask for a property, or pulls it from a database / other stored data object if it is already computing. I think it's good to structure it in this way, as it sort of makes it clear that the data is being treated in a fundamentally different way to developers and users. My two thoughts on this, is that if we keep it the way it's structured now, I agree with Nick and Ryan that we should change the name slightly to make it clear it is specialized for committing to PySCF. The other thought is that if we wanted to couple this to the core molecular data structure and we thought someone would do this for more than one electronic structure package (not sure if someone will actually do this), then it would make sense to generalize MolecularData to accept a backend parameter, which can use files, databases, or electronic structure packages like PySCF on the fly to pull the data from. I've been thinking we should overhaul MolecularData for a while now, as we have some changing needs(diagonal Hamiltonians, better managing of file trees etc), and it might be possible to include that change at that point. However, for now I think this way of styling the interface (with name change) is probably good.

babbush commented 6 years ago

Two other things. Please feel free to add yourself to the NOTICE and README as an author, (in the alphabetical section). I can update the other OpenFermion repos to reflect that. Also, please manually run some of the "generate_data" type files, and also the demo, to make sure none of these changes break those. Unfortunately, we do not have automatic testing setup just yet because of the difficulty of getting pyscf to install on the Travis server.

sunqm commented 6 years ago

Thans for the code review. I saw the documents in InteractionRDM that the two_body_tensor is defined as <a^\dagger_p a^\dagger_q a_r a_s>. Should it be contracted with integrals numpy.einsum('pqrs,pqsr', two_electron_integrals, two_rdm) or numpy.einsum('pqrs,pqrs', two_electron_integrals, two_rdm)

babbush commented 6 years ago

Hi Qiming,

I am pretty sure it is the second one. @ncrubin or @kevinsung might have thoughts if I'm wrong. By the way, why did you close the PR? You can leave it open while you work on it. I will reopen if that is okay.

sunqm commented 6 years ago

Different programs use different conventions on 2-pdm. It seems good to clarify the convention of 2-pdm in InteractionRDM document. If OpenFermion uses the convention 2pdm[p,q,s,r] = < p^\dagger q^\dagger r s>, the energy is then computed as numpy.einsum('pqrs,pqrs', tei, 2pdm). Another convention 2pdm[p,q,r,s] = < p^\dagger q^\dagger r s > is also acceptable. Then energy is ~ numpy.einsum('pqrs,pqsr', tei, 2pdm). It would be helpful to have this clarified in InteractionRDM.

Closing the PR is just a misclick on the "close and comment".

sunqm commented 6 years ago

I forgot to check the PQRS convention in OpenFermion. 2pdm[p,q,r,s] is <p^\dagger q^\dagger r s> and energy is einsum('pqrs,pqrs', PQRS, 2pdm).

babbush commented 6 years ago

Yes, that is true.

babbush commented 6 years ago

@sunqm I feel strongly that the travis config should be a separate PR. Thank you for working on that though!

sunqm commented 6 years ago

I have updated the code. Changes include

babbush commented 6 years ago

@sunqm if you are ready, I will merge.

sunqm commented 6 years ago

It's ready to merge. I will update travis in the next PR.