theochem / gbasis

Python library for analytical evaluation and integration of Gaussian-type basis functions and related quantities.
http://gbasis.qcdevs.org/
GNU Lesser General Public License v3.0
39 stars 22 forks source link

Change API for storing OBasisInfo #2

Closed FarnazH closed 5 years ago

FarnazH commented 5 years ago

Check: https://github.com/theochem/iodata/issues/42

kimt33 commented 5 years ago

According to @PaulWAyers 's notes and the talking with him, it seems that the fundamental objects that we will be utilizing are the set of contracted Cartesian Gaussian functions of the same angular momentum. These functions will be directly used to construct the contracted spherical Gaussian functions, linear combinations of contractions, products of contractions, derivatives of contractions, and integral of the product of contractions.

Then, it seems intuitive to construct a class for the contracted Cartesian Gaussian functions of the same angular momentums. It will have the following attributes/properties:

Arguments for:

  1. Enforce consistent data structure
  2. Ensure consistent contents of the data structure
  3. Easy to test
  4. Easy starting project
  5. Simplest object (I think) without killing the performance on the recursion part

Arguments against:

  1. Naive implementations using python objects will be slow. It will be difficult to "vectorize"
  2. ANO type basis set will not have the same performance

Rebuttal against:

  1. To me, maintainability and stability of API are much more important than whatever speed we can get out of Python. I don't think it'll be that hard to "unpack" these objects into something more efficient. However, I think that the first version of the code should be the most explicit and easy to understand. Let's not optimize prematurely.
  2. We can support shells with different angular momentums by adding another structure on top. I don't think that designing a generalized structure just to support efficient implementations of ANO orbitals is worth the trouble. The option is there to do it in the future (it shouldn't be that hard if the code is well documented), just not right now.
tovrstra commented 5 years ago

Solid plan. This would indeed not yet support generalized contractions, but your arguments to postpone such additional complexity are completely valid.

It is also good that you already anticipate for generalized contractions by using angmoms instead of angmom. You could even go further, and make it a 1-element array, and raise an exception (for the time being) when it does not contain one element. This way, you won't have API changes when someone would take on the job of coding generalized contractions.

A few small questions/comments:

PaulWAyers commented 5 years ago

I think David was thinking of computing integrals by using the data from the objects (not the objects themselves) as arguments. I think that is more performant, and I like the fact that such a function would be more easily used by someone else (as they can just pass arguments, rather than having to first construct our custom-made object....

tovrstra commented 5 years ago

That would also seem the most useful to me.

kimt33 commented 5 years ago

It is also good that you already anticipate for generalized contractions by using angmoms instead of angmom. You could even go further, and make it a 1-element array, and raise an exception (for the time being) when it does not contain one element. This way, you won't have API changes when someone would take on the job of coding generalized contractions.

  • I'm a bit confused by the wording 'atom type' for icenter. Can you elaborate?

I should have clarified that the names of the attributes/properties were taken from those suggested in https://github.com/theochem/iodata/issues/42. I made some loose associations to link the idea to whatever was presented. They will most likely be different. I will make a pull request with the detailed API soon. I should mention that the angular momentum will be stored as a single number and not as an array. I would like the data structure to be as minimal as possible, and it only seems like it will be a minor inconvenience for the generalized contractions implementation.

  • Can you add type information and shapes of arrays for the attributes?

We will follow the standard NumPy format, which will include, at the very least, the type and shape.

  • How would you encode Cartesian versus pure functions? (with a negative/positive angomom value?)

I think that pure functions should be treated as linear combinations of the Cartesian types i.e. they do not have their own data structure or are built into the cartesian function class. It an inconvenience to retrieve the angular momentum values (which is an issue for the IOData), but if there is a wrapper, it shouldn't be too hard to extract out the magnetic moments, especially since each class is associated with one angular momentum.

  • When just computing integrals for a given combination of shells, iatom, icenter and centers can be replaced by just center (singular). My impression is that this class is also intended as an item in a list that represents the molecular basis set. However, the latter purpose is more complicated and mixes different concerns. In the former case, one would need a different data structure to describe the molecular basis set, from which these shell instances can be trivially derived when needed. Then the majority of your integral code is not affected by how molecular basis sets are represented, which would be a major advantage.

This class is intended to be independent of the molecular basis set. I believe that the instances of this class can be processed to construct a molecular basis set, but the goal is to create a data structure to ease implementation.

  • Does it have to be a class? Which methods do you expect? Computing integrals is probably done through functions taking these shell instances as arguments, or not?

It does not have to be a class. Data class (which shouldn't have any methods, I think) was chosen because we can check the input upon instantiation of the class rather than inside the function. I believe that the inputs to the function should be checked rather than hoping that the users follow the documentation. It is also easier to document (class documentation is standardized).

I think David was thinking of computing integrals by using the data from the objects (not the objects themselves) as arguments. I think that is more performant, and I like the fact that such a function would be more easily used by someone else (as they can just pass arguments, rather than having to first construct our custom-made object....

I think that at the beginning, passing the objects as the arguments will make it easier to implement. In the case that we decide to change the types of attributes allowed, we won't need to refactor the docstrings. We can think of the data class as a way of standardizing the low level objects needed for computing gaussian related things. As for performance, there's not much of a difference between passing an object and unpacking it inside, and unpacking the object and passing its contents. Now, if we unpacking many objects, then organizing it to facilitate vectorization, that would make a difference. But that can be done afterwards, I think. I believe that the first version of the code should be painfully easy to understand (while avoiding structures that may inhibit improvements in the future).

tovrstra commented 5 years ago

OK, good. I'll postpone further discussions until the PR then. It is easier to go into details at that stage.

kimt33 commented 5 years ago

See https://github.com/theochem/gbasis/blob/master/gbasis/contractions.py for the latest data class

kimt33 commented 5 years ago

We can reopen this, but at the moment, the API for the data class seems pretty stable (at least for the use cases defined). I think it would be better to start a new issue if there's a problem.