openforcefield / open-forcefield-tools

Tools for open forcefield development
MIT License
8 stars 6 forks source link

Revision of API document after discussion at UC Irvine, 7/11/2016 #15

Closed mrshirts closed 8 years ago

mrshirts commented 8 years ago

API changes (mostly minor):

Minor documentation changes:

jchodera commented 8 years ago

Mind filling me in here?

jchodera commented 8 years ago

Merged the concepts of MeasuredPhysicalProperty and ComputedPhysicalProperty into a single PhysicalProperty

We can't do this. They will have different properties. There is no simulation object attached to a MeasuredPhysicalProperty, and no doi attached to a ComputedPhysicalProperty, for example. They can be subclasses of a single PhysicalProperty, but they can't be the same thing.

More generally, why are you changing the API when I am in the middle of implementing it? I gave you guys plenty of time to comment on this during the design phase!

jchodera commented 8 years ago

(minor) Made Composition a member of ThermodynamicState

Also, no. Not a bit.

There's no way this will work. We can easily reweight measurements for the same Composition simulated at different choices of ThermodynamicState, but I'm not going to be able to figure out which ThermodynamicState objects I can reweight from if the Composition is part of it! We need the separation between the two.

davidlmobley commented 8 years ago

@jchodera

We can't do this. They will have different properties. There is no simulation object attached to a MeasuredPhysicalProperty, and no doi attached to a ComputedPhysicalProperty, for example. They can be subclasses of a single PhysicalProperty, but they can't be the same thing.

We're also fine with subclasses. Apparently his visit here yesterday/today was the first time Michael had looked at this in detail, and he had lots of things which seemed weird to him. I was able to explain a lot of the logic to him and get him more comfortable with it, but there were still some issues he wanted to raise. Here he wanted to do something to recognize that these were closely related data structures. Also, it's worth noting that the measurement method could be a simulation (in the case of fitting to isolated molecule simulations!).

(minor) Made Composition a member of ThermodynamicState Also, no. Not a bit.

There's no way this will work. We can easily reweight measurements for the same Composition simulated at different choices of ThermodynamicState, but I'm not going to be able to figure out which ThermodynamicState objects I can reweight from if the Composition is part of it! We need the separation between the two.

Michael's attachment to this was from a "conceptually, what does thermodynamic state mean?" perspective. I think your argument about reweighting, though, is compelling. Michael would probably ask, though, why you can't just make sure the composition is the same when it is part of the ThermodynamicState object in the same way you would if it's separate?

I don't think he's super attached to this idea, though, since as I mentioned it was from a conceptual point of view about what thermodynamic state means physically (temperature, pressure, number of particles, composition, etc.)

(I'd let him answer for himself but he's on a plane.)

mrshirts commented 8 years ago

Hi, John-

This was a result of some discussion (that I believe Chris and Mobley were in agreement on about how to align the data structures more closely with the concepts in ways that we thought made sense. I think there was agreement on all the topics we covered (David signed off on the suggested changes), though I think he was less invested in them than me :)

Here he wanted to do something to recognize that these were closely related data structures.

I think it aligns well with 1) how we will use them in practice and 2) how physical properties are already represented in ThermoML and 3) the fact that they really can be seen as variations on the same thing.

Also, it's worth noting that the measurement method could be a simulation (in the case of fitting to isolated molecule simulations!).

This also parallels ThermoML, where it allows for a physical property to be the result of a calculation.

There is no simulation object attached to a MeasuredPhysicalProperty,

The idea is that Simulation object can be type of MeasurementMethod. (which likely would need to be more than just a string anyway, since even for experiments you want to tag the experiment with as much information as possible about how it was produced - 'vibrating tube method' may not be sufficient, you may want to describe which instrument, etc.

and no doi attached to a ComputedPhysicalProperty, for example. We are going to be using for the single molecule test computations "experimental" physical properties, which will not have DOI's. If we run data fits on data generated in the lab, we will be using them before DOI. So fundamentally, DOI can't be required to be defined. We could be doing model experiments where we try to fit one model with another model. So I would claim that it's conceptually simpler to see treat them as the shape type of object.

From a CS level, it might make more sense to make them derived classes from a single parent class. This VERY well could be the case. I didn't think so, but I could easily be wrong.

mrshirts commented 8 years ago

More generally, why are you changing the API when I am in the middle of implementing it? I gave you guys plenty of time to comment on this during the design phase!

Very valid concern. 1) it wasn't clear that there progress that it would make a difference from what had been committed so far 2) if there were changes, making them sooner is better than waiting until later 3) design documents can't be expected to be totally frozen, because better understanding of the problem will affect the design.

And 4) a lot of the final discussion took place when I was on vacation with family, so I didn't get a chance to really look over it until I got to UCI on Monday.

But it's still a valid concern.

mrshirts commented 8 years ago

Michael's attachment to this was from a "conceptually, what does thermodynamic state mean?" perspective. I think your argument about reweighting, though, is compelling. Michael would probably ask, though, why you can't just make sure the composition is the same when it is part of the ThermodynamicState object in the same way you would if it's separate?

Pretty much. Conceptually, an "thermodynamic state" has a well defined preexisting meaning that is the specification of the variables needed to have unique macroscopic observable measurements; T (or E), P (or V), and N_i (or mu_i), and the phase.

One could create a "ThermodynamicState" class that has "ThermodynamicVariables" member (with members T and P), and then a separate Composition class.

Also in theory, one actually can reweight to different compositions (map from N-1 to N molecules) -- but it's almost certainly a very bad idea from a computational efficiency point of view.

jchodera commented 8 years ago

Check out the 'propcalc' branch.

More importantly, I don't know how to implement the modified API because it changes around all the data structures in a confusing way. This would take some time to figure out.

We can make computed and measured physical properties subclasses of the same base class if you can tell me EXACTLY which object properties are shared between them.

A simulation-derived estimate cannot simply have a different measurement method (instead of a set of Simulation objects) because it may be derived from reweighing multiple simulations, and some simulations may contribute to multiple estimated properties. How would the many-to-many relationship carry over to experimental measurements? It doesn't make sense to introduce that complication if it isn't needed. We do need to add some form of this for the properties David wants to use in fitting, but it can be much, much simpler---just compute the properties and copy over the values and uncertainties to a MeasuredPhysicalProperty with a very simple kind of measurement method object.

We've already gone through and closed the API design process when I started the implementation last weekend. I should have been more clear about that.

We can either start from scratch now---which also means figuring out how to implement the damn thing, which is the hard part---and maybe have something by the end of summer (???), or we can have you guys submit issues commenting on things the API does not yet support or an example of something that is hard in the current API, and we can modify it as needed. But if we want something in the near term, I highly suggest the latter.

I don't mean to criticize the thought that went into this. It's just not the right time for this kind of thing.

davidlmobley commented 8 years ago

@jchodera - totally agree that if implementation is already underway then the best thing for us to do is work on what you've already done, help you implement it, etc. I will absolutely get to looking at that branch right away!

@mrshirts - be sure to also see discussion of this on Slack. A key question at this point is whether there is anything functionally that you couldn't do with the API as John has designed it. My answer on Slack was "no, everything is fine!" and I still think that's the case (i.e. all of our discussion on Monday focused on what you thought was conceptually clearer, etc. - none of it was about "OK, well, we can't do this if we don't have this...").

So, @jchodera , my answer is still that there is absolutely nothing we should scrap about what you're doing. These were 100% issues about "conceptual clarity" and nothing is functionally important. I totally agree with you that if you are well on your way towards implementing it we absolutely should NOT scrap it and set the project way back.

davidlmobley commented 8 years ago

So, @mrshirts , a key question for you would be this:

We can make computed and measured physical properties subclasses of the same base class if you can tell me EXACTLY which object properties are shared between them.

Can you look at what he's done so far and get back to us on that?

mrshirts commented 8 years ago

I'll modify this after the discussion today, and properly resubmit the pull request for single molecule property code when the issues are updated.

davidlmobley commented 8 years ago

@mrshirts - I believe this can be closed as you indicated you're extracting what you wanted to have merged from here and bringing it in another way?

(Reopen if not.)