openforcefield / nistdataselection

Records the tools and decisions used to select NIST data for curation.
MIT License

[Sage Discussion]: Mixture Data Selection #16

Open · ocmadin opened this issue 4 years ago

ocmadin commented 4 years ago

Mixture Data Selection: We currently have ThermoML as our source of mixture data. Which mixtures should be emphasized in the training data? Which solvents other than water should we be focusing on?

ocmadin commented 4 years ago

Parsley Benchmark Set: https://github.com/openforcefield/release-1-benchmarking/blob/master/physical_properties/release_1_benchmark_set.pdf

davidlmobley commented 4 years ago

I'd say this overlaps with my comments on #14. Personally, I'd want some "core" mixtures if possible, built from the core molecules that people have historically parameterized force fields (FFs) with (@leeping will likely have thoughts on which), plus mixtures selected to give appropriate coverage of the parameters.
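One way to read "core mixtures plus coverage" is as a greedy set-cover problem. The sketch below is a minimal illustration, not a tool from this repo: it assumes you already have, for each candidate mixture, the set of force-field parameter IDs its components exercise (how that mapping is produced is outside the sketch). `select_mixtures`, the mixture names, and the parameter IDs are all hypothetical placeholders.

```python
from typing import Dict, List, Set


def select_mixtures(
    candidates: Dict[str, Set[str]],
    core: List[str],
    max_mixtures: int,
) -> List[str]:
    """Keep the core mixtures, then greedily add the candidate that exercises
    the most force-field parameters not yet covered by the selection."""
    selected = list(core)
    covered: Set[str] = set()
    for name in core:
        covered |= candidates[name]
    remaining = {k: v for k, v in candidates.items() if k not in selected}
    while remaining and len(selected) < max_mixtures:
        # max() over sorted names returns the alphabetically first mixture on ties
        best = max(sorted(remaining), key=lambda k: len(remaining[k] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining mixture adds any new parameter coverage
        selected.append(best)
        covered |= gain
        del remaining[best]
    return selected


# Hypothetical mapping from each mixture to the (placeholder) parameter IDs it exercises
candidates = {
    "water + ethanol": {"n1", "n14", "b1", "a1"},
    "ethanol + heptane": {"n14", "n16", "b1", "b6"},
    "acetone + chloroform": {"n16", "n20", "b9", "b18"},
    "methanol + water": {"n1", "n14", "b1"},
}
print(select_mixtures(candidates, core=["water + ethanol"], max_mixtures=4))
```

Here "methanol + water" is never added because it covers nothing beyond the core mixture, so the loop stops early even though `max_mixtures` allows a fourth entry. A real selection would likely weight parameters by how poorly they are already covered, not just count them.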

Remember, we would also like to bring in water data, such as the data @leeping used in fitting TIP3P-FB, without fitting the water model yet. Having that data on hand and included in the fitting set will ensure we are ready to co-optimize the water model in the next iteration.

In other words, we want to start looking at mixtures with water as well!

I'd also want diverse mixtures: polar-nonpolar, polar-polar, etc. The number of hydrogen-bond donors and acceptors is important, so ideally we'd include protic-aprotic mixtures and so on. You want some mixtures where the components pair well and some where they don't.
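To check how diverse a candidate list is along these axes, each binary mixture can be bucketed by polarity, proticity, and whether one component can donate a hydrogen bond to the other. This is a rough sketch under stated assumptions: the `Component` fields, the donor/acceptor counts, and the polar flags are illustrative hand assignments (in practice they might come from a cheminformatics toolkit), and "can cross-H-bond" is only a crude stand-in for "pairs well".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    h_donors: int     # H-bond donor hydrogens (illustrative hand-assigned counts)
    h_acceptors: int  # H-bond acceptor atoms (illustrative hand-assigned counts)
    polar: bool       # crude polar/nonpolar flag (assumption)


def pair_label(a: Component, b: Component) -> Tuple[str, str, bool]:
    """Label a binary mixture by polarity, proticity, and whether the two
    components can hydrogen-bond with each other."""
    # Sorting makes "polar-nonpolar" and "nonpolar-polar" the same category
    polarity = "-".join(sorted("polar" if c.polar else "nonpolar" for c in (a, b)))
    # Simplification: any component with an H-bond donor counts as protic
    proticity = "-".join(
        sorted("protic" if c.h_donors > 0 else "aprotic" for c in (a, b))
    )
    # Crude "pairs well" heuristic: one component can donate to the other's acceptor
    cross_hbond = (a.h_donors > 0 and b.h_acceptors > 0) or (
        b.h_donors > 0 and a.h_acceptors > 0
    )
    return polarity, proticity, cross_hbond


def tally(pairs: Iterable[Tuple[Component, Component]]) -> Counter:
    """Count how many candidate mixtures fall into each category."""
    return Counter(pair_label(a, b) for a, b in pairs)


water = Component("water", h_donors=2, h_acceptors=1, polar=True)
ethanol = Component("ethanol", h_donors=1, h_acceptors=1, polar=True)
acetone = Component("acetone", h_donors=0, h_acceptors=1, polar=True)
heptane = Component("heptane", h_donors=0, h_acceptors=0, polar=False)

pairs = [(water, ethanol), (ethanol, heptane), (acetone, heptane), (water, acetone)]
for label, n in tally(pairs).items():
    print(label, n)
```

Categories that come back empty or with very few entries point to gaps worth filling from ThermoML before adding more of the well-populated ones.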