Closed SimonBoothroyd closed 4 years ago
For alcohol-alcohol mixtures:
there is only binary mass density data for systems in which both components were in the training set;
there are 27 binary mass density, 11 enthalpy of mixing and 3 excess molar volume data points where one of the components is in the training set;
there are 13 binary mass density, 4 enthalpy of mixing and 1 excess molar volume data points where neither of the components is in the training set.
Description
This PR implements the scripts which will be used to construct an initial benchmark set. This benchmark set is expected to be modest in size (~50 pure data points, ~120 mixture data points) so that the performance of optimisations can be assessed rapidly, but will be complemented by further sets outlined in future PRs.
Plan for the Data Set
We will be less systematic in selecting the systems to include in the benchmark set, aiming instead to curate a diverse set of molecules with pure density and enthalpy of vaporization data points, and binary enthalpy of mixing, excess molar volume, and mass density data points, without requiring that every one of these data types be available for a substance to be included (as was the case for the training sets).
In order to test how well each of the produced force fields generalises, we initially aim to include binary mixtures of alcohols and alcohols, alcohols and esters (/acids), and esters (/acids) and esters (/acids).
In an attempt to ensure that we are testing the performance of the refit parameters, rather than the full Parsley 1.0.0 force field, we will exclude any
again, this will likely be relaxed in future benchmark sets.
This set will only contain mixtures in which neither of the components appears in the training set. Future data sets may then be complemented with mixtures which do partially contain training data, to further explore interesting results highlighted by this initial set.
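The selection rule above can be illustrated with a minimal sketch. It is not the script added by this PR: the SMILES strings, the `TRAINING_SET` contents and the helper names `n_in_training` and `select_benchmark_mixtures` are all hypothetical, chosen only to show how binary mixtures might be binned by the number of components shared with the training set and then filtered to those sharing none.

```python
from collections import Counter

# Hypothetical training-set substances (as SMILES); illustrative values only.
TRAINING_SET = {"CCO", "CO", "CC(=O)OC"}


def n_in_training(mixture, training_set=TRAINING_SET):
    """Return how many components of a binary mixture are in the training set."""
    return sum(component in training_set for component in mixture)


def select_benchmark_mixtures(mixtures, training_set=TRAINING_SET):
    """Keep only the mixtures for which no component is in the training set."""
    return [m for m in mixtures if n_in_training(m, training_set) == 0]


# Hypothetical candidate alcohol-alcohol mixtures.
candidates = [
    ("CCO", "CO"),        # both components in the training set
    ("CCO", "CCCO"),      # one component in the training set
    ("CCCO", "CCCCO"),    # neither component in the training set
    ("CCCCO", "CC(C)O"),  # neither component in the training set
]

# Number of candidate mixtures with 0, 1 or 2 components in the training set.
overlap_counts = Counter(n_in_training(m) for m in candidates)

# Mixtures that would be eligible for this initial benchmark set.
benchmark = select_benchmark_mixtures(candidates)
```

Binning by overlap count first keeps the "one component in training" mixtures easy to retrieve later, should a future set relax the rule.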
Status