This PR is an extension to #47 which increases the size of the MB1 benchmark set.
Refactoring Changes
The separate scripts used to filter the data set have been condensed into a single script, which is now applied to both pure and mixture data.
The separate scripts to partition the data set based on whether substances appear or don't appear in the training set have been condensed into a single script.
Data Set Selection Changes
After making the changes described below, the test set now includes
Filtering
The filtering criteria are now applied to both pure and mixture sets, rather than just the mixture sets.
We now filter out any measurements made for substances which contain components with undefined stereochemistry.
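A minimal sketch of the stereochemistry rule, assuming each measurement records its components as a tuple of SMILES strings and that a cheminformatics toolkit supplies the actual stereochemistry check. The names `filter_undefined_stereo` and `has_undefined_stereo` are hypothetical and are not taken from the scripts in this PR:

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: one SMILES string per component.
Measurement = Tuple[str, ...]


def filter_undefined_stereo(
    measurements: Iterable[Measurement],
    has_undefined_stereo: Callable[[str], bool],
) -> List[Measurement]:
    """Drop every measurement in which any component has undefined stereochemistry.

    `has_undefined_stereo` is a placeholder for a toolkit-backed check; it is
    an assumption of this sketch, not part of the PR.
    """
    return [
        components
        for components in measurements
        if not any(has_undefined_stereo(smiles) for smiles in components)
    ]
```

Because the check runs over every component, the same function covers pure (one-component) and mixture (two-component) measurements.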
Mixture Data Selection
In addition to measurements made for systems in which neither component was in the training set, we now also include alcohol-alcohol and ester-ester mixtures where both components do appear in the training set.
Given that we didn't directly fit against alcohol-alcohol and ester-ester mixtures, this should give a further test of how well the parameters trained only on alcohol-ester mixtures perform.
Binary Enthalpy of Mixing
| | R-OH - R-OH | R-OH - R(=O)OR | R(=O)OR - R(=O)OR |
|---|---|---|---|
| Both in train | * | X | ✓ |
| One in train | ✓ | X | X |
| None in train | ✓ | ✓ | ✓ |
* No data was available for alcohol-alcohol mixtures where both components appeared in the training set. Instead we use data where only one component appeared.
Binary Excess Molar Volume
| | R-OH - R-OH | R-OH - R(=O)OR | R(=O)OR - R(=O)OR |
|---|---|---|---|
| Both in train | ✓ | X | ✓ |
| One in train | X | X | X |
| None in train | ✓ | ✓ | ✓ |
Binary Mass Density
| | R-OH - R-OH | R-OH - R(=O)OR | R(=O)OR - R(=O)OR |
|---|---|---|---|
| Both in train | ✓ | X | ✓ |
| One in train | X | X | X |
| None in train | ✓ | ✓ | ✓ |
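The selection rules for the three binary properties can be written down as a lookup table. The sketch below is illustrative only: the property keys, the `alcohol`/`ester` labels, and the function names are hypothetical, and components are assumed to be identified by SMILES strings.

```python
from typing import Dict, FrozenSet, Set, Tuple

ALCOHOL, ESTER = "alcohol", "ester"  # hypothetical environment labels

# Allowed training-set overlap ("both", "one", "none") per mixture type.
_VOLUMETRIC: Dict[FrozenSet[str], Set[str]] = {
    frozenset({ALCOHOL}): {"both", "none"},
    frozenset({ALCOHOL, ESTER}): {"none"},
    frozenset({ESTER}): {"both", "none"},
}

SELECTION: Dict[str, Dict[FrozenSet[str], Set[str]]] = {
    "enthalpy_of_mixing": {
        # No "both in train" alcohol-alcohol data was available, so
        # "one in train" is used instead.
        frozenset({ALCOHOL}): {"one", "none"},
        frozenset({ALCOHOL, ESTER}): {"none"},
        frozenset({ESTER}): {"both", "none"},
    },
    "excess_molar_volume": _VOLUMETRIC,
    "mass_density": _VOLUMETRIC,
}


def training_overlap(components: Tuple[str, str], training_set: Set[str]) -> str:
    """Classify a binary mixture by how many components are in the training set."""
    n_in_train = sum(smiles in training_set for smiles in components)
    return ("none", "one", "both")[n_in_train]


def is_selected(
    property_name: str,
    components: Tuple[str, str],
    environments: Tuple[str, str],
    training_set: Set[str],
) -> bool:
    """Return True if the mixture falls in a ticked cell for this property."""
    allowed = SELECTION[property_name][frozenset(environments)]
    return training_overlap(components, training_set) in allowed
```

For example, with methanol (`CO`) and ethanol (`CCO`) both in the training set, `is_selected("excess_molar_volume", ("CO", "CCO"), (ALCOHOL, ALCOHOL), training_set)` returns `True`, while the same call for `"enthalpy_of_mixing"` returns `False`.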
Pure Data Selection
We now choose the pure density measurements by filtering out any data points measured for components which do not appear in the selected binary density, enthalpy of mixing, excess molar volume and enthalpy of vaporization sets.
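As a rough illustration of this rule, the sketch below keeps a pure density measurement only if its component appears somewhere in the other selected sets. The data layout (tuples of SMILES strings keyed by property name) and the function name `select_pure_densities` are assumptions made for the example, not the PR's actual data structures:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one SMILES string per component.
Measurement = Tuple[str, ...]


def select_pure_densities(
    pure_densities: List[Measurement],
    selected_sets: Dict[str, List[Measurement]],
) -> List[Measurement]:
    """Keep pure density measurements whose component appears in a selected set."""
    retained: Set[str] = set()
    for measurements in selected_sets.values():
        for components in measurements:
            retained.update(components)

    # Pure measurements have exactly one component.
    return [m for m in pure_densities if m[0] in retained]
```

This ties the pure density set to the substances that the rest of the benchmark actually exercises.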