openforcefield / nistdataselection

Records the tools and decisions used to select NIST data for curation.
MIT License

[Mixture] Update Phase 1 Benchmark Set Curation #51

Closed SimonBoothroyd closed 4 years ago

SimonBoothroyd commented 4 years ago

Description

This PR is an extension to #47 which increases the size of the MB1 benchmark set.

Refactoring Changes

Data Set Selection Changes

After making the changes described below, the test set now includes

Filtering

Mixture Data Selection

In addition to measurements made for systems in which neither component was in the training set, we now also include alcohol-alcohol and ester-ester mixtures in which both components do appear in the training set.

Given that we didn't directly fit against alcohol-alcohol and ester-ester mixtures, this should give a further test of how well the parameters trained only on alcohol-ester mixtures perform.
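As a rough illustration of the selection rule described above, here is a minimal sketch. It is not the code used in this PR: the function name `select_for_test`, the environment labels, and the example SMILES sets are all hypothetical, and the fallback noted for alcohol-alcohol enthalpy of mixing data is deliberately left out.

```python
from typing import Dict, Set, Tuple

# Hypothetical labels for the chemical environment of each component.
ALCOHOL = "alcohol"
ESTER = "ester"


def select_for_test(
    mixture: Tuple[str, str],
    environments: Dict[str, str],
    training_set: Set[str],
) -> bool:
    """Return True if a binary mixture should be kept in the test set.

    A mixture is kept when either
      * neither component appears in the training set, or
      * both components appear in the training set AND the mixture is
        alcohol-alcohol or ester-ester (combinations not fit directly).
    """
    n_in_train = sum(smiles in training_set for smiles in mixture)

    if n_in_train == 0:
        return True

    # A "like-like" mixture has a single shared environment.
    mixture_environments = {environments[smiles] for smiles in mixture}
    is_like_like = (
        len(mixture_environments) == 1
        and mixture_environments <= {ALCOHOL, ESTER}
    )

    return n_in_train == 2 and is_like_like


# Assumed example data (SMILES patterns chosen only for illustration).
environments = {
    "CO": ALCOHOL,
    "CCO": ALCOHOL,
    "CCCO": ALCOHOL,
    "CC(=O)OC": ESTER,
    "CCOC(C)=O": ESTER,
    "CCC(=O)OC": ESTER,
}
training_set = {"CO", "CCO", "CC(=O)OC", "CCOC(C)=O"}

candidates = [
    ("CCCO", "CCC(=O)OC"),      # none in train
    ("CO", "CCO"),              # both in train, alcohol-alcohol
    ("CC(=O)OC", "CCOC(C)=O"),  # both in train, ester-ester
    ("CCO", "CC(=O)OC"),        # both in train, alcohol-ester
    ("CCO", "CCCO"),            # one in train
]
selected = [m for m in candidates if select_for_test(m, environments, training_set)]
```

With these assumed inputs, the first three candidates are kept and the last two are dropped: the alcohol-ester pair was fit directly, and the one-in-train pair is covered by neither rule.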

Binary Enthalpy of Mixing

R-OH - R-OH | R-OH - R(=O)OR | R(=O)OR - R(=O)OR
Both in train * X
One in train X X
None in train

* No data were available for alcohol-alcohol mixtures where both components appeared in the training set. Instead, we use data where only one component appeared in the training set.

Binary Excess Molar Volume

R-OH - R-OH | R-OH - R(=O)OR | R(=O)OR - R(=O)OR
Both in train X
One in train X X X
None in train

Binary Mass Density

R-OH - R-OH | R-OH - R(=O)OR | R(=O)OR - R(=O)OR
Both in train X
One in train X X X
None in train

Pure Data Selection

Status