The `add_uncertain_environments` method of the `SparseGP` class begins as follows:
```cpp
void SparseGP::add_uncertain_environments(const Structure &structure,
                                          const std::vector<int> &n_added) {
  initialize_sparse_descriptors(structure);

  // Compute cluster uncertainties.
  std::vector<std::vector<int>> sorted_indices =
      sort_clusters_by_uncertainty(structure);

  std::vector<std::vector<int>> n_sorted_indices;
  for (int i = 0; i < n_kernels; i++) {
    // Take the first N indices.
    int n_curr = n_added[i];
    if (n_curr > sorted_indices[i].size())
      n_curr = sorted_indices[i].size();

    std::vector<int> n_indices(n_curr);
    for (int j = 0; j < n_curr; j++) {
      n_indices[j] = sorted_indices[i][j];
    }
    n_sorted_indices.push_back(n_indices);
  }
```
Notice that `n_added[i]` is out of bounds for `i = 2` in your example, since your `n_added` list has only two entries while there are three kernels, so the behavior of this function is undefined. It's possible you're getting a very large positive integer for `n_added[2]`, in which case all environments are getting added for the third kernel, possibly explaining the improvement in MAE. Hard to say for sure.
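
To illustrate the failure mode outside the library, here is a minimal standalone C++ sketch (the values and variable names here are made up, not taken from FLARE): an out-of-bounds `operator[]` read is undefined behavior and may silently return an arbitrary integer, while `.at()` performs a bounds check and throws.

```cpp
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
  // Two entries, mirroring an n_added list that is shorter than n_kernels.
  std::vector<int> n_added = {10, 20};

  // Deliberate out-of-bounds read: undefined behavior. The program may
  // crash, or it may quietly hand back whatever bytes sit past the end,
  // which can look like a huge positive count.
  int garbage = n_added[2];
  std::cout << "n_added[2] read as: " << garbage << "\n";

  // The checked accessor throws instead of returning garbage.
  try {
    n_added.at(2);
  } catch (const std::out_of_range &e) {
    std::cout << "n_added.at(2) threw: " << e.what() << "\n";
  }
  return 0;
}
```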
I'm going to add an assertion that throws an error when there is a mismatch between the number of kernels and the size of `n_added`, since we really shouldn't be allowing them to be different.
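
As a rough sketch of what such a guard could look like (the helper name, exception type, and message below are my own placeholders, not the actual commit):

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: validate n_added before the selection loop runs.
void check_n_added_size(int n_kernels, const std::vector<int> &n_added) {
  if (n_added.size() != static_cast<std::size_t>(n_kernels)) {
    throw std::invalid_argument(
        "n_added must have one entry per kernel: expected " +
        std::to_string(n_kernels) + " entries, got " +
        std::to_string(n_added.size()));
  }
}
```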
Hi, in my recent test: when the kernel list and the descriptor list both keep three entries, the mean absolute error on the training set drops significantly if I pass a two-element atomic-environment list instead of a three-element one. The run also finishes without any error. I'm confused about this; is this the right way to start?