What happens when using three kernels and corresponding three descriptors, but only adding a two-dimensional sparse environment?

mir-group / flare

An open-source Python package for creating fast and accurate interatomic potentials.

MIT License

B1_descriptor = B1(radial_basis, cutoff_name, radial_hyps, cutoff_hyps, [n_species, 10])
B2_descriptor = B2(radial_basis, cutoff_name, radial_hyps, cutoff_hyps, [n_species, 8, 3])
B3_descriptor = B2(radial_basis, cutoff_name, radial_hyps, cutoff_hyps, [n_species, 8, 3])
descriptors = [B1_descriptor, B2_descriptor, B3_descriptor]
kernels = [dot_product_kernel_B1, dot_product_kernel_B2, dot_product_kernel_B3]
sparse_gp.sparse_gp.add_uncertain_environments(struc_pp, [sparse_size, sparse_size])
#sparse_gp.sparse_gp.add_uncertain_environments(struc_pp, [sparse_size, sparse_size, sparse_size])

The add_uncertain_environments method of the SparseGP class begins as follows:

void SparseGP ::add_uncertain_environments(const Structure &structure,
                                           const std::vector<int> &n_added) {

  initialize_sparse_descriptors(structure);
  // Compute cluster uncertainties.
  std::vector<std::vector<int>> sorted_indices =
      sort_clusters_by_uncertainty(structure);

  std::vector<std::vector<int>> n_sorted_indices;
  for (int i = 0; i < n_kernels; i++) {
    // Take the first N indices.
    int n_curr = n_added[i];
    if (n_curr > sorted_indices[i].size())
      n_curr = sorted_indices[i].size();
    std::vector<int> n_indices(n_curr);
    for (int j = 0; j < n_curr; j++) {
      n_indices[j] = sorted_indices[i][j];
    }
    n_sorted_indices.push_back(n_indices);
  }

Notice that n_added[i] is out of bounds for i = 2 in your example, so the behavior of this function is undefined. It's possible you're getting a very large positive integer for n_added[2]. Since n_curr is capped at sorted_indices[i].size(), all environments would then get added for the third kernel, possibly explaining the improvement in MAE. Hard to say for sure.

I'm going to add an assertion that throws an error when there is a mismatch between the number of kernels and the size of n_added, since we really shouldn't be allowing them to be different.
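
A check along these lines could look roughly like the sketch below. This is only an illustration, not the actual FLARE change: take_first_n is a hypothetical free function that mirrors the selection loop quoted above, and the use of std::invalid_argument and the wording of the messages are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of FLARE): for each kernel, keep the first
// n_added[i] entries of sorted_indices[i], after checking that both
// containers hold exactly one entry per kernel.
std::vector<std::vector<int>>
take_first_n(const std::vector<std::vector<int>> &sorted_indices,
             const std::vector<int> &n_added) {
  // Reject mismatched sizes up front instead of reading past the end.
  if (n_added.size() != sorted_indices.size()) {
    throw std::invalid_argument(
        "n_added has " + std::to_string(n_added.size()) +
        " entries, but there are " + std::to_string(sorted_indices.size()) +
        " kernels.");
  }

  std::vector<std::vector<int>> n_sorted_indices;
  n_sorted_indices.reserve(sorted_indices.size());
  for (std::size_t i = 0; i < sorted_indices.size(); i++) {
    if (n_added[i] < 0)
      throw std::invalid_argument("n_added entries must be non-negative.");

    // Clamp the request to the number of available environments.
    std::size_t n_curr = static_cast<std::size_t>(n_added[i]);
    if (n_curr > sorted_indices[i].size())
      n_curr = sorted_indices[i].size();

    // Copy the first n_curr indices for this kernel.
    n_sorted_indices.emplace_back(
        sorted_indices[i].begin(),
        sorted_indices[i].begin() + static_cast<std::ptrdiff_t>(n_curr));
  }
  return n_sorted_indices;
}
```

With three kernels, passing a two-element n_added to this sketch throws immediately rather than reading n_added[2], so the mismatch shows up as an error instead of silently changing which environments get added.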

What happens when using three kernels and corresponding three descriptors, but only adding a two-dimensional sparse environment? #425