Open scott-materials opened 1 year ago
Simple enough! We can update the default featurizer in this file: https://github.com/jacksund/simmate/blob/main/src/simmate/toolkit/validators/fingerprint/pcrystalnn.py
You can see that is the default class for the evo search here: https://github.com/jacksund/simmate/blob/main/src/simmate/apps/evolution/workflows/fixed_composition.py#L53-L57
Describe the desired feature
CrystalNN is our go-to for fingerprinting in the evolutionary algorithm and reverse Monte Carlo. If we look at the current code for this validator, it generates a vector of length 244. This is because the `cnnf.from_preset("ops", ... )` call uses all 61 structural descriptors from CrystalNN, and we're calculating four different statistics for each (61 × 4 = 244):
stat_options: list[str] = ["mean", "std_dev", "minimum", "maximum"]
I want to have an alternate version of this function, "CrystalNN_fast", that will accelerate database queries and be compatible with the default versions of Postgres Cube (limited to 100 dimensions unless recompiled) and Digital Ocean. I believe we can use a version of the CrystalNN fingerprint that uses a vector of length 49. Here's how.
1) Only calculate the mean.
Because we only query structures that have identical stoichiometries, the search space is already greatly reduced. The probability of two distinct structures sharing the same fingerprint should still be extremely small even without the std_dev, min, and max.
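To make the length bookkeeping concrete, here is a minimal sketch of how per-site descriptors could be pooled into a structure-level fingerprint. The helper `pool_site_features` and the toy site values are hypothetical and are not simmate's actual code; the use of a population standard deviation is also an assumption.

```python
from statistics import fmean, pstdev

# Hypothetical mapping from the stat names in `stat_options` to functions.
STAT_FUNCS = {
    "mean": fmean,
    "std_dev": pstdev,  # population std dev; the real convention is an assumption
    "minimum": min,
    "maximum": max,
}


def pool_site_features(site_features, stat_options):
    """Collapse per-site descriptor vectors into one structure-level vector."""
    columns = list(zip(*site_features))  # one tuple of site values per descriptor
    fingerprint = []
    for column in columns:
        for stat in stat_options:
            fingerprint.append(STAT_FUNCS[stat](column))
    return fingerprint


# Toy input: 3 sites with 61 made-up descriptor values each.
n_descriptors = 61
sites = [[((i + j) % 5) / 4 for j in range(n_descriptors)] for i in range(3)]

full = pool_site_features(sites, ["mean", "std_dev", "minimum", "maximum"])
fast = pool_site_features(sites, ["mean"])
print(len(full), len(fast))  # 244 61
```

Keeping only the mean takes the vector from 244 values to 61 before any descriptors are dropped.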
2) Only calculate linearly independent vectors.
By default, the preset `ops` generates duplicate information. For example, for coordination number 8, `ops` calculates these:
`8: ['body-centered cubic', 'hexagonal bipyramidal'],`
but it also calculates a value called `wt_8` that is simply the sum of the first two vectors. Since `wt_8` has no additional descriptive information, we should exclude it. This unnecessary information occurs for all coordination numbers between 1 and 12. Therefore, by excluding these 12, we can drop the 61 vectors down to 49. Hooray!
Here's how we do it:
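A minimal sketch of the column filtering, under stated assumptions: the label layout (`"wt CN_<n>"` for the weight columns, `"<motif> CN_<n>"` for the motif columns) and the helper names are hypothetical, and the toy values are made up. In practice the labels would come from the featurizer itself.

```python
import re

# Assumed label format for the per-coordination-number weight columns.
WT_LABEL = re.compile(r"^wt CN_(\d+)$")


def keep_column(label, max_cn=12):
    """Return False for weight columns with coordination number 1..max_cn."""
    match = WT_LABEL.match(label)
    return not (match and 1 <= int(match.group(1)) <= max_cn)


def drop_weight_columns(labels, vector, max_cn=12):
    """Filter labels and values together so they stay aligned."""
    kept = [(lab, val) for lab, val in zip(labels, vector) if keep_column(lab, max_cn)]
    return [lab for lab, _ in kept], [val for _, val in kept]


# Toy example for coordination number 8, plus one weight column above the cutoff.
labels = [
    "wt CN_8",
    "body-centered cubic CN_8",
    "hexagonal bipyramidal CN_8",
    "wt CN_13",
]
vector = [0.9, 0.6, 0.3, 0.1]

new_labels, new_vector = drop_weight_columns(labels, vector)
print(new_labels)
print(new_vector)
```

Applied to the full 61-column mean vector, removing the 12 weight columns for coordination numbers 1 through 12 would leave the 49 values described above.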
@jacksund what do you think? FYI I gave Gabe one million of these 49-length vectors for testing.