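If the similarity function is the inner product between explicit feature embeddings $\phi(x)$, you can compute the Vendi Score from the non-centered covariance matrix, $\frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^{\top}$, which has the same nonzero eigenvalues as the normalized similarity matrix $K/n$. This matrix can be accumulated batch by batch, so the full dataset never has to be in memory at once.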
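To make the batch idea concrete, here is a minimal NumPy sketch. It does not use this library's API: the function name `vendi_score_from_batches` and the `batches` iterable are hypothetical, and it assumes the embeddings are L2-normalized (done here by default) so that each sample has self-similarity 1 and the eigenvalues sum to 1.

```python
import numpy as np

def vendi_score_from_batches(batches, normalize=True):
    """Vendi Score from embeddings streamed in batches (hypothetical helper).

    `batches` yields arrays of shape (batch_size, d). Only a d x d matrix
    is kept in memory, never the n x n similarity matrix.
    """
    cov = None
    n = 0
    for batch in batches:
        X = np.asarray(batch, dtype=np.float64)
        if normalize:
            # Assumes no all-zero rows; gives unit self-similarity.
            X = X / np.linalg.norm(X, axis=1, keepdims=True)
        if cov is None:
            cov = np.zeros((X.shape[1], X.shape[1]))
        cov += X.T @ X  # running sum of phi(x_i) phi(x_i)^T
        n += X.shape[0]
    cov /= n  # non-centered covariance matrix

    eigvals = np.linalg.eigvalsh(cov)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))
```

For example, `vendi_score_from_batches(np.array_split(X, 10))` should agree, up to floating-point error, with the score computed from the full similarity matrix of the normalized rows of `X`. The cost is dominated by the $d \times d$ eigendecomposition, so this only helps when the embedding dimension $d$ is manageable.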
Alternatively, you could report the average Vendi Score over smaller subsets of the dataset.
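A rough sketch of the subset approach, again in plain NumPy rather than this library's API. The helper names are hypothetical, and the sketch assumes an inner-product similarity on L2-normalized embeddings; any other similarity function could be substituted when building `K`.

```python
import numpy as np

def vendi_score_from_kernel(K):
    """Vendi Score of an n x n similarity matrix with unit diagonal."""
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

def mean_subset_vendi_score(subsets):
    """Mean and standard deviation of the Vendi Score over subsets.

    `subsets` yields embedding arrays of shape (m, d), each small enough
    to fit in memory (hypothetical input format).
    """
    scores = []
    for X in subsets:
        X = np.asarray(X, dtype=np.float64)
        # Assumes no all-zero rows; gives unit self-similarity.
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        scores.append(vendi_score_from_kernel(X @ X.T))
    return float(np.mean(scores)), float(np.std(scores))
```

Since the Vendi Score of $m$ samples is at most $m$, the subset average is bounded by the subset size, so it is best compared across datasets using the same subset size rather than read as an estimate of the full-dataset score.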
Nice work!
Is there a way (possibly an approximation) to calculate the Vendi Score with this library in a batch manner, e.g. if the dataset is too large to load into memory?