Batch Calculation Possibility?

vertaix / Vendi-Score

MIT License


Closed: ogencoglu closed this issue 1 year ago

ogencoglu commented 1 year ago

Nice work!

Is there a way (possibly an approximation) to calculate the Vendi Score with this library in a batch manner, e.g., when the dataset is too large to load into memory?

danfriedman0 commented 1 year ago

Thanks!

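If the similarity score is the inner product between explicit feature embeddings $\phi(x)$, you can calculate the Vendi Score of the non-centered covariance matrix $C = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^{\top}$ instead. Its nonzero eigenvalues are the same as those of the $n \times n$ matrix $K/n$, where $K_{ij} = \phi(x_i)^{\top}\phi(x_j)$, so the score does not change. Because $C$ is a sum over samples, it can be accumulated in a batch manner, and its size depends only on the embedding dimension, not on $n$.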
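A minimal NumPy sketch of the covariance approach, assuming the embeddings are L2-normalized so that $k(x, x) = 1$ (the sketch normalizes each row itself). The helper names `batched_covariance` and `vendi_from_matrix` are hypothetical, not part of the Vendi-Score library's API, and the `1e-12` eigenvalue cutoff is an arbitrary choice for discarding round-off. The last lines check that the batched $d \times d$ route agrees with the full $n \times n$ kernel route on random data.

```python
import numpy as np


def batched_covariance(batches):
    """Accumulate C = (1/n) * sum_i phi(x_i) phi(x_i)^T one batch at a time.

    `batches` is any iterable of (m, d) arrays of feature embeddings, e.g. a
    generator that loads one chunk from disk at a time. Memory is O(d^2).
    """
    cov, n = None, 0
    for X in batches:
        X = np.asarray(X, dtype=np.float64)
        # Assumption: L2-normalize rows so that k(x, x) = phi(x)^T phi(x) = 1,
        # which makes trace(C) = 1 and the eigenvalues sum to 1.
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        if cov is None:
            cov = np.zeros((X.shape[1], X.shape[1]))
        cov += X.T @ X
        n += X.shape[0]
    return cov / n


def vendi_from_matrix(M):
    """exp(Shannon entropy) of the eigenvalues of a PSD, trace-1 matrix."""
    w = np.linalg.eigvalsh(M)
    w = w[w > 1e-12]  # drop zero / round-off eigenvalues (0 * log 0 := 0)
    return float(np.exp(-np.sum(w * np.log(w))))


# Check: the batched d x d route matches the full n x n kernel route.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
vs_batched = vendi_from_matrix(batched_covariance(np.array_split(X, 10)))

Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
vs_full = vendi_from_matrix(Xn @ Xn.T / len(Xn))
print(vs_batched, vs_full)
```

Only the $d \times d$ accumulator is kept between batches, so in practice `batches` can be a generator that reads one chunk at a time; the in-memory `np.array_split` here is just for the comparison.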
Alternatively, you could report the average Vendi Score over smaller subsets of the dataset.
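A rough sketch of the subset-averaging alternative, assuming cosine similarity on the embeddings. The names `vendi_score` and `average_subset_vendi` are hypothetical, not the library's API. Note that this average is a different quantity from the full-dataset score: a subset of $m$ samples can score at most $m$, so the result depends on the subset size, and subsets should ideally be random samples of the data.

```python
import numpy as np


def vendi_score(X):
    """Vendi Score of one subset, via the m x m kernel K/m."""
    X = np.asarray(X, dtype=np.float64)
    # Assumption: cosine similarity, i.e. inner products of L2-normalized rows.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    w = np.linalg.eigvalsh(X @ X.T / len(X))
    w = w[w > 1e-12]  # drop zero / round-off eigenvalues
    return float(np.exp(-np.sum(w * np.log(w))))


def average_subset_vendi(subsets):
    """Mean Vendi Score over an iterable of (m, d) subsets, scored one at a time."""
    return float(np.mean([vendi_score(S) for S in subsets]))


# Demo: randomly partition an in-memory array into subsets of 100 rows.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))
idx = rng.permutation(len(X))
avg_score = average_subset_vendi(X[idx[i:i + 100]] for i in range(0, len(X), 100))
print(avg_score)
```

Only one subset's kernel matrix exists at a time, so for a dataset on disk the subsets can be streamed in instead of sliced from `X`.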