vertaix / Vendi-Score

MIT License

Batch Calculation Possibility? #2

Closed: ogencoglu closed this issue 1 year ago

ogencoglu commented 1 year ago

Nice work!

Is there a way (possibly an approximation) to calculate the Vendi Score with this library in a batched manner, e.g. if the dataset is too large to load into memory?

danfriedman0 commented 1 year ago

Thanks!

  1. If the similarity score is the inner product between explicit feature embeddings $\phi(x)$, you can calculate the Vendi Score from the non-centered covariance matrix, $\frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^{\top}$, which has the same nonzero eigenvalues as the normalized $n \times n$ similarity matrix. This matrix can be accumulated one batch at a time.
  2. You could report the average Vendi Score of smaller subsets of the dataset.
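
Both suggestions above can be sketched in plain NumPy. This is an illustration only, not the library's own API: the function names `batched_vendi_score` and `mean_subset_vendi_score` are hypothetical, and the sketch assumes the similarity is the inner product of nonzero embeddings that are L2-normalized so that each sample has self-similarity 1.

```python
import numpy as np


def batched_vendi_score(batches, normalize=True):
    """Vendi Score from the non-centered covariance, accumulated per batch.

    `batches` is any iterable of (batch_size, d) arrays (hypothetical
    interface). Assumes the similarity is the inner product of the
    embeddings and that no embedding is the zero vector.
    """
    cov = None
    n = 0
    for batch in batches:
        X = np.asarray(batch, dtype=np.float64)
        if normalize:
            # Unit-norm rows give every sample self-similarity 1.
            X = X / np.linalg.norm(X, axis=1, keepdims=True)
        if cov is None:
            cov = np.zeros((X.shape[1], X.shape[1]))
        cov += X.T @ X  # running sum of phi(x_i) phi(x_i)^T
        n += X.shape[0]
    cov /= n
    eigvals = np.linalg.eigvalsh(cov)
    eigvals = eigvals[eigvals > 1e-12]  # drop zero and round-off eigenvalues
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def mean_subset_vendi_score(subsets, normalize=True):
    """Average of the Vendi Scores computed separately on each subset."""
    scores = [batched_vendi_score([s], normalize=normalize) for s in subsets]
    return float(np.mean(scores))


# Four mutually orthogonal embeddings, streamed in two batches of two.
batches = [np.eye(4)[:2], np.eye(4)[2:]]
print(batched_vendi_score(batches))
print(mean_subset_vendi_score(batches))
```

With this approach only the $d \times d$ accumulator is held in memory, so the cost depends on the embedding dimension rather than on the number of samples. The subset average is a different quantity from the full-dataset score, so the two functions generally return different values.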