Open kitkhai opened 2 months ago
Correct me if I am wrong, but I think this code resulted from a misinterpretation of the code examples in the PPI paper. There, the experiments compare PPI against classical inference by running multiple trials with labelled samples drawn from different parts of the dataset and of different sizes. In practice, one would simply use the complete labelled dataset once.
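For what it's worth, a single PPI call on the full labelled set is straightforward. Here is a minimal sketch of the standard PPI mean estimator in plain numpy (my own re-implementation for illustration, not the ARES code; the helper name `ppi_mean_ci` and the toy data are mine):

```python
import numpy as np

def ppi_mean_ci(y, yhat, yhat_unlabeled, z=1.96):
    """PPI confidence interval for a mean, using all labelled data at once.

    y              : gold labels on the small labelled set (size n)
    yhat           : model predictions on that same labelled set (size n)
    yhat_unlabeled : model predictions on the large unlabelled set (size N)
    z              : normal quantile (1.96 ~ 95% interval)
    """
    n, N = len(y), len(yhat_unlabeled)
    rectifier = y - yhat                              # corrects the model's bias
    theta = yhat_unlabeled.mean() + rectifier.mean()  # PPI point estimate
    # variance combines the unlabelled-prediction spread and the rectifier spread
    se = np.sqrt(yhat_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    return theta - z * se, theta + z * se

# toy data: 300 labelled examples, 5000 unlabelled judge scores
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=300).astype(float)         # gold labels
yhat = np.clip(y + rng.normal(0, 0.2, size=300), 0, 1)   # noisy judge on labelled
yhat_unl = rng.normal(0.7, 0.25, size=5000)              # judge on unlabelled
lo, hi = ppi_mean_ci(y, yhat, yhat_unl)
```

No subsampling or trial loop is needed for this use: one call with the complete labelled set yields one interval.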
Hi
It is unclear to me why the code creates 20 evenly spaced values

`ns = np.linspace(0, n_max, 20).astype(int)`

to vary the number of labelled data points used to compute PPI, when in the end only the PPI value computed from `n_max` is retained, as seen from `avg_ci = ci.mean(axis=0)[-1]`.

I also don't understand the purpose of conducting multiple trials. Since only the PPI value computed from `n_max` is retained, shouldn't it be constant across trials?

https://github.com/stanford-futuredata/ARES/blob/2684d477878e515c3dc31cf4b91fb848a84bdb90/ares/RAG_Automatic_Evaluation/LLMJudge_RAG_Compared_Scoring.py#L238-L281
Also, I saw that there are other functions in the original PPI repository (PPI bootstrap, cross-PPI, etc.). What were your considerations when choosing which PPI function to use?