nota-github / AIC2024_Track1_Nota

Normalization bug #4

Closed maligawork closed 4 months ago

maligawork commented 5 months ago

Thank you for the interesting work! While testing your project on my data, I noticed that the normalization below should produce NaN whenever euc_dists and emb_dists contain only one element: the max and min are then equal, so the expression evaluates to 0/0. I think you should use only the embedding vector in that case. I would like to hear your thoughts.

clustering.py-file:

norm_emb_dists = (emb_dists - np.min(emb_dists)) / (np.max(emb_dists) - np.min(emb_dists))
norm_euc_dists = (euc_dists - np.min(euc_dists)) / (np.max(euc_dists) - np.min(euc_dists))

qkqk772 commented 4 months ago

Thank you for your feedback!

We did not notice this issue during our experiments. As you mentioned, it would be better to use only the emb vector in such cases. If you have any further suggestions or encounter other issues, please let us know :)
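
One way the fallback discussed above might look is sketched below. This is only an illustration, not the repository's code: the helper names `min_max_normalize` and `combine_dists`, the `emb_weight` parameter, and the weighted-sum combination are all assumptions, since the thread does not show how clustering.py combines the two normalized arrays. It also assumes both inputs are non-empty.

```python
import numpy as np


def min_max_normalize(dists):
    """Min-max normalize; return None when the range is zero (e.g. one element)."""
    dists = np.asarray(dists, dtype=float)
    span = np.max(dists) - np.min(dists)
    if span == 0:
        # Normalizing here would compute 0/0 and yield NaN.
        return None
    return (dists - np.min(dists)) / span


def combine_dists(emb_dists, euc_dists, emb_weight=0.5):
    """Hypothetical combination step (assumed weighted sum, not the repo's code)."""
    emb_dists = np.asarray(emb_dists, dtype=float)
    norm_emb = min_max_normalize(emb_dists)
    norm_euc = min_max_normalize(euc_dists)
    if norm_emb is None or norm_euc is None:
        # Degenerate case: fall back to the embedding distances alone.
        return emb_dists
    return emb_weight * norm_emb + (1.0 - emb_weight) * norm_euc


# Single-element inputs no longer produce NaN.
single = combine_dists([0.3], [12.0])
```

Note that in this sketch the fallback returns the raw, unnormalized embedding distances, so any downstream threshold tuned for values in [0, 1] may need to be checked against that case.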