umami-hep / puma

puma - Plotting UMami Api
Apache License 2.0
4 stars 27 forks source link

Fixed fake rate normalization #269

Closed LucioDerin closed 2 months ago

LucioDerin commented 2 months ago

Summary

This pull request introduces the following changes

Conformity

samvanstroud commented 2 months ago

I'm struggling a bit here mainly because I don't know what scp.sparse.coo_matrix is doing.

I think I probably should have asked for the efficiency and purity rather than efficiency and fake rate here though. Is there any chance we can use the standard functions from sklearn here instead?

I think this is probably a good idea in case people want to easily calculate other metrics.

dkobylianskii commented 2 months ago

I'm struggling a bit here mainly because I don't know what scp.sparse.coo_matrix is doing.

Our confusion_matrix.py file is just a simplification of the sklearn.metrics.confusion_matrix function. We did this to reduce the number of dependencies because otherwise, we need the whole library for just one function.

What do you think about it?

samvanstroud commented 2 months ago

@dkobylianskii true, although I can imagine we might find other useful functions in sklearn, and there is not really a massive cost to adding it as a dep.

But since we have things working in our implementation I'm happy to keep things the way they are! @LucioDerin if you can just make sure we are computing the purity of the classification that would be perfect.

Thanks both!

LucioDerin commented 2 months ago

But since we have things working in our implementation I'm happy to keep things the way they are! @LucioDerin if you can just make sure we are computing the purity of the classification that would be perfect.

@samvanstroud I am happy to replicate sklearn's functions to compute the precision and recall scores, so that we have scores with clear definitions. Otherwise I can implement a function to compute the purity. Let me know what you think is best :)

samvanstroud commented 2 months ago

Precision and purity are the same, I'm happy to go with whichever approach you think is simplest, thanks @LucioDerin

LucioDerin commented 2 months ago

Precision and purity are the same, I'm happy to go with whichever approach you think is simplest, thanks @LucioDerin

I will try to implement those metrics in puma :) Maybe it's better to close this PR: I'll open another one as soon as the code is ready. Let me know what you think about that! @samvanstroud

samvanstroud commented 2 months ago

sounds good thanks @LucioDerin !