raspstephan opened this issue 5 years ago
I added some thoughts on the calculation of IoU and also introduced my ideas on how users could be ranked by IoU, splitting the group into 'good' and 'bad' labelers to check and increase the quality of the labels.
On the more general questions, like the one you posted (how often did people find the same class in one image), I just want to add a few more here:
In the user-agreement.ipynb notebook I started testing an "easily understandable" score. It basically asks: in how many cases, if one user labeled a class in an image, did another user also label that same class in that same image? I get an answer of around 50%, which seems reasonable.
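For concreteness, here is a minimal sketch of how I read that score, assuming a flat annotation table with one row per (user, image, label); the column names and toy data are made up for illustration, not taken from the notebook:

```python
import pandas as pd

# Toy annotation table: one row per (user, image, label).
# Hypothetical column names; the real data lives in user-agreement.ipynb.
labels = pd.DataFrame({
    "user":  ["a", "b", "a", "c", "b"],
    "image": [1, 1, 2, 2, 2],
    "label": ["Sugar", "Sugar", "Fish", "Gravel", "Fish"],
})

def simple_agreement(df):
    """Fraction of annotations for which at least one *other* user
    put the same label on the same image."""
    agreed = 0
    for row in df.itertuples():
        match = df[(df.image == row.image) &
                   (df.label == row.label) &
                   (df.user != row.user)]
        agreed += int(len(match) > 0)
    return agreed / len(df)

print(simple_agreement(labels))  # 4 of 5 annotations are matched -> 0.8
```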
Future extensions:
In my latest commit 04b475e I extended the simple agreement score. It is now possible to check the agreement for each class separately. Here is a quick summary:

- Sugar: 0.5696511428080426
- Flower: 0.5867732398992664
- Fish: 0.4670166865935437
- Gravel: 0.5150726420898206
Furthermore, I included an option to make the agreement conditional on a minimum IoU score; for this you need to define a threshold. The magnitude of the results depends quite heavily on the threshold, but the order of the classes does not.
This should suffice as a start. We can then also use this score to compare an ML model against an average human!
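To make the IoU condition concrete, here is a sketch under the assumption that each annotation is an axis-aligned box given as (x1, y1, x2, y2); the function names and the 0.3 threshold are mine, chosen just for illustration:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def agree_with_iou(boxes_u, boxes_v, threshold=0.3):
    """Two users only count as agreeing on a class if at least one
    pair of their boxes for that class overlaps with IoU >= threshold."""
    return any(box_iou(a, b) >= threshold
               for a in boxes_u for b in boxes_v)
```

With threshold=0.0 this falls back to the simple score above, since any pair of boxes would then count as a match.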
This looks very promising, and the basic figures are evolving! I will now follow up on the question: "How likely is it, if someone has classified one pattern, that someone else has classified something different?" I just need to tweak my transition code a bit.
Or did you answer this question already? Or did you only check for identical classes?
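In case it helps, this is roughly what I mean by the transitions: a matrix whose entry (i, j) says how often, given that one user labeled class i on an image, another user labeled class j on the same image. Again just a sketch on the hypothetical flat table from above:

```python
import itertools
import pandas as pd

def transition_matrix(df, classes):
    """Entry (i, j): how often, when one user labelled class i on an
    image, another user labelled class j on that image. Rows are
    normalised to probabilities; the diagonal holds the identities,
    the off-diagonal entries the 'something different' cases."""
    counts = pd.DataFrame(0.0, index=classes, columns=classes)
    for _, grp in df.groupby("image"):
        # all ordered pairs of annotations from two different users
        for r1, r2 in itertools.permutations(grp.itertuples(), 2):
            if r1.user != r2.user:
                counts.loc[r1.label, r2.label] += 1
    return counts.div(counts.sum(axis=1), axis=0)
```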
Please check the IoU notebook: cloud-classification/analysis/IoU.ipynb. There I tried to explain my thinking in coming up with a multi-label, multi-class IoU version. The multi-class case is tricky and ambiguous. Please see what makes sense to you and what other metrics would be more useful (see the questions at the end of the notebook).
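To give the discussion something concrete to poke at, here is one way the multi-label case could be resolved: score each class as an independent binary mask (so overlapping labels are fine) and macro-average over the classes that at least one user drew. This is only my reading of one option, not the notebook's definition, and the skip-vs-score-zero choice for absent classes is exactly one of the ambiguities:

```python
import numpy as np

def multilabel_iou(masks_a, masks_b, classes, shape):
    """masks_a/masks_b: dicts mapping class name -> boolean pixel mask.
    Each class is scored independently (a pixel may carry several
    labels); the per-class IoUs are then macro-averaged over the
    classes present in at least one of the two labellings."""
    empty = np.zeros(shape, dtype=bool)
    ious = []
    for c in classes:
        a = masks_a.get(c, empty)
        b = masks_b.get(c, empty)
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue  # class absent for both users: skip rather than score 0
        ious.append(np.logical_and(a, b).sum() / union)
    # neither user drew anything at all: call that perfect agreement
    return float(np.mean(ious)) if ious else 1.0
```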