smarsland / pots


Machine learning results #14

Open Armand1 opened 4 years ago

Armand1 commented 4 years ago

Han writes:

I've got the results of our CNN filter algorithm. The test set size is 3304 images and the overall accuracy is 78.63%. The accuracy for recognizing "good" ones is higher: 96.15%. That means this CNN filter seldom misclassifies "good" pots as "bad" ones. I've checked the images which "confuse" our model and most of them are wrong-angle ones.

That's great. I've looked at these myself. So, the first thing is that we have duplicate images in there: apparently some of the same images are in both the "bad" and "good" folders. How this happened I don't know --- my fault --- but that might degrade our performance a bit. Exclude these and we have 3286 images in the test set. This is what the result looks like (percentages of rows):

| treatment | CNN bad | CNN good |
| --- | --- | --- |
| ground truth bad | 42% | 58% |
| ground truth good | 4% | 96% |

So, just as Han says, the CNN is very good at identifying good images (96% correct). But it's not so good at identifying bad images: only 42% of those are correct! It's doing worse than random!
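For the record, the row percentages can be reproduced from the counts quoted elsewhere in the thread (86 good→bad, 612 bad→good, 2,753 CNN-good out of 3,286); the individual cell counts below are inferred from those figures, not read from a file:

```python
# Confusion-matrix cell counts reconstructed from the thread's figures.
counts = {
    ("bad",  "bad"):  447,   # 533 CNN-bad minus the 86 that were really good
    ("bad",  "good"): 612,
    ("good", "bad"):  86,
    ("good", "good"): 2141,  # 2753 CNN-good minus the 612 that were really bad
}

# Normalize each ground-truth row to percentages.
row_pcts = {}
for truth in ("bad", "good"):
    row_total = counts[(truth, "bad")] + counts[(truth, "good")]
    row_pcts[truth] = {
        cnn: round(100 * counts[(truth, cnn)] / row_total)
        for cnn in ("bad", "good")
    }
print(row_pcts)
```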

I sorted the images into 4 folders with names like this: "original classification - CNN classification" So:

good-good good-bad bad-good bad-bad

in order to see what is going on. They are in the Dropbox (I'll post a link when Dropbox gives me one). Let's focus on the misclassified ones.
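A sketch of that sorting step, assuming we have a per-image list of (filename, ground truth, CNN label); the tuple format is my invention, not how Han's output actually looks:

```python
import shutil
from pathlib import Path

def sort_by_outcome(rows, src_dir, out_dir):
    """Copy each image into a '<ground truth>-<CNN label>' folder.

    `rows` is an iterable of (filename, truth, prediction) tuples;
    whatever loads them (a CSV, the model's output) is left to the caller.
    """
    src, out = Path(src_dir), Path(out_dir)
    for filename, truth, prediction in rows:
        dest = out / f"{truth}-{prediction}"      # e.g. "good-bad"
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / filename, dest / filename)
```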

There are 86 good images which the CNN thinks are bad. Why? Well, quite a lot of them are bad. They are broken, or at funny angles, or are closeups. So, that suggests we need to refine our training set a bit.

There are 612 bad images that the CNN thinks are good. Some of them were, indeed, good. (So, once again, we need to improve our training set.) But most really are bad. They're also quite a heterogeneous bunch. My sense is that most of the mistakes are pots photographed from the bottom or top. They present as clear spherical objects against a clean background. So it's not recognizing them as being photographed from a disastrously bad angle.

Summarizing: assuming we vet the training set a bit better and re-run the CNN, we can trust that images classified as bad really are bad. Removing them will get rid of about 16% of our images (533/3,286). But, unless matters drastically improve, we cannot trust the images that it classifies as good: roughly a fifth of them will, in fact, be bad (it classifies 2,753 images as good, of which 612 are in fact bad, ~22%).
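The arithmetic behind those two estimates, using the test-set figures quoted above (the slip-through rate comes out nearer 22% when the exact count of 612 is used):

```python
# Test-set figures from the thread.
total = 3286
cnn_bad = 533                # images the CNN labels bad
cnn_good = total - cnn_bad   # 2753 images the CNN labels good
bad_among_cnn_good = 612     # CNN-"good" images that are actually bad

removed_pct = round(100 * cnn_bad / total)             # share discarded as bad
slip_pct = round(100 * bad_among_cnn_good / cnn_good)  # bad images slipping through
print(removed_pct, slip_pct)
```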

That's not terrible by any means. Remember, even once we have the CNN filter out the "good" images, we still have to vet them further manually to pull out the best image among the duplicate images for each vase. And that's a lot of work which the CNN cannot do.

So, my suggestion is this (Han):

(1) let's check the "bad beazley images for Han" and "good beazley images for Han" again to make sure that images are correctly classified. Also, let's make sure that there are no duplicates in them (no images that are in both "good beazley" and "bad beazley")
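The duplicate check can be done mechanically by hashing file contents, which also catches images that appear under different filenames in the two folders; the folder paths are placeholders for the actual Dropbox directories:

```python
import hashlib
from pathlib import Path

def content_hashes(folder):
    """Map MD5-of-contents -> filename for every file in a folder."""
    return {hashlib.md5(p.read_bytes()).hexdigest(): p.name
            for p in Path(folder).iterdir() if p.is_file()}

def find_duplicates(good_dir, bad_dir):
    """Return (good filename, bad filename) pairs with identical contents."""
    good, bad = content_hashes(good_dir), content_hashes(bad_dir)
    return sorted((good[h], bad[h]) for h in good.keys() & bad.keys())
```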

(2) Re-train the CNN, re-test it, and get a new set of results. This may be a bit better than the current results, but won't change them much I think.

(3) Run the trained CNN on ALL the images in the "unsorted and unclassified" folder --- that's 113,417 images. Excluding the 16% of them that are classified as bad leaves us with 95,270 "good" images to look at.

Hmmm: the CNN will certainly be some help, but not as much as I had hoped! That is due, in part, to the CNN's failure to identify bad images accurately, but mostly it's because most images are, in fact, good. Remember: even if the CNN worked perfectly, it would only classify about 30% of our images as bad, leaving us still with 79,391 "good" images to look at.
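A quick check of that projection, taking the 16% bad-rate from the test set:

```python
all_images = 113_417
bad_fraction = 0.16   # test-set estimate of the CNN-"bad" share
kept = round(all_images * (1 - bad_fraction))
print(kept)           # images left to vet by hand
```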

Armand1 commented 4 years ago

Should we be thinking about classifying the bad images into finer categories, e.g., "drawings", "close-ups", "bad angles", etc.? Would that help? "Good" would then be just one category among them.

Armand1 commented 4 years ago

I am resorting the images that Han used for training the CNN. This is to check them and to refine classes.

There are 5 bad classes:

- **detail** --- an image showing only part of a vase in close-up
- **angle** --- an image showing an entire vase, but from a severely foreshortened angle, typically the bottom
- **multiples** --- an image showing multiple objects; could be fragments or several good vases
- **drawing** --- a drawing
- **broken** --- a single, broken vase (I expect this class to perform poorly since vases are broken in many ways)

And 1 good class:

- **good** --- a single, intact, whole vase, shown more or less right-on

Perhaps, then, a multi-class CNN would work better. But even if we pooled all the bad classes into one, the CNN should still improve, since the training set has now been checked (and will be checked again by Han).
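If we do go multi-class, collapsing back to the binary good/bad decision is trivial; a sketch using the class names above, with a hypothetical per-class softmax score dict:

```python
BAD_CLASSES = {"detail", "angle", "multiples", "drawing", "broken"}

def pooled_label(predicted_class):
    """Collapse a six-way prediction to the binary good/bad decision."""
    return "good" if predicted_class == "good" else "bad"

def pooled_bad_probability(scores):
    """Sum the softmax scores over the five bad classes."""
    return sum(scores.get(c, 0.0) for c in BAD_CLASSES)
```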

LittleAri commented 4 years ago

96% rate of classifying good pots is great and 42% for the bad is definitely a good start!

Out of interest, could I see some examples of truly bad "pots" that the CNN missed? Not sure which dropbox folder these are all in.

Armand1 commented 4 years ago

@LittleAri I would --- except that I have had to comprehensively redo my dropbox, and it's taking an AGE to load up again. In any case, Han is going to re-run the CNN with the new 6-part classification of vases, and we'll see how that does.

Armand1 commented 4 years ago

Applying Han's best ML algorithm to ALL the unsorted vases (108,953 of them) gave 39,582 bad ones and 69,371 good ones.

But the results of Han's test set suggested that there would be many misclassified. So both classes had to be checked manually. This I did. And the result is:

| CNN \ AML | bad | good | total |
| --- | --- | --- | --- |
| bad | 36,843 | 2,739 | 39,582 |
| good | 16,320 | 53,051 | 69,371 |
| total | 53,163 | 55,790 | 108,953 |

Or in percentages:

| CNN \ ME | bad | good |
| --- | --- | --- |
| bad | 93% | 7% |
| good | 24% | 76% |

That is to say, the CNN identifies bad images correctly 93% of the time, and identifies good images correctly 76% of the time. This is pretty good. Even so, the big job was identifying the 16k images that it labelled good, but were in fact bad. But now, I think, we have got rid of all the bad images. Now we have to find the best image for each of the remaining fabrics, notably Athens.
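Those percentages follow from the raw counts (rows = CNN label, columns = the manual check):

```python
# Raw counts from the full 108,953-image run.
counts = {
    ("bad",  "bad"):  36_843, ("bad",  "good"): 2_739,
    ("good", "bad"):  16_320, ("good", "good"): 53_051,
}

# Per-row accuracy: of the images the CNN gave each label,
# what fraction did the manual check agree with?
accuracy = {}
for cnn in ("bad", "good"):
    total = counts[(cnn, "bad")] + counts[(cnn, "good")]
    accuracy[cnn] = round(100 * counts[(cnn, cnn)] / total)
print(accuracy)
```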