xu-ji / IIC

Invariant Information Clustering for Unsupervised Image Classification and Segmentation
MIT License

Extracting images with relative cluster labels, and segmentation #50

Closed. pcicales closed this issue 4 years ago.

pcicales commented 4 years ago

Great work. This has a lot of applicability.

I am attempting to cluster (fully unsupervised) some binary data I have, but I want to cluster it into 10 groups (i.e. setting gt_k and output_k_B to 10, without caring about the output accuracies).

I wanted to extract the assigned cluster label so that I could then review the images relative to their semantic group. I noticed you did something similar in your paper - is there some code in the repo that I didn't notice which can output these labels?

Also - is it possible to generate segmented images with your code without providing ground truths? I initially believed this was the case, but noticed you report GT images - was this just to compare to human performance?

Again, thanks for your time. I really look forward to experimenting with your code!

pcicales commented 4 years ago

Apologies for not looking more closely - I noticed in this issue that the masks were used to isolate the 'stuff' in COCO. So in my case, where there are no masks, I should refer to the other implementation, correct?

xu-ji commented 4 years ago

Is it whole image clustering or segmentation you are interested in?

Just running data through the network will directly output the cluster probabilities for each input image (or several sets of them, if you use sub-heads). For c clusters, this is c values per image for whole image clustering and c*h*w per image for segmentation. Simply argmax across the cluster dimension c to get the assigned cluster (per image or per pixel). Currently the testing code in the scripts uses cluster_eval, which does this as part of the accuracy evaluation.
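
For example, a minimal sketch of extracting the assignments - `net` and `dataloader` are hypothetical names here, and the model is assumed to return a list of per-sub-head softmax outputs, of shape (n, c) for whole image clustering or (n, c, h, w) for segmentation:

import torch

net.eval()
all_assignments = []
with torch.no_grad():
  for imgs in dataloader:
    probs = net(imgs.cuda())[0]        # take the first sub-head
    assignments = probs.argmax(dim=1)  # argmax over the cluster dimension c
    all_assignments.append(assignments.cpu())
all_assignments = torch.cat(all_assignments)  # (N,) or (N, h, w)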

You can write your own function to visualise these predictions in any way you like, either called during training or offline on some saved model or predictions. The MNIST visualisation in the paper was done with the save_progression flag.

So yep, all the results for unsupervised IIC were rendered without ground truth, except to determine the cluster to ground truth class mapping (i.e. the colour for each cluster), which does not matter if you have no labels.

If there are no masks, this is the same as using masks (tensors the same size as the input) set to all 1s, which is what is done for Potsdam. You could also edit the codebase to remove mask support entirely for your purposes.
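
For instance, a minimal sketch of an all-ones mask - the shapes and dtype here are illustrative, not necessarily the repo's exact convention:

import torch

imgs = torch.randn(4, 3, 200, 200)                  # hypothetical input batch
masks = torch.ones(4, 200, 200, dtype=torch.bool)   # every pixel included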

pcicales commented 4 years ago

I see. I was able to get the code working for the clustering task, and will now try to do the same for segmentation. Just so I understand correctly: the uncertainty estimation you describe in the paper is derived from the N point estimates generated by the N sub-heads? Could you explain this mathematically in a bit more detail?

pcicales commented 4 years ago

Also, here is how I modified the accuracy function, for anyone who wants to cluster into more groups than there are labels. It is simply the percentage of the dominant class in each cluster, weighted by the cluster's size relative to the entire dataset. This can also be used in cases where the number of clusters is equal to the number of classes. The code is given below:

import torch

def _acc(preds, targets, num_k, verbose=0):
  assert (isinstance(preds, torch.Tensor) and
          isinstance(targets, torch.Tensor) and
          preds.is_cuda and targets.is_cuda)

  if verbose >= 2:
    print("calling acc...")

  assert (preds.shape == targets.shape)
  assert (preds.max() < num_k and targets.max() < num_k)

  cluster_res = []
  for i in torch.unique(preds):
    # ground truth labels of the samples assigned to cluster i
    cluster_targets = targets[preds == i].cpu()
    x_unique = cluster_targets.unique(sorted=True)
    x_unique_count = torch.stack([(cluster_targets == x_u).sum()
                                  for x_u in x_unique])
    tot = x_unique_count.sum().item() * 1.0
    print('Sample counts for cluster ' + str(i.item()) + ':')
    holder = []
    for j in range(len(x_unique)):
      counter = x_unique_count[j].item() * 1.0
      holder.append(counter)
      if (counter / tot) > 0.8:
        print('Target class ' + str(x_unique[j].item()) + ', ' +
              str(x_unique_count[j].item()) + ' total, ' +
              str(100 * (counter / tot)) + '% (!!!!!)')
      else:
        print('Target class ' + str(x_unique[j].item()) + ', ' +
              str(x_unique_count[j].item()) + ' total (' +
              str(100 * (counter / tot)) + '%)')
    # purity of cluster i, weighted by its share of the whole dataset
    cluster_res.append((max(holder) / tot) * (tot / len(preds)))

  acc = sum(cluster_res)

  return acc
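
Hypothetical usage, with preds and targets as flat CUDA LongTensors of equal length:

acc = _acc(preds, targets, num_k=10, verbose=2)
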
xu-ji commented 4 years ago

the uncertainty estimation you describe in the paper is derived from the N point estimates generated by the N sub-heads

The +- uncertainty in Table 1 is just the standard deviation of the sub-head accuracies, to describe the range in performance.
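
Concretely, a minimal sketch - the accuracy values here are made up for illustration:

import torch

subhead_accs = torch.tensor([0.57, 0.61, 0.59, 0.58, 0.60])  # hypothetical
mean, std = subhead_accs.mean(), subhead_accs.std()
print("%.3f +- %.3f" % (mean.item(), std.item()))  # the reported +- value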

cluster into more groups than there are labels

I think this is very similar to our "semi-supervised overclustering" setting (commands). The function we used to find the many-to-one mapping is here.
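
A minimal sketch of one way to build such a many-to-one mapping (not necessarily the repo's exact function): map each predicted cluster to the ground truth class it overlaps with most.

import torch

def many_to_one_map(preds, targets, num_classes):
  # map each predicted cluster to its majority ground truth class
  mapping = {}
  for c in torch.unique(preds):
    members = targets[preds == c]
    mapping[int(c)] = int(members.bincount(minlength=num_classes).argmax())
  return mapping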