xu-ji / IIC

Invariant Information Clustering for Unsupervised Image Classification and Segmentation
MIT License
861 stars 207 forks

How to run this code on custom data #8

Closed shadyatscu closed 5 years ago

shadyatscu commented 5 years ago

Thanks for your great work! I checked the code and found it's hard-coded for the benchmarks given in the paper. Could I run it on my custom data? Thanks a lot!

xu-ji commented 5 years ago

Hi, the code has optional settings that you may not want or need. Each setting is easy to turn on or off using the input arguments to the scripts.

To run on your own dataset, create your own script. For example, create a copy of cluster_sobel_twohead.py inside code/scripts/cluster and replace the call to cluster_twohead_create_dataloaders in line 171 with a call to your own function, which would return 4 dataloaders:
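For illustration, here is a minimal sketch of such a function, assuming your images are laid out for torchvision's ImageFolder; the name my_create_dataloaders is hypothetical, the exact return structure is an assumption to check against cluster_twohead_create_dataloaders in the repo, and the config fields mirror the ones the scripts already use:

import torch
from torchvision import datasets, transforms

def my_create_dataloaders(config):
    # Basic transform only; the real scripts also apply the pair
    # transforms configured in config (jitter, crops, etc.).
    tf = transforms.Compose([
        transforms.Resize((config.input_sz, config.input_sz)),
        transforms.ToTensor(),
    ])
    data = datasets.ImageFolder(config.dataset_root, transform=tf)

    def make_loader(shuffle):
        return torch.utils.data.DataLoader(
            data, batch_size=config.dataloader_batch_sz,
            shuffle=shuffle, num_workers=0, drop_last=False)

    # Training loaders for the two heads; the training loop draws the
    # image pairs from config.num_dataloaders parallel loaders.
    dataloaders_head_A = [make_loader(True) for _ in range(config.num_dataloaders)]
    dataloaders_head_B = [make_loader(True) for _ in range(config.num_dataloaders)]

    # Evaluation-only loaders: one to find the cluster-to-class mapping,
    # one to test it (stub these out if you have no labels).
    mapping_assignment_dataloader = make_loader(False)
    mapping_test_dataloader = make_loader(False)

    return (dataloaders_head_A, dataloaders_head_B,
            mapping_assignment_dataloader, mapping_test_dataloader)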

In summary, you would only need to change that one call to cluster_twohead_create_dataloaders. (It should still obey the input settings in config for the number of dataloaders, input size, transforms, etc., which you can of course also change or remove if redundant.)

RyanCV commented 5 years ago

@xu-ji For segmentation, I found that for Potsdam, a line in data.py requires "unlabelled_train", "labelled_train", and "labelled_test", but the paper says it's an unsupervised method, which is confusing to me; could you explain? Also, I don't have labelled training images, so how can I generate the image pairs for the training and test datasets on my custom segmentation dataset? Thanks.

xu-ji commented 5 years ago

Images are taken from labelled_train because otherwise you would not be using most of the dataset. Labels are only used for evaluation, to find the 1-1 mapping between output clusters and ground truth classes, not for training.

If you don't have labels, you will train the network but won't be able to quantitatively evaluate it. This means you would be skipping the call to eval.
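For reference, that 1-1 mapping can be found by linear assignment. A minimal sketch, not the repo's exact code, assuming flat numpy arrays of predicted clusters and ground truth labels, both with values in 0..k-1:

import numpy as np
from scipy.optimize import linear_sum_assignment

def best_mapping(preds, targets, k):
    # Count co-occurrences of each (cluster, class) pair.
    match = np.zeros((k, k), dtype=np.int64)
    for c in range(k):
        for t in range(k):
            match[c, t] = np.sum((preds == c) & (targets == t))
    # Hungarian algorithm: maximise total matches by minimising the
    # negated co-occurrence counts.
    rows, cols = linear_sum_assignment(-match)
    return dict(zip(rows, cols))  # cluster index -> class index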

You don't need labels to generate pairs. You take your whole image, copy it, and transform the copy to create the second image of the pair. The transforms used for Potsdam were jitter and flipping, done here.
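A minimal sketch of pair generation along those lines, assuming an image tensor of shape (C, H, W) with values in [0, 1]; the jitter and flip specifics here are illustrative, not the exact Potsdam parameters:

import torch
import torchvision.transforms.functional as TF

def make_pair(img):
    # The second image is a transformed copy of the first.
    img2 = img.clone()

    # Random horizontal flip; for segmentation, keep the flag so
    # predictions can be aligned back before computing the loss.
    flipped = torch.rand(1).item() < 0.5
    if flipped:
        img2 = TF.hflip(img2)

    # Random brightness jitter as an example colour transform.
    img2 = TF.adjust_brightness(img2, 0.8 + 0.4 * torch.rand(1).item())

    return img, img2, flipped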

RyanCV commented 5 years ago

Thanks. It's clear to me now. As for the training and test images: for "unlabelled_train", "labelled_train", and "labelled_test", what is the ratio? For example, I have 4500 "unlabelled_train" images, and "labelled_train" is the same as "labelled_test" but has only 500 images; is that OK? Also, did you try depth images only for segmentation?

RyanCV commented 5 years ago

For that line, with depth images only, do I need to use config.in_channels = 1 + 2 (depth + sobel, using_IR=False) or config.in_channels = 1 (depth only, using_IR=False)? Thanks.

xu-ji commented 5 years ago

If you look at the supplementary material (in /paper), table 6 gives the dataset sizes. Unlabelled train + labelled train = 8550 images and labelled test = 5400 images (so labelled train = 5400 and unlabelled train = 8550 - 5400 = 3150). You should be fine with 500 labelled images; the amount of labels required to find the mapping is very low.

We did not try anything with depth.

Your input channels would be 1, and you would not use Sobel filtering at all; that is a transform for colour images. Because you are working with depth, you may want to consider different transforms from the ones we used: jitter is also an operation intended for colour images. You may want to try salt and pepper noise, and flipping (depending on what your images are about), as your transforms.
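A minimal sketch of salt and pepper noise, assuming a single-channel depth tensor with values normalised to [0, 1]; the noise fraction is an illustrative choice:

import torch

def salt_and_pepper(img, frac=0.05):
    out = img.clone()
    mask = torch.rand_like(out)
    out[mask < frac / 2] = 0.0        # "pepper": set to minimum
    out[mask > 1.0 - frac / 2] = 1.0  # "salt": set to maximum
    return out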

RyanCV commented 5 years ago

Thanks. It's really helpful.

RyanCV commented 5 years ago

@xu-ji Can your IIC method do instance segmentation?

xu-ji commented 5 years ago

No. That would require some material addition to the method.

MrAcademic commented 5 years ago

What about --model_ind for our custom dataset? Could you please describe --model_ind and --arch? And is it possible to change input_sz to something like 256 for our data with these models?

xu-ji commented 5 years ago

--model_ind is just a name for the experiment, to create the directory to store the results in. Can be anything.

--arch is used to select network architecture. E.g. here.

You could run the scripts on images of 256x256. If you are using your own dataset you will almost certainly need to change the code anyway, since you need to write your own dataloader. To use our existing architectures, the easiest way is just to resize your images to one of the compatible sizes; for example, our Potsdam images for segmentation were 200x200. You can find these details by looking at the code or in the supplementary material.
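For example, a one-line resize with torchvision, assuming PIL images and the hypothetical path my_image.png:

from PIL import Image
from torchvision import transforms

img = Image.open("my_image.png")
img = transforms.Resize((200, 200))(img)  # the size used for Potsdam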

mingha88 commented 5 years ago

Hello, your work is very useful for me, thanks a lot! Could you please tell me how to view a picture like splash.png after running segmentation_twohead.py? I'm trying to use my own dataset and check the resulting pictures. Do I have to add some code?

xu-ji commented 5 years ago

If it's your own dataset, it's probably best to write your own script. It's quite simple: load your saved network, run your data through the network, get a prediction per pixel, and map each prediction to a colour.

There are some examples I used at one point, the render*.py scripts in this dir, where I do exactly this. I use the PIL Image library or matplotlib to turn numpy arrays into images and save them to file.
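A minimal sketch of such a script, assuming net is your loaded segmentation model and img a (1, C, H, W) input tensor; the palette is an illustrative choice, and the indexing assumes the convention of the IIC nets, which return a list with one (N, k, H, W) tensor per sub-head:

import numpy as np
import torch
from PIL import Image

# One RGB colour per output cluster.
palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255],
                    [255, 255, 0], [255, 0, 255], [0, 255, 255]],
                   dtype=np.uint8)

net.eval()
with torch.no_grad():
    probs = net(img)[0]                           # (1, k, H, W) soft assignments
    preds = probs[0].argmax(dim=0).cpu().numpy()  # (H, W) cluster per pixel

Image.fromarray(palette[preds]).save("prediction.png")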

keliive commented 4 years ago

Thanks a lot for the awesome and useful work!

I was wondering about a couple of things:

  1. Are the main and auxiliary overclustering heads independent of each other? Say in a fully unsupervised segmentation task where I would want to use the best model from your experiments, which has 2 heads (head_A and head_B), could I just drop the overclustering head head_A from the model and use the resulting net with only head_B as a pretrained net?
  2. Given that the number of ground truth classes in my scenario would be different (say 10 instead of 6), could I simply reinitialize the Conv2d in the IIC head_B (i.e. the main head) from
(head_B): SegmentationNet10aHead(
    (heads): ModuleList(
      (0): Sequential(
        (0): Conv2d(512, 6, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1), bias=False)
        (1): Softmax2d()
      )
    )
)

to

(head_B): SegmentationNet10aHead(
    (heads): ModuleList(
      (0): Sequential(
        (0): Conv2d(512, 10, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1), bias=False)
        (1): Softmax2d()
      )
    )
)

similarly to what is shown here for Squeezenet?

xu-ji commented 4 years ago

  1. Yes, inference on head B does not need head A, in that the outputs are separately interpretable. (Though they share the same trunk, so they are not independent.) But which head is actually better for your downstream task may require testing.

  2. If you replace head B in a trained network with a randomly initialized new head, it'll need further training for its outputs to be meaningful, either with the IIC loss or some other relevant objective. But yes, you could do this.
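A minimal sketch of that swap, based on the model structure printed above and assuming net holds the trained segmentation model; the new head is randomly initialised and needs further training:

import torch.nn as nn

new_k = 10  # desired number of output classes
old = net.head_B.heads[0][0]  # the existing 512 -> 6 Conv2d
net.head_B.heads[0][0] = nn.Conv2d(
    old.in_channels, new_k,
    kernel_size=old.kernel_size, stride=old.stride,
    padding=old.padding, bias=False)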