phillipi / pix2pix

Image-to-image translation with conditional adversarial nets
https://phillipi.github.io/pix2pix/

How to do a 'brain scan' on cGAN #23

Open kaihuchen opened 7 years ago

kaihuchen commented 7 years ago

What would be the best way to tap into the Generator in order to inspect or manipulate the high-level features created in it? FYI, if I apply the cGAN to images of human faces, I expect that eventually the Generator will create high-level features similar to those a typical deep CNN would, perhaps at the level of noses and eyes, etc. This is borne out in my experiments with facial images, where I am able to observe a whole eye being scaled and moved around by the Generator during training.

This is of great interest to me, because it amounts to unsupervised learning of the facial parts (i.e., eyes, noses, etc.): the cGAN would then, in a sense, know that a nose in photo A and a nose in photo B are the same type of thing if both activate the same 'nose neuron.' This is a big deal to me, because if it is in fact the case, then once a text label is attached to one nose in one photo (through supervised learning), we would know right away that the 'nose' label is likely applicable to all the other noses in the other photos (a kind of one-shot learning, isn't it?).

So my question here boils down to: how can I detect (after training) whether two noses from two photos activate the same neuron in the cGAN? This seems somewhat similar to doing an fMRI scan on a human brain, hence the 'brain scan' analogy.

Needless to say, Lua is quite new to me, so any pointers on the question above would be very much appreciated.
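One way to check this numerically, as a minimal sketch (untested against this repo; channelResponse, imageA, imageB, and the layer index 5 are all hypothetical names): run both photos through the trained generator, reduce one convolution layer's output to a per-channel mean activation, and compare the two response vectors with cosine similarity.

require 'nngraph'
require 'cudnn'

-- Mean activation per feature map ("neuron") of the n-th convolution
-- encountered while walking the generator graph.
local function channelResponse(netG, input, n)
    netG:forward(input)
    local seen = 0
    for _, node in ipairs(netG.forwardnodes) do
        if torch.typename(node.data.module) == 'cudnn.SpatialConvolution' then
            seen = seen + 1
            if seen == n then
                -- output[1] is the first sample (C x H x W); averaging over
                -- height and width leaves one scalar response per channel.
                return node.data.module.output[1]:mean(3):mean(2):squeeze()
            end
        end
    end
end

local a = channelResponse(netG, imageA, 5) -- imageA/imageB: preprocessed
local b = channelResponse(netG, imageB, 5) -- 1 x C x H x W input tensors
-- Cosine similarity near 1 means the two inputs drive the same channels.
print(a:dot(b) / (a:norm() * b:norm() + 1e-8))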

kaihuchen commented 7 years ago

Or to put it more simply: does anyone know of a good tool for visualizing the hidden units in this cGAN implementation?

johndpope commented 7 years ago

You may be able to use this TensorFlow port: https://github.com/johndpope/pix2pix-tensorflow. TensorFlow has TensorBoard, which may help shed some light on this: https://www.tensorflow.org/how_tos/summaries_and_tensorboard/

Quasimondo commented 7 years ago

I think you might be able to reuse the display server for this. It seems to have a built-in method to visualize tensors:

display.image(tensor, options)

Displays the tensor as an image. The tensor is normalized (by a scalar offset and scaling factor) to be displayable. The image can be panned around and zoomed (with the scroll wheel or equivalent). Double-click the image to restore original size or fill the window.

If the tensor has 4 dimensions, it is considered to be a list of images, sliced by the first dimension. Same thing if it has 3 dimensions but the first dimension has a size greater than 3 (so they cannot be considered the RGB channels). This is equivalent to passing a list (Lua table) of tensors or using the explicit images command. This is convenient when visualizing the trained filters of a convolutional layer.

https://github.com/szym/display
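For example (assuming the szym/display server is already running in the background), a 3D tensor whose first dimension is larger than 3 gets sliced along that dimension and tiled into a grid:

require 'torch'
local display = require 'display'

-- 16 single-channel 32x32 maps, shown as a grid of tiles in one window.
display.image(torch.randn(16, 32, 32), {title = 'random feature maps'})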

Quasimondo commented 7 years ago

I made a quick test and it seems to work. If you insert this in the display update condition, it will quickly fill your screen :-):

-- Paste into train.lua's display update block; it relies on the script's
-- existing `disp` handle (require 'display'), `opt.display_id`, and `netG`.
local idx = 3 -- offset past the windows train.lua already uses
for indexNode, node in ipairs(netG.forwardnodes) do
    local tn = torch.typename(node.data.module)
    -- Open one window per (full) convolution, tiled into per-channel slices.
    if tn == 'cudnn.SpatialConvolution' or tn == 'cudnn.SpatialFullConvolution' then
        disp.image(node.data.module.output[1], {win = opt.display_id + idx, title = 'layer ' .. idx})
        idx = idx + 1
    end
end
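One caveat: output[1] only shows the first sample of the current batch, and the windows refresh on every display update, so early in training the layers can look like noise. Combined with a numeric comparison of the same node.data.module.output tensors (as in the sketch further up), this should let you check whether two noses actually light up the same channels.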