Do you want to train a network for segmentation? In that case, from the vl_nnsoftmaxloss documentation:
C contains the class labels, which should be integers in the range 1 to D. C can be an array with either N elements or with H x W x 1 x N dimensions. In the first case, a given class label is applied at all spatial locations; in the second case, different class labels can be specified for different locations.
So if you have an output map of size H x W x D (instead of 1 x 1 x D for classification), you can just provide your label as an H x W x 1 segmentation mask containing labels in the range 1 to D.
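To make that concrete, here is a minimal sketch (with made-up sizes) of calling vl_nnsoftmaxloss with a spatial label mask instead of a single per-image label:

```matlab
% Minimal sketch with made-up sizes: per-pixel loss on an H x W x D score map,
% using an H x W x 1 x N label mask with values in the range 1..D.
H = 64; W = 64; D = 5; N = 2;            % spatial size, #classes, batch size
scores = randn(H, W, D, N, 'single');    % network output: one score map per class
labels = randi(D, [H, W, 1, N]);         % ground-truth class index at every pixel

loss = vl_nnsoftmaxloss(scores, labels);              % forward: scalar loss
dzdx = vl_nnsoftmaxloss(scores, labels, single(1));   % backward: gradient w.r.t. scores
```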
Sorry to jump on this issue but I am also very interested in this question.
@dbbert: I saw the quoted text in the documentation when I first started using matconvnet, but I am still trying to understand what type of network architecture would be required to achieve pixel-wise segmentation using an H x W x 1 label mask for the softmaxloss layer. Would a "U" architecture be required, whereby the input dimensionality is first gradually reduced, as in typical CNN designs (i.e. a funnel-like architecture), before being gradually increased to recover the original image's dimensions (minus border pixels lost to convolution)?
@nicjac: It's certainly possible to make an architecture like you describe, but it will probably be easier to start from a network that is trained for classification and fine-tune it for segmentation, as in Fully Convolutional Networks for Semantic Segmentation (CVPR 2015). In the paper they "convert" the fully connected layers of the classification net to convolutional layers, so they can apply the net to any input size instead of only 227 x 227 images.
If you have a look at cnn_imagenet_init.m, you'll see that the higher layers are already implemented as 1 x 1 convolutions, so you don't even need to worry about any conversion. If you apply the net to an image that is larger than 227 x 227, you will simply get an H x W x 1000 output map instead of a 1 x 1 x 1000 output. Note that the stride of the output map will be 32, though, unless you apply tricks like those in the paper.
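For reference, a rough sketch of what that looks like in practice, assuming one of the pretrained models such as imagenet-caffe-ref and hand-waving the mean-image subtraction for a non-standard input size:

```matlab
% Rough sketch, assuming the pretrained imagenet-caffe-ref model, whose
% "fully connected" layers are stored as convolutions.
net = load('imagenet-caffe-ref.mat');

im  = imread('example.jpg');             % assumed larger than the 227 x 227 crop
im_ = single(im);
% crude mean subtraction for a non-standard input size: resize the average image
avg = imresize(net.normalization.averageImage, [size(im_,1) size(im_,2)]);
im_ = im_ - avg;

res    = vl_simplenn(net, im_);          % forward pass only
scores = res(end).x;                     % H x W x 1000 score map, stride 32 in the input
```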
Hi, building on this, we are working on a similar network at the moment. Hopefully we will soon be able to release some of the improvements to make this easier.
@dbbert That's a very interesting paper, thanks. My issue is that no previously trained classification network exists for the application I have in mind (biomedical image segmentation). What approach would you recommend in this situation?
@vedaldi Sounds good! Which type of network would that be? The one described by @dbbert?
Hi all, Thank you for such great replies.
@dbbert I do want to do segmentation; I recently glanced at the paper you mentioned and will go back and take a closer look at it.
@vedaldi Sounds good. As @nicjac mentioned, I want to apply this in a medical imaging setting as well, hence cannot use a pretrained network.
Hi @dbbert, I am working on that paper. As you mentioned, "If you apply the net to an image that is larger than 227 x 227, you will simply get a H x W x 1000....", I guess the pre-trained ImageNet model (caffe-ref/AlexNet) expects 256 x 256 inputs. Given that, as long as the input is larger than 256 x 256, we can make H x W x 1000 predictions with the model. Please correct me if I am wrong.
Hi @atique81, I think that's correct.
Thanks @dbbert.
However, could you please advise how I can initialize the weights of a deconvolution layer to bilinear interpolation, as is done in that paper using Caffe?
I would greatly appreciate your response.
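Not speaking for the paper's authors, but one way to do this in MatConvNet is to build the bilinear kernel yourself and use it to initialize a vl_nnconvt (convolution transpose) layer; a sketch, assuming an upsampling factor f and one independent kernel per class:

```matlab
% Sketch: bilinear-interpolation initialization for a "deconvolution"
% (convolution transpose) layer, analogous to Caffe's bilinear weight filler.
% Assumes upsampling factor f and numClasses channels upsampled independently.
f = 4;                              % upsampling factor
k = 2*f - mod(f, 2);                % kernel size
numClasses = 21;

c = (k - 1) / (2 * f);
u = 1 - abs(((1:k) - 1) / f - c);   % 1-D bilinear profile
filt = single(u' * u);              % k x k bilinear kernel

filters = zeros(k, k, numClasses, numClasses, 'single');
for i = 1:numClasses
  filters(:,:,i,i) = filt;          % one kernel per class, no cross-channel mixing
end

% example use in a convolution-transpose layer:
% y = vl_nnconvt(x, filters, [], 'upsample', f, 'crop', (k - f)/2 * [1 1 1 1]);
```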
Hi all,
I understand that you can obtain the probabilities by replacing the 'softmaxloss' layer with 'softmax' at the end, on a sample-by-sample basis.
However, if I were to do the same on a pixel-by-pixel basis for all samples (train, val), is there a way I could adapt MatConvNet to do it?
Thank you in advance for the help.
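For reference, with the SimpleNN wrapper this amounts to swapping the final layer before the forward pass; a minimal sketch, assuming `net` is a trained segmentation network and `im_` a preprocessed input:

```matlab
% Minimal sketch: swap the training loss for a plain softmax so the forward
% pass returns per-pixel class probabilities. Assumes a SimpleNN network `net`
% trained with a 'softmaxloss' final layer and a preprocessed input `im_`.
net.layers{end} = struct('type', 'softmax');   % replace the loss layer

res   = vl_simplenn(net, im_);
probs = res(end).x;                            % H x W x D per-pixel class probabilities
[~, seg] = max(probs, [], 3);                  % H x W hard label map
```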