First of all, how did you compile Caffe itself? It looks like CUDA compilation is enabled there; remove it if no NVIDIA GPU is present, as this may be one of the issues. Second, check which device IDs are available by running ./caffe_neural_tool --devices and then edit train.sh to use the correct GPU. At the moment it is set to GPU no. 3, which might be wrong. Also make sure you use either an NVIDIA GPU, an AMD GPU or a non-NUMA CPU, as NUMA processors have cache-invalidation issues with the multicore backend of Caffe. Device fission might fix this in the future, but currently single-CPU systems are much faster than any NUMA (dual-CPU) system. Compile with ViennaCL, but use clBLAS as the BLAS for AMD GPUs, cuBLAS for NVIDIA GPUs, and OpenBLAS or MKL for CPUs.
You might also want to start training with an SK or U network example, as USK is not very fine-tuned (yet).
Thanks. I was looking at the Makefile.config files again and realized that I was linking against the wrong CUDA version for caffe_neural_tool.
Now, to try it out, which is the easiest example for training and testing? When I try to run ./train.sh I get some errors, but I guess there is a simple example somewhere that should work.
One last question: in the ground truth images, is it possible to ignore a particular label? This is to deal with partially labeled data (for example, most pixels in a training image may not be labeled).
Yes, this is possible. The pixel label to ignore can be set in the network prototxt, in the softmax loss_param, as "ignore_label". In that case, though, don't use the masking parameter provided by my tool, as that sets pixel labels to -1 in order to ignore them.
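For illustration, here is a minimal sketch of such a loss layer (assuming a SoftmaxWithLoss layer with bottoms "ip3" and "label", as in the example networks that come up later in this thread; the value 2 stands in for whichever consolidated label should be ignored):
layer {
  include: { phase: TRAIN }
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip3"
  bottom: "label"
  loss_param {
    ignore_label: 2    # the consolidated label value to skip in the loss
    normalize: true
  }
}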
The easiest example is to run the existing train/process setup for the SK network. Always make sure the relative paths fit and all necessary folders exist. Just send me the error logs if you can't figure something out on your own.
Many more details, though, will only be available after the 24th of August, once I'm done writing my thesis.
Thanks. For example, I get the following error:
caffe_neural_models/dataset_01 (master ✘)✹ ᐅ bash train.sh
I0717 15:07:19.546991 10671 caffe_neural_tool.cpp:113] Training mode.
F0717 15:07:19.547129 10671 train.cpp:17] Train parameter index does not exist.
*** Check failure stack trace: ***
@ 0x7fa32102da1d google::LogMessage::Fail()
@ 0x7fa32102f8bd google::LogMessage::SendToLog()
@ 0x7fa32102d60c google::LogMessage::Flush()
@ 0x7fa3210301de google::LogMessageFatal::~LogMessageFatal()
@ 0x4a3ed3 caffe_neural::Train()
@ 0x46087d main
@ 0x7fa31f6df76d (unknown)
@ 0x412e49 (unknown)
train.sh: line 1: 10671 Aborted (core dumped) ./../../caffe_neural_tool/build/caffe_neural_tool --gpu 3 --train 4 --proto 'train_process_usk_2.prototxt'
I am not sure where to find the SK network train.sh, so I tried with the first example I found.
Use in dataset_01: --gpu 0 --train 0 --proto 'train_process_sk_9.prototxt'
Then use the snapshot after for example 10'000 steps: --gpu 0 --process 0 --proto 'train_process_sk_9.prototxt'
and make sure there is something to process in a folder called 'input' as I only provide training/ground truth in the repository and no test data.
Adapt train_process_sk_9.prototxt if necessary. It should be straightforward, and the functions can easily be traced back in the code until my full documentation is available.
that's great, thanks, I will give it a try.
I managed to make it work with the examples. I am trying with my own data now, which has labels 0, 1 and 255; I should ignore 255 but use 0 and 1 for a binary problem.
If I exclude the label from the label_consolidate configuration, will it still try to use it? I tried with ignore_label, which is -1, but in that case I was getting nan as the loss.
OK I actually never thought of trying it that way - let me figure out where the bug stems from. Maybe I can fix it quickly or give you the correct settings.
thanks, that's great. Also, if you have a paper I can cite for your work (or the thesis afterwards), let me know.
There may be a paper, but that's going to take a while. Until then, you can cite it by name and GitHub URL. After the 24th of August I'll also post a link to the thesis somewhere.
Are your labels integer values in the image? In grayscale images, the tool will typically assign 0 to the lowest integer label value, 1 to the next one, and so on. Something seems to be going wrong then; it should map 0 to 0, 1 to 1 and 255 to 2. It's best to exclude label_consolidate if you don't use it: just remove the whole block from the processing. And change ignore_label to the actual label you want to ignore in neuraltissue_net.prototxt, here:
layer {
  include: { phase: TRAIN }
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip3"
  bottom: "label"
  loss_param {
    ignore_label: 2
    normalize: true
  }
}
Also remember to set the correct number of output labels (the last layer's output number) in both prototxt files. Try with both 2 (what you want) and 3 (maybe it still needs to see the ignored label).
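To illustrate what "last layer output number" refers to, a sketch only (the layer name "ip3" matches the loss bottom above, but its actual type, bottom name and parameters depend on the concrete network definition):
layer {
  name: "ip3"
  type: "Convolution"          # assumption: could also be an InnerProduct layer, depending on the network
  bottom: "ip2"                # hypothetical name of the preceding blob
  top: "ip3"
  convolution_param {
    num_output: 2              # number of output labels: try 2 first, then 3 if the ignored label must still be predicted
    kernel_size: 1
  }
}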
oh ok I see, I think I am understanding now. Would 255 in the image be taken as label 255, or label 2?
@cbecker You're right, I fixed it now. Also, the debug output shows how often each label has been seen during training. The fact that you see zeros for most labels except one indicates that something is going wrong; the NaN stems from a division by zero.
How can I check how many labels it is detecting, and how many samples per label? I am training now and my dataset is very skewed, so the loss goes way up, then down:
I0717 18:10:23.551026 7473 solver.cpp:224] Iteration 2200, loss = 25.6082
I0717 18:10:23.551105 7473 solver.cpp:501] Iteration 2200, lr = 0.00086145
I0717 18:10:37.191097 7473 solver.cpp:224] Iteration 2250, loss = 27.6978
I0717 18:10:37.191189 7473 solver.cpp:501] Iteration 2250, lr = 0.000858812
I0717 18:10:50.838942 7473 solver.cpp:224] Iteration 2300, loss = 43.4337
I0717 18:10:50.839028 7473 solver.cpp:501] Iteration 2300, lr = 0.000856192
I0717 18:11:04.493456 7473 solver.cpp:224] Iteration 2350, loss = 32.9644
I0717 18:11:04.493532 7473 solver.cpp:501] Iteration 2350, lr = 0.000853591
I0717 18:11:18.124866 7473 solver.cpp:224] Iteration 2400, loss = 49.9584
I0717 18:11:18.124938 7473 solver.cpp:501] Iteration 2400, lr = 0.000851008
I0717 18:11:31.758568 7473 solver.cpp:224] Iteration 2450, loss = 33.9239
I0717 18:11:31.758651 7473 solver.cpp:501] Iteration 2450, lr = 0.000848444
I0717 18:11:45.394291 7473 solver.cpp:224] Iteration 2500, loss = 13.3905
I0717 18:11:45.394366 7473 solver.cpp:501] Iteration 2500, lr = 0.000845897
I0717 18:11:58.996084 7473 solver.cpp:224] Iteration 2550, loss = 52.0906
I0717 18:11:58.996172 7473 solver.cpp:501] Iteration 2550, lr = 0.000843368
I0717 18:12:12.502898 7473 solver.cpp:224] Iteration 2600, loss = 50.3209
I0717 18:12:12.502974 7473 solver.cpp:501] Iteration 2600, lr = 0.000840857
I0717 18:12:25.900024 7473 solver.cpp:224] Iteration 2650, loss = 21.3011
I0717 18:12:25.900107 7473 solver.cpp:501] Iteration 2650, lr = 0.000838363
I0717 18:12:39.306751 7473 solver.cpp:224] Iteration 2700, loss = 4.43506
I0717 18:12:39.306835 7473 solver.cpp:501] Iteration 2700, lr = 0.000835886
I0717 18:12:52.684272 7473 solver.cpp:224] Iteration 2750, loss = 85.4389
I0717 18:12:52.684355 7473 solver.cpp:501] Iteration 2750, lr = 0.000833427
I0717 18:13:06.066045 7473 solver.cpp:224] Iteration 2800, loss = 11.0876
I0717 18:13:06.066126 7473 solver.cpp:501] Iteration 2800, lr = 0.000830984
I0717 18:13:19.453717 7473 solver.cpp:224] Iteration 2850, loss = 28.2095
I0717 18:13:19.453801 7473 solver.cpp:501] Iteration 2850, lr = 0.000828558
I0717 18:13:32.837424 7473 solver.cpp:224] Iteration 2900, loss = 5.7144
I0717 18:13:32.837505 7473 solver.cpp:501] Iteration 2900, lr = 0.000826148
I0717 18:13:46.211272 7473 solver.cpp:224] Iteration 2950, loss = 4.69093
I0717 18:13:46.211349 7473 solver.cpp:501] Iteration 2950, lr = 0.000823754
Enable debugging for the caffe_neural_tool if you want to see the label counts: add --debug, and --graphic if you want to see a bit of what's going on. Maybe you should also decrease the learning rate (0.0001) and increase the momentum (0.99) in neuraltissue_solver.prototxt for this dataset.
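For example, the relevant excerpt of neuraltissue_solver.prototxt with the suggested values would read:
# excerpt of neuraltissue_solver.prototxt with the suggested values
base_lr: 0.0001
momentum: 0.99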
Feel free to fork and add new stuff to this tool if it helps you. I will review and accept pull requests.
Thanks ;)
I'm confused, because it looks as if labels 1 and 2 are there, but 0 is not:
Label: 0, 0
Label: 1, 6534
Label: 2, 574
and my labels TIF file has values 0, 1 and 255.
If you enable the patch prior and masking function, what initial statistics does the tool output when starting the training?
Ohh, I think I see what I did there. Label 0 is actually -1 if masking is enabled. So... label 1 is your label 0 and label 2 is your label 1 in those statistics, and all is fine. I need to fix that when I find time. I also assume you used a label count of 2 instead of 3; otherwise you'd see a "Label: 3, xxx" entry, where xxx is the number of 255-valued pixels you have in the image.
I suggest you set the number of labels to 3 in the tool and to 2 in the networks (excerpt below). This should, according to my code review just now, fix all issues.
The actual code snippet responsible, as proof:
// TODO: Only enable in debug or statistics mode
for (int y = 0; y < patch_size; ++y) {
  for (int x = 0; x < patch_size; ++x) {
    // The +1 shifts every label up by one in the statistics,
    // so label 0 shows up as "Label: 1", label 1 as "Label: 2", and so on.
    labelcounter[patch[1].at<float>(y, x) + 1] += 1;
  }
}
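Concretely, the label count lives in the tool's train input block (the full configuration appears later in this thread); a minimal excerpt:
train {
  input {
    labels: 3    # 0, 1 and the ignored label (255) are all counted here
    # ...
  }
}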
Hi, Thanks.
Actually, if I set the number of classes to 3, the training loss never goes down, so I am using 2. But if I disable CLAHE normalization, it doesn't converge either.
do you have any idea of what could be happening? The config files are here: http://pastebin.com/L0NphqbP
EDIT: it seems that masking: false solves it.
However, what is the effect of histeq in that case? I want to avoid histogram equalizing the patches or images.
thanks
Yes, you should disable masking on your dataset, and remove the histogram equalization block (shown below). Histeq turned out to be useful on the ISBI 2012 dataset and on FlyEM data with 9 labels, but if the data is only partially masked or labelled for some reason, the histogram equalization will interfere heavily.
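The block in question, as it appears inside preprocessor { } in the example configurations, looks like this; delete the whole block:
histeq {
  patch_prior: true    # prioritizes patches containing rare labels
  masking: false       # when true, this is what sets ignored pixels to -1
}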
I apologize for how difficult it currently is to set up. The parameters need to be vastly different for different kinds of datasets.
Thanks. When I disable histeq, I get nan after 50 iterations. Do you think this is related to patch normalization and the learning rate?
I0718 23:03:19.342078 10622 solver.cpp:224] Iteration 0, loss = 0.693077
I0718 23:03:19.342238 10622 solver.cpp:501] Iteration 0, lr = 0.001
I0718 23:03:29.735518 10622 solver.cpp:224] Iteration 50, loss = nan
I0718 23:03:29.735569 10622 solver.cpp:501] Iteration 50, lr = 0.000996266
I0718 23:03:40.074090 10622 solver.cpp:224] Iteration 100, loss = nan
I0718 23:03:40.074154 10622 solver.cpp:501] Iteration 100, lr = 0.000992565
I0718 23:03:50.432718 10622 solver.cpp:224] Iteration 150, loss = nan
I0718 23:03:50.432777 10622 solver.cpp:501] Iteration 150, lr = 0.000988896
I0718 23:04:01.384943 10622 solver.cpp:224] Iteration 200, loss = nan
I0718 23:04:01.385005 10622 solver.cpp:501] Iteration 200, lr = 0.000985258
Are there patches in your dataset that lack labels 0 and 1 completely and only expose label 2, or some other such odd combination? If that is the case, this might well be the problem: the Caffe library might divide by zero at some point, namely when normalizing the loss by the number of valid labels present, which would be 0 when only label 2 is seen. In that case, keeping patch_prior set to true and masking to false should not give NaN; is that correct?
exactly, some parts have only label 2 (ignore).
yes, if I set patch_prior to true and masking to false, it seems to work. Will it do histogram equalization if masking = false? I want to make sure I disable histogram equalization, as I think it is hurting in my case.
It will prioritize patches with rare labels if the patch prior is enabled. This equalizes the histogram of labels slightly, but not completely. It could even be that this sets the priority of patches containing only label 2 to zero. But beware, I did not test this behavior exactly.
If you want to be very sure, the best thing would be either to fix the loss function in Caffe to avoid the division by zero and then disable the patch prior again, or to fix the neural tool so it does not expose Caffe to patches with only invalid/ignored labels.
I could fix the loss function if you want.
I actually fixed it just now in my Caffe branch, if you want to try again. Just make sure to recompile both Caffe and the tool.
I see, I am starting to understand better now.
ok great. So I should just disable the histeq block?
Yes remove it and try again with the updated code.
Fix:
// Fix the division by zero bug
if (count > 0) {
  caffe_gpu_scal(prob_.count(), loss_weight / count, bottom_diff);
}
count was zero here with your labels, making the loss go NaN.
actually, it still throws nan now, and it does so right at the beginning already.
let me know if I can help with any other debugging info.
I can also send you an example image and files, if that helps.
I tried something else now. I don't know if that would help, as I pretty much know where the error comes from: it really is the division by zero, or generally zero valid labels being present, which Caffe currently does not handle.
Thanks, though it's still not working. In case it helps, here is a working example for training: http://cvlabwww.epfl.ch/~cjbecker/tmp/test.tar.gz
I used it like this now, with the newest version of Caffe and the tool, and I'm not getting NaN anymore:
# The training protocol buffer definition
train_net: "neuraltissue_net.prototxt"
########################################################################
# The testing protocol buffer definition
# test_net: "../net_sk_2out/neuraltissue_net.prototxt"
########################################################################
# Test_iter specifies how many forward passes the test should carry out.
# it is the number of batches shown, then
# examples shown = 'test_iter'*batch_size
# Carry out testing every 'test_interval' training iterations.
# test_iter: 1000
# test_interval: 500
########################################################################
# The base learning rate, momentum and the weight decay of the network.
# base_lr: 0.05
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
########################################################################
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
#lr_policy: "step"
#gamma: 0.1
#stepsize: 20000
########################################################################
# The maximum number of iterations
max_iter: 100000
########################################################################
# Snapshot intermediate results
snapshot: 2000
snapshot_prefix: "neuraltissue_sk_2out"
########################################################################
# Display every 'display' iterations
display: 5
########################################################################
train {
  # solverstate: "neuraltissue_sk_2out_iter_16000.solverstate"
  solver: "neuraltissue_solver.prototxt"
  input {
    padding_size: 102
    patch_size: 64
    channels: 3
    labels: 3
    batch_size: 1
    raw_images: "train/raw2"
    label_images: "train/gt2"
    preprocessor {
      normalization: true
      rotation: true
      mirror: true
      clahe {
        clip: 4.0
      }
      crop {
        imagecrop: 1
        labelcrop: 0
      }
      histeq {
        patch_prior: true
        masking: false
      }
    }
  }
}
process {
  process_net: "neuraltissue_net.prototxt"
  # caffemodel: "neuraltissue_sk_2out_iter_100000.caffemodel"
  input {
    padding_size: 102
    patch_size: 128
    channels: 3
    labels: 2
    batch_size: 1
    raw_images: "input"
    preprocessor {
      normalization: true
      clahe {
        clip: 4.0
      }
      crop {
        imagecrop: 1
        labelcrop: 0
      }
    }
  }
  filter_output {
    output_filters: false
    output: "sk_filters"
  }
  output {
    format: "tif"
    fp32_out: false
    output: "output"
  }
}
and:
layer {
  include: { phase: TRAIN }
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip3"
  bottom: "label"
  loss_param {
    ignore_label: 2
    normalize: true
  }
}
Now you'll see a loss of 0 if it only sees label 2. That could also mess a bit with the momentum and how fast it converges, so I'd still keep the histogram equalization (without masking) on.
Now it's up to you to get it to train nicely, which might not be easy.
Thanks, that's great! Yes, I see, about the 0 loss. I will give it a try and let you know. Thanks again!
One more question: it seems strange to me that the histogram equalization block is only in the training part. Does it histogram-equalize the input patch? Because then I would expect to see it in the prediction part as well.
It only prioritizes the picking of training patches during training. Histogram equalization and masking only affect which kinds of errors the network sees, and how often, during SGD.
In prediction mode, it will just go through the patches linearly and give a prediction for each pixel.
I think it is working now, thanks.
What I notice is that the output probabilities are very 'thick' (in my case I have synapses). Typically I would expect to see a transition zone from the border of the synapse to the outside, with decaying probability. Have you experienced this issue as well? Also, is it possible to output the input to the last sigmoid instead of its probability output? In a two-class problem one typically plots the classifier score before the sigmoid, as the sigmoid may squeeze things too much (especially with unbalanced training data like this).
Yes, that might happen.
You can just remove the Softmax/Sigmoid at the end, which is labelled with a phase: TEST include clause at the end of the network. The tool should then pick up the pre-softmax output.
Remove this block in the network prototxt:
layer {
  include: { phase: TEST }
  name: "prob"
  type: "Softmax"
  bottom: "ip3"
  top: "prob"
}
Great, that works.
I talked to Jan and he showed me your results for the dataset at https://github.com/unidesigner/groundtruth-drosophila-vnc , which look very good.
Do you have the parameters you used to train that network? And approximately how many iterations?
The best results I have on ISBI 2012 now are with the SK network, trained with Softmax + Malis loss (10'000 iterations each). The previously best results were SK with Softmax loss only; approximately 30'000 iterations is what we used. I think it's about the same for the other dataset. But Malis can only be used properly if label 0 is background that separates objects.
Thanks. I did some tests, and performance is still a bit low. I was trying to look at the learned filters, to see if something weird is happening, but I am getting an error while loading the model and parameters in Python:
IndexError Traceback (most recent call last)
<ipython-input-4-5f4574a759ce> in <module>()
----> 1 net = caffe.Classifier("net_sk_2out/neuraltissue_net.prototxt", "hipp/neuraltissue_sk_2out_iter_200000.caffemodel" )
/home/cjbecker/filer/jan-caffe/caffe/python/caffe/classifier.pyc in __init__(self, model_file, pretrained_file, image_dims, mean, input_scale, raw_scale, channel_swap)
27
28 # configure pre-processing
---> 29 in_ = self.inputs[0]
30 self.transformer = caffe.io.Transformer(
31 {in_: self.blobs[in_].data.shape})
IndexError: list index out of range
I suppose this is an issue with patch size and input layers, but I am not sure how to solve it.
Do you want to see the individual filtered stages after each layer, or the filter kernels themselves? The first can be done in the tool itself (the output_filters parameter). The second is not implemented (yet).
In any case, you cannot do forward/backward processing with the Python interface; the memory data interface used in the networks currently only works with the C++ interface. You'd have to use Python data layers to do so, and change the networks.
I meant showing the filter kernels, as they would look like noise if there were an issue during learning. If there were a way to extract the coefficients from the model file, that would be enough.
I can also try to look at the output of the first layer; that could give me a hint. This is done by removing or commenting out the other layers, right?
No, you can output everything during processing by using:
filter_output {
  output_filters: true
  output: "sk_filters"
}
inside the process {} block of train_process_sk_2.prototxt
It slows everything down though, as this writes many thousands of images to the hard drive.
ah great, thanks ;)
Another observation from looking at the test output: I think normalization is not working very well, because, depending on the slice, there is a shift in the output of the network, which causes the output to vary too much between consecutive slices.
If normalization is disabled, the code here https://github.com/naibaf7/caffe_neural_tool/blob/31c5a2c635062e9e0887719f20a77f674ee9c709/src/image_processor.cpp#L70 doesn't map the pixels to between -1 and +1, but to between 0 and 1 instead, right? I am going to play with this, because the variations in the output are significant with the current min/max scheme.
Yes, you should definitely check what min/max normalization and CLAHE do to your data, and whether they help or hurt the classification.
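If you want to test without them, a rough sketch of the preprocessor with both switched off (assuming the normalization flag can simply be set to false; the clahe block is just removed):
preprocessor {
  normalization: false   # assumption: disables the min/max scaling
  rotation: true
  mirror: true
  # clahe block removed entirely
  crop {
    imagecrop: 1
    labelcrop: 0
  }
}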
For reference: Currently, the ISBI dataset with CLAHE and Min/Max normalization scores 12th on my implementation: http://brainiac2.mit.edu/isbi_challenge/leaders-board (the INI entry)
Great, thanks.
I tried it on my dataset (synapses) and I get OK results, though much worse than a random forest trained on some custom simple features. When testing on the training images, I see negatives that are misclassified as positives, probably because they have little chance of appearing during SGD. Have you ever tried hard negative mining? Or is there a way to favor sampling certain samples (or regions) more than others?
You'd probably have to change to minibatch training instead of patch training. As it is currently, to save computation and speed things up, it always trains SK with 64 by 64 pixels that are not sampled i.i.d. but correlate very strongly (they lie in the same local patch). Even histogram equalization (patch prior and masking) cannot correct this issue completely.
Depending on the dataset this gives worse results, as you noticed, and in that case you need to prepare an HDF5 dataset with the samples as you wish them to be picked (i.i.d. from a distribution that you think will result in better training). Then you need to swap out the MemoryDataLayer for an HDF5 data layer (see the sketch below), change the input size to 102x102, the label size to 1x1 and the batch size to 256, and train it without my tool (which is made for patch training rather than minibatch training).
Afterwards, processing can happen again with a patch label size of 128x128, and the speedup applies again. The results will be numerically identical to batch processing.
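For reference, a minimal sketch of what the swapped-in data layer could look like; the layer/top names and the HDF5 list file are assumptions, only the sizes and batch size follow the description above (the 102x102 patches and 1x1 labels would be prepared in the HDF5 files themselves):
layer {
  name: "data"
  type: "HDF5Data"                  # replaces the MemoryData layer used by the tool
  top: "data"                       # 102x102 input patches, sampled i.i.d. when building the HDF5 file
  top: "label"                      # 1x1 labels
  include: { phase: TRAIN }
  hdf5_data_param {
    source: "train_h5_list.txt"     # hypothetical text file listing the HDF5 files
    batch_size: 256
  }
}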
I see. But it would also be possible to modify the histeq module and the patch_prior to weight certain patches more than others. Then I could have an image with a weight per pixel that 'guides' learning; I think that could be a first approximation to achieve this.
Yes, there are actually many things I would have liked to include in the tool but currently do not have time to program.
One issue with additional weight maps is that they would have to be passed into the Caffe library through an additional MemoryDataLayer, and the SoftmaxLoss would have to be modified to accept such a map as an additional bottom blob (sketched below).
Minibatch support would be the easiest to implement without having to change too much.
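Purely as an illustration of that hypothetical weight-map modification (none of this exists in the tool or in stock Caffe; the "weights" blob and the modified loss behavior are assumptions), the loss layer would roughly gain a third bottom:
layer {
  include: { phase: TRAIN }
  name: "loss"
  type: "SoftmaxWithLoss"    # would need to be modified to consume the extra weight blob
  bottom: "ip3"
  bottom: "label"
  bottom: "weights"          # assumed extra MemoryData output carrying per-pixel weights
  loss_param {
    ignore_label: 2
    normalize: true
  }
}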
Hi there. I've finished a few experiments; thanks for all the support.
I managed to implement loading the external 'weight map' for sampling, which helped a bit in some cases.
In terms of performance, the CNNs do much worse than our approach, even when trained on the largest training set we have. I don't think this is an issue with your implementation, as we saw the same when using Caffe directly. To get better performance, I think we need to add some prior to the network.
Just to be sure: the sk_2 network we talked about, included in the examples, has an 'equivalent patch size' or 'context' of 100x100 pixels, right? I mean, it would be equivalent to running a per-patch trained CNN whose input layer is 100x100.
Okay, interesting... the context is 102 by 102.
Hi, first thanks for this great software :)
I am trying to run the examples, but I am getting runtime errors, namely:
I suppose there is an issue with cuBLAS, but I am not sure how to fix it. I am linking against ViennaCL, and I am not sure where cuBLAS comes from.
Do you have any suggestions about what to try next? Thanks!