walkerlala opened this issue 6 years ago
You could modify the code here so that the exclude_list only includes the `_LOGITS_SCOPE_NAME`, and also set the flag initialize_last_layer = False. (Note that you still want to restore the variables in ASPP, decoder and so on.) By doing so, only the weights in the last classification layer are not initialized from the checkpoint (then you could use a classification layer with 150 classes).
You need to explore min_resize_value and max_resize_value (set resize_factor = output_stride) for ADE20K, which contains images of widely varying scales (e.g., dimensions ranging from 50 to 2000 pixels). By setting min_resize_value and max_resize_value, you can resize the images on the fly to a similar range (or you could do that manually while pre-processing the dataset). Note, however, that these hyper-parameters may affect performance, and we have not yet explored them carefully.
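To illustrate what min_resize_value, max_resize_value and resize_factor control, here is a minimal pure-Python sketch of the size computation. It is not DeepLab's preprocessing code. The helper name `resize_to_range` is hypothetical. The rounding rule, which rounds each side up to the form `factor * k + 1` to match crop sizes like the 513 used later in this thread, is an assumption.

```python
import math


def resize_to_range(height, width, min_size, max_size, factor=None):
    """Hypothetical helper: choose an output (height, width) for one image.

    Scale so the shorter side reaches min_size; if that would push the longer
    side past max_size, scale so the longer side equals max_size instead.
    The aspect ratio is preserved up to flooring.
    """
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    new_h = int(math.floor(height * scale))
    new_w = int(math.floor(width * scale))
    if factor is not None:
        # Assumption: round each side up to factor * k + 1 (e.g. 513 for
        # factor 16). Rounding up can exceed max_size by up to factor - 1.
        new_h += (factor - (new_h - 1) % factor) % factor
        new_w += (factor - (new_w - 1) % factor) % factor
    return new_h, new_w
```

For example, a 64x128 image with min_size=256 and max_size=400 is capped by the longer side and becomes 200x400. With factor=16 it becomes 209x401.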
@aquariusjay Thanks for the hints. I have now started training with the provided VOC model checkpoint, `fine_tune_batch_norm` set to False, the `mobilenet_v2` variant and a batch size of 8. Hopefully the loss will drop after several hours...
There are still two things confusing me:
1. The segmentation annotation images in the ADE20K dataset have three channels, but I am reading them with `label_reader = build_data.ImageReader('png', channels=1)`, as we did for the VOC dataset (in datasets/build_voc2012_data.py). Will that be a problem?
2. Why do we have the `resize_factor` parameter?
Oh, will it be OK to prepare a pull request for the ADE20K dataset?
Regarding your previous questions:
We currently do not have any plans to prepare that. However, note that one should be able to do so using the provided code/model/scripts. Also, contributions of extra datasets to the codebase are welcome.
Cheers,
@aquariusjay,
I'm currently having similar issues attempting to train with a custom dataset and was hoping you could offer some insight.
You could modify the code here so that the exclude_list only includes the `_LOGITS_SCOPE_NAME`, and also set the flag initialize_last_layer = False.
The link you included "here" appears to require a Google SSO login. I am assuming that it pointed to the train_utils.py script. Here are the changes I have made so far to use your architecture on my custom dataset:
```python
# datasets/segmentation_dataset.py
_TOY_DATASET_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 800,
        'trainval': 1000,
        'val': 200,
    },
    num_classes=10,
    ignore_label=255,
)

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'toy_dataset': _TOY_DATASET_INFORMATION,
}

# train.py
flags.DEFINE_boolean('initialize_last_layer', False,
                     'Initialize the last layer.')
flags.DEFINE_string('dataset', 'toy_dataset',
                    'Name of the segmentation dataset.')

# train_utils.py
exclude_list = ['_LOGITS_SCOPE_NAME']
if not initialize_last_layer:
    exclude_list.extend(last_layers)

# Same flag default again (presumably in eval.py)
flags.DEFINE_string('dataset', 'toy_dataset',
                    'Name of the segmentation dataset.')
```
However, when I run this, my code appears to train successfully, but it then runs into an issue with the confusion matrix during evaluation (I include the traceback below for reference). Any tips or suggestions on how to fix this?
Thanks for your help! Brett
Error Traceback:
~/brett/wss-python/models/research/deeplab$ sh local_test_custom.sh
Converting toy dataset...
>> Converting image 50/200 shard 0
>> Converting image 100/200 shard 1
>> Converting image 150/200 shard 2
>> Converting image 200/200 shard 3
>> Converting image 250/1000 shard 0
>> Converting image 500/1000 shard 1
>> Converting image 750/1000 shard 2
>> Converting image 1000/1000 shard 3
>> Converting image 200/800 shard 0
>> Converting image 400/800 shard 1
>> Converting image 600/800 shard 2
>> Converting image 800/800 shard 3
--2018-03-30 12:33:03-- http://download.tensorflow.org/models/deeplabv3_pascal_train_aug_2018_01_04.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.217.8.176, 2607:f8b0:4009:80d::2010
Connecting to download.tensorflow.org (download.tensorflow.org)|172.217.8.176|:80... connected.
HTTP request sent, awaiting response... 416 Requested range not satisfiable
The file is already fully retrieved; nothing to do.
toy_dataset
INFO:tensorflow:Training on trainval set
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/losses/losses_impl.py:731: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
INFO:tensorflow:Ignoring initialization; other checkpoint exists
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py:736: __init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
INFO:tensorflow:Restoring parameters from /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt-11
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 11.
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
toy_dataset
INFO:tensorflow:Evaluating on val set
INFO:tensorflow:Performing single-scale test.
INFO:tensorflow:Eval num images 200
INFO:tensorflow:Eval batch size 1 and num batch 200
INFO:tensorflow:Waiting for new checkpoint at /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train
INFO:tensorflow:Found new checkpoint at /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt-12
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/training/python/training/evaluation.py:303: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt-12
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting evaluation at 2018-03-30-16:35:58
Traceback (most recent call last):
File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 175, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 168, in main
eval_interval_secs=FLAGS.eval_interval_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 301, in evaluation_loop
timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/training/python/training/evaluation.py", line 452, in evaluate_repeatedly
session.run(eval_ops, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 546, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1022, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1113, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1098, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1170, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 950, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [255 255 255...] [y (mean_iou/ToInt64_2:0) = ] [10]
[[Node: mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_2)]]
Caused by op u'mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert', defined at:
File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 175, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 142, in main
predictions, labels, dataset.num_classes, weights=weights)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/metrics_impl.py", line 1009, in mean_iou
num_classes, weights)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/metrics_impl.py", line 263, in _streaming_confusion_matrix
labels, predictions, num_classes, weights=weights, dtype=dtypes.float64)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/confusion_matrix.py", line 183, in confusion_matrix
message='`predictions` out of bound')],
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/check_ops.py", line 579, in assert_less
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 118, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 177, in Assert
guarded_assert = cond(condition, no_op, true_assert, name="AssertGuard")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2027, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1868, in BuildCondBranch
original_result = fn()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 175, in true_assert
condition, data, summarize, name="Assert")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 48, in _assert
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [255 255 255...] [y (mean_iou/ToInt64_2:0) = ] [10]
[[Node: mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_2)]]
train_utils.py
• I modified the code here so that the exclude_list only includes the `_LOGITS_SCOPE_NAME`, as you stated above.
```python
exclude_list = ['_LOGITS_SCOPE_NAME']
if not initialize_last_layer:
    exclude_list.extend(last_layers)
```
This should be
```python
exclude_list = [_LOGITS_SCOPE_NAME]
```
That is, `_LOGITS_SCOPE_NAME` is a variable defined elsewhere (search for it), not a string literal.
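Here is a minimal sketch of why the quoted string excludes nothing. It filters variable names by scope prefix, which is roughly how exclusion by scope behaves when restoring. The helper `variables_to_restore` and the variable names are hypothetical and illustrative. The value `'logits'` for `_LOGITS_SCOPE_NAME` is assumed, based on other comments in this thread that put `'logits'` in exclude_list directly.

```python
# Assumed value of _LOGITS_SCOPE_NAME in deeplab/model.py.
LOGITS_SCOPE_NAME = 'logits'


def variables_to_restore(var_names, exclude):
    """Keep every variable name that does not start with an excluded scope."""
    return [name for name in var_names
            if not any(name.startswith(scope) for scope in exclude)]


# Illustrative variable names only, not read from a real checkpoint.
var_names = [
    'MobilenetV2/Conv/weights',
    'aspp0/weights',
    'decoder/decoder_conv0_pointwise/weights',
    'logits/semantic/weights',
    'logits/semantic/biases',
]

# Bug: the quoted string matches no scope, so the logits are still restored.
wrong = variables_to_restore(var_names, ['_LOGITS_SCOPE_NAME'])
# Fix: use the constant's value, so only the logits variables are skipped.
right = variables_to_restore(var_names, [LOGITS_SCOPE_NAME])
```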
@walkerlala
I am trying to train the deeplab model on the ADE20K dataset.
I'm having some problems with the data format conversion.
Would you mind sharing the code for the ADE20K dataset? It would be really appreciated.
@brett-whitford When I use my data, I get the same error as you. Can you share your solution? Thank you very much. I'm looking forward to your reply.
@wonderit Of course. Please wait for a while until I have access to my GPU server.
@wonderit Here is the patch for converting training data and training deeplabv3 on ADE20K.
https://gist.github.com/walkerlala/82d978e68407e65158e8825cd470d7e1
(it can also be found at http://fastdrivers.org/misc/patch-for-ade20k.patch )
You can apply this patch on top of commit 1d38a22535866f2e19a4eb0fc623fa768fb08dcf or 5281c9a028f6fc344357c2c9e0c06c171e16dfa4 without conflict.
Note:
you have to manually adjust the path in train_ade20k.py for training, and supply the correct path to the training data when converting the data, as documented in the doc
training data can be found at: http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip
I am also going to submit a PR to get these into the repo. However, I don't have enough GPUs to get a good pretrained model (I only have two Nvidia 1080s...). If you can obtain a decent pretrained model, please share!
Also, anyone interested in adding ADE20K to deeplabv3 can take a look at this PR I just created: https://github.com/tensorflow/models/pull/3853
@walkerlala When using val.py, did you get the error '`predictions` out of bound'? It is the same as @brett-whitford's question. Thank you
@walkerlala Can you share your eval script?
@walkerlala @aquariusjay
Hi, I am confused about `exclude_list` and `initialize_last_layer`. I am not sure whether I understand them correctly:
If one wants to fine-tune deeplab-v3+ on another dataset, does only `_LOGITS_SCOPE_NAME` need to be excluded?
If so, following @aquariusjay's suggestion, "train_utils.py" becomes:
```python
exclude_list = [_LOGITS_SCOPE_NAME]
if not initialize_last_layer:
    exclude_list.extend(last_layers)
```
If `initialize_last_layer=false` is set, then `exclude_list` will include `last_layers`. In "train.py", `last_layers` is the list `[_LOGITS_SCOPE_NAME, _IMAGE_POOLING_SCOPE, _ASPP_SCOPE, _CONCAT_PROJECTION_SCOPE, _DECODER_SCOPE]`.
So all variables in that list will be excluded. This seems inconsistent.
Shouldn't it be the following?
`initialize_last_layer=true` and `exclude_list = [_LOGITS_SCOPE_NAME]`
Hi, I'm training on my own dataset as well (only two classes).
When I set `initialize_last_layer=false` and
```python
exclude_list = ['logits']
if not initialize_last_layer:
    exclude_list.extend(last_layers)
```
then vis.py gives me all-black images (not binary).
When I only set `initialize_last_layer=false`, I get binary images (the result is not good, but it at least shows some learning). But train.py gives me this:
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 6390723.
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
This happens with training_number_of_steps=100000. Does anyone know why? Thanks!
@lydialixia
Hello.
You should add `'global_step'` to `exclude_list`:
exclude_list = ['global_step']
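A toy sketch shows why a restored global_step stops training immediately. The pretrained run's step counter comes back along with the weights unless 'global_step' is excluded, and training only runs while the step is below training_number_of_steps. The helper below is hypothetical and only models the step arithmetic, not DeepLab's training loop.

```python
def steps_actually_run(checkpoint, exclude, training_number_of_steps):
    """Toy model of restore-then-train; hypothetical, not DeepLab code."""
    global_step = 0
    if 'global_step' not in exclude:
        # The pretrained run's step counter is restored along with the weights.
        global_step = checkpoint['global_step']
    # Training only runs while global_step < training_number_of_steps.
    return max(0, training_number_of_steps - global_step)


# Step value taken from lydialixia's log; no weights are modelled here.
pretrained = {'global_step': 6390723}
```

With `['logits']` excluded, zero steps run, which matches the "Stopping Training" log. Adding `'global_step'` gives the full 100000 steps.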
But I am still confused about whether one should set `initialize_last_layer=false` when fine-tuning deeplab-v3+ on another task.
When you want to fine-tune DeepLab on other datasets, there are a few cases:
1. You want to re-use ALL the trained weights: set initialize_last_layer = True (last_layers_contain_logits_only does not matter in this case).
2. You want to re-use ONLY the network backbone (i.e., exclude ASPP, decoder and so on): set initialize_last_layer = False and last_layers_contain_logits_only = False.
3. You want to re-use ALL the trained weights EXCEPT the logits (since the num_classes may be different): set initialize_last_layer = False and last_layers_contain_logits_only = True.
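The three cases can be summarized as a mapping from the two flags to the scopes skipped when restoring the checkpoint. This is a hypothetical sketch, not DeepLab's train_utils code. The scope strings are assumed values of the constants in train.py's last_layers list quoted earlier in the thread. Always excluding 'global_step' follows hhwxxx's suggestion above.

```python
# Assumed string values of the scope constants in train.py's last_layers.
LOGITS = 'logits'
EXTRA_SCOPES = ['image_pooling', 'aspp', 'concat_projection', 'decoder']


def exclude_list_for(initialize_last_layer, last_layers_contain_logits_only):
    """Hypothetical helper: scopes that are NOT restored from the checkpoint."""
    # Assumption: always restart the step counter when fine-tuning.
    exclude = ['global_step']
    if not initialize_last_layer:
        if last_layers_contain_logits_only:
            exclude.append(LOGITS)  # re-use everything except the logits
        else:
            exclude.extend([LOGITS] + EXTRA_SCOPES)  # re-use the backbone only
    return exclude
```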
Hi @walkerlala: did you manage to fine-tune on the ADE20K dataset? I'm trying to fine-tune on a dataset of the same size, but without success. After the first ~2K iterations, the loss stops decreasing and starts to oscillate (for ~20K iterations). I tried different learning rates and removed the regularization, but so far there is no improvement.
@georgosgeorgos No, in the end I couldn't fine-tune the model on the ADE20K dataset because I don't have enough GPU memory. Every time I try to fine-tune the batch normalization parameters, training blows up with an out-of-memory error, so I froze the batch normalization layers during training. In the end I only got a model with "modest" performance:
Here is the original image (too large to display here): http://www.fastdrivers.org/misc/stuffseg-origin.jpg
Here is the segmentation result:
However, I can get a satisfying result with PSPNet:
According to the slides from the 2017 COCO + Places Workshop, deeplabv3 should also be able to do that, but I haven't had any luck fine-tuning it. Hopefully Google can provide a fine-tuned pretrained model in the future @aquariusjay.
@brett-whitford - Hi Brett, I am having the exact same problem as you. How did you end up solving it?
@shipeng-uestc - Hi shipeng, did you manage to solve the issue? I am currently using `exclude_list=[_LOGITS_SCOPE_NAME]`, with `_LOGITS_SCOPE_NAME` imported from deeplab.model as @walkerlala suggested, but I am still getting the same error as Brett.
When I run
```shell
python deeplab/eval.py --logtostderr --eval_split="val" --model_variant="xception_65" --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 --output_stride=16 --decoder_output_stride=4 --eval_crop_size=513 --eval_crop_size=513 --dataset="ade20k" --checkpoint_dir="./deeplab/datasets/ADE20K/exp/train_on_train_set/train" --eval_logdir="./deeplab/datasets/ADE20K/exp/train_on_train_set/eval" --dataset_dir="./deeplab/datasets/ADE20K/tfrecord"
```
I get:
NotFoundError (see above for traceback): Key aspp1_depthwise/BatchNorm/beta not found in checkpoint [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_299 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Please help me! Thanks.
@hhwxxx Hello. In your answer to lydialixia, do you mean that in train_utils.py the exclude_list should be like this?
```python
exclude_list = ['global_step']
exclude_list = ['logits']
```
But I still can't start training. The output is:
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 30000.
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
I have also tried exclude_list = ['_LOGITS_SCOPE_NAME'], which doesn't work. When I only set exclude_list = ['global_step'], the model achieves mean IoU = 0.93 after 10000 iterations, and I don't know whether this is wrong. Waiting online, thank you!
@qmy612
Hello. Maybe you can try this:
exclude_list = ['global_step', 'logits']
As for `_LOGITS_SCOPE_NAME`, it is defined in "model.py", so you should use it like this: `model._LOGITS_SCOPE_NAME`.
And I have no idea about miou=0.93.
Just setting initialize_last_layer = False and last_layers_contain_logits_only = True works for me, if you want to train on your own dataset with a different number of classes.
@BeSlower, yes, the solution works for me, but there is another problem: the result is all black with no other label, even though the loss decreases during training. Can anyone help me?
@qmy612 Did you get the problem solved? I am having exactly the same problem as you.
@xiangjinwu Yes, hhwxxx's answer works: exclude_list = ['global_step', 'logits']
@aquariusjay Hello, I am training DeepLab on my own dataset, which has only one class (excluding unlabeled) and the same format as Cityscapes, but some problems keep happening. One is that the server always restarts during training. Another is that the result only shows the color of the one class I labeled. Can you give me some advice? Thanks.
@qmy612 Thanks a lot, it works.
@Soulempty, Regarding your questions:
@aquariusjay Thank you for your detailed solution. I want to give you more details about my problems.
1. My dataset is converted to the Cityscapes style but has only one class ("road"), so the ground truth label only has road pixels and ground pixels (not labeled).
2. The following is my ground truth label.
3. The following is my JSON label:
```
{"imgWidth": 1280, "imgHeight": 1080, "objects": [{"label": "road", "polygon": [[1.0, 612.0], [0.0, 953.0], [407.1, 965.1], [711.0, 963.4], [1094.2, 970.3], [1147.7, 963.4], [1185.9, 961.7], [1279.1, 969.9], [1279.0, 696.0], [918.7, 584.6], [881.0, 573.1], [837.4, 561.6], [821.4, 564.1], [795.0, 565.4], [769.2, 565.2], [769.8, 589.9], [763.2, 600.3], [716.7, 603.5], [706.3, 601.4], [703.5, 578.0], [709.2, 566.3], [702.5, 565.2], [697.8, 573.7], [682.6, 571.6], [671.2, 574.8], [666.5, 579.1], [660.8, 582.2], [632.4, 582.2], [624.8, 580.1], [619.5, 569.3], [422.2, 582.2], [427.8, 613.5], [426.5, 646.0], [418.9, 654.5], [367.2, 664.5], [355.9, 667.3], [258.7, 665.9], [247.3, 664.5], [233.4, 640.4], [227.0, 598.3]]}]}
```
4. The following is part of the Cityscapes label script:
```python
labels = [
    Label( 'unlabeled' , 0 , 255 , 'void' , 0 , False , True , ( 0, 0,0) ),
    Label( 'road' , 1 , 1 , 'flat' , 1 , False , False , (128, 64,128) ),
]
```
The picture is the prediction result: it only shows the road color, with no ground color.
@aquariusjay I got black images when using the default loss_weight. Setting the loss_weight solved my problem, since my data is class-imbalanced.
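The poster did not say which weights they used. One common heuristic for imbalanced data is inverse class frequency, shown below as a hedged sketch with hypothetical helper names. It computes per-class weights from label pixel counts and expands them to per-pixel weights, with ignored pixels getting weight 0. How the resulting weights are fed into DeepLab's loss is not covered here.

```python
from collections import Counter


def inverse_frequency_weights(label_pixels, num_classes, ignore_label=255):
    """Weight class c by total / (num_classes * count_c); absent classes get 0."""
    counts = Counter(p for p in label_pixels if p != ignore_label)
    total = sum(counts.values())
    return [total / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def pixel_weights(label_pixels, class_weights, ignore_label=255):
    """Per-pixel loss weights; ignored pixels contribute nothing."""
    return [0.0 if p == ignore_label else class_weights[p]
            for p in label_pixels]
```

With 90 background pixels and 10 foreground pixels, the rare class gets weight 5.0 and the common class about 0.56. The rare class therefore counts roughly nine times as much per pixel.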
@aquariusjay Hello. When I train on my dataset, which has only one class (the label is "road") with the background set to unlabeled, I always get the same loss (0.2622). Can you give some advice on how to train on a dataset with one class? I think this is important for other people as well. Thank you. The following are some details:
@Soulempty Your question is not related to this issue (ADE20K). Could you please open a new one, so that people with similar experience (e.g., @shanyucha) can share? I do not have access to your dataset, and tuning the hyper-parameters usually takes experimentation.
Thank you. I think I have solved the problem of training on a dataset with one class, inspired by your first piece of advice.
@brett-whitford To solve this problem, you could inspect the maximum pixel value in the pre-processed grayscale label images (after they are processed by remove_gt_colormap.py). Your num_classes should be greater than the maximum pixel value in the images.
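That check can be sketched as a small pure-Python helper (hypothetical, not part of DeepLab). It reports every label value that is not a valid class id, meaning any value outside `0 <= value < num_classes`, which is the range the `mean_iou` confusion matrix asserts on. Passing `ignore_label` is optional and only appropriate if your eval.py masks those pixels out before the metric.

```python
def out_of_range_labels(label_pixels, num_classes, ignore_label=None):
    """Return sorted label values that are not valid class ids.

    The mean_iou confusion matrix requires 0 <= value < num_classes. Pass
    ignore_label only if your eval.py replaces those pixels before the metric.
    """
    return sorted({p for p in label_pixels
                   if p != ignore_label and not 0 <= p < num_classes})
```

To get `label_pixels` for a single-channel label PNG, you could use Pillow's `list(Image.open(path).getdata())`. An empty result means the labels fit `num_classes`.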
I retrained DeepLab on the ADE20K dataset in my Google Colab notebook. Below are the results with MobileNet-v2 and Xception_65 as initial checkpoints. However, I couldn't fine-tune because of an OOM error. Maybe others can share training parameters that give better results?
MobileNet-v2
Xception_65
@Soulempty Could you please share more details about how you trained on a custom dataset with only one class? I would really appreciate it. Thanks!
Just as in the details I showed above, but set the trainId of unlabeled to 1.
@Soulempty Thanks. I'm still confused, since I have no idea what the label variable is or where I can find it.
the ground truth label
Where can I find it? Thanks a lot.
On Thu, May 17, 2018 at 10:36 AM, Chao Jiao notifications@github.com wrote:
the ground truth label
@lydialixia could you please share a more detailed tutorial about how to train on a custom dataset with two classes?
@Soulempty I am sorry, but I still cannot figure out how to train on a custom dataset with two classes. Could you give a tutorial on how to do it? Thanks very much!
My dataset has the same style as Cityscapes. What is your data like?
@Soulempty Thanks for your reply. My dataset is from Kaggle, https://www.kaggle.com/c/ultrasound-nerve-segmentation/data.
This dataset contains 5635 images in total (I split it into a training set of 4000 images and a validation set of 1635 images).
The original image and its corresponding mask are shown below:
I changed the images in the training set to the .jpg extension and the images in the validation set to .png. Then I saved them in the VOC2012 style.
Then I followed @brettkoonce's tutorial, but something seems to be wrong with the training procedure.
@RomRoc I am retraining on ADE20K too. Maybe the dataset download link has changed (http://groups.csail.mit.edu/vision/datasets/ADE20K/)? Could you share the changes you made in the code to retrain on ADE20K? Thanks
@urgonguyen check my Jupyter notebook here, which runs in Google Colab. To download ADE20K and convert it, you should use the download_and_convert_ade20k.sh script.
System information
Describe the problem
This is a feature request. I am trying to train the deeplab model on the ADE20K dataset (see this presentation). I have finished the data format conversion and "successfully" trained the model on a small subset of ADE20K. Below is the modification to the file
research/deeplab/datasets/segmentation_dataset.py
which is used to extract segmentation data. The problem is that the ADE20K dataset has 150 classes, a different number from the VOC and Cityscapes datasets. That causes a problem with the checkpoint file, because currently there are only pretrained models for the VOC and Cityscapes datasets. So we have two choices here:
1. Do not use the checkpoint file. In this case, there is an error:
2. Set num_classes=21 to use those two provided checkpoint files.
Are there any alternatives to these?
If anyone has a workable solution for the ADE20K dataset, it would be really appreciated.