sxj731533730 closed this issue 2 years ago.
@sxj731533730 In order to expedite the trouble-shooting process, could you please provide the entire URL of the repository which you are using. Please provide more details on the issue reported here. Please make sure to use latest TF version as older versions are not actively supported. Thank you!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
I have the same problem. How can I solve it?
When I train DeepLabv3, I encounter an error. My dataset comes from COCO and only includes the person class.
Below are the steps I used to prepare the training data and the command that starts training.
ubuntu@ubuntu:~$ conda create -n tf1.15 python=3.6
ubuntu@ubuntu:~$ conda activate tf1.15
(tf1.15) ubuntu@ubuntu:~$ git clone https://github.com/tensorflow/models.git
(tf1.15) ubuntu@ubuntu:~$ pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==1.15.0 tensorflow==1.15.0
(tf1.15) ubuntu@ubuntu:~$ python3
Python 3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(tf1.15) ubuntu@ubuntu:~$ git clone https://github.com/wkentaro/labelme.git
ubuntu@ubuntu:~/Downloads/dataset$ tree -L 1
.
├── train
├── trainval
└── val
3 directories, 0 files
The images are 640 pixels wide and 480 pixels high.
(tf1.15) ubuntu@ubuntu:~/labelme/examples/semantic_segmentation$ python3 labelme2voc.py /home/ubuntu/Downloads/dataset/train /home/ubuntu/Downloads/dataset/train_voc --labels labels.txt
(tf1.15) ubuntu@ubuntu:~/labelme/examples/semantic_segmentation$ python3 labelme2voc.py /home/ubuntu/Downloads/dataset/trainval /home/ubuntu/Downloads/dataset/trainval_voc --labels labels.txt
(tf1.15) ubuntu@ubuntu:~/labelme/examples/semantic_segmentation$ python3 labelme2voc.py /home/ubuntu/Downloads/dataset/val /home/ubuntu/Downloads/dataset/val_voc --labels labels.txt
labels.txt contents:
__ignore__
_background_
person
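labelme2voc.py derives class ids from the order of the lines in labels.txt. A minimal sketch of that mapping, assuming the labelme convention where line i (0-based) becomes class id i - 1, so `__ignore__` is -1 and `_background_` is 0; `parse_labels` is a hypothetical helper, not part of labelme:

```python
# Sketch of the class-id convention assumed for labelme2voc.py:
# line i (0-based) of labels.txt becomes class id i - 1.
from typing import Dict, List


def parse_labels(lines: List[str]) -> Dict[str, int]:
    """Map each class name in labels.txt to the id labelme2voc.py is assumed to assign."""
    names = [line.strip() for line in lines]
    # labelme2voc.py expects these two reserved names on the first two lines.
    if names[:2] != ["__ignore__", "_background_"]:
        raise ValueError("labels.txt must start with __ignore__ and _background_")
    return {name: i - 1 for i, name in enumerate(names)}


if __name__ == "__main__":
    mapping = parse_labels(["__ignore__\n", "_background_\n", "person\n"])
    print(mapping)  # {'__ignore__': -1, '_background_': 0, 'person': 1}
    trainable = [cid for cid in mapping.values() if cid >= 0]
    print(len(trainable))  # 2
```

Under this assumption, the labels above yield two non-negative ids (0 and 1), which is worth comparing with the `num_classes` value given to DeepLab.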
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ python3 remove_gt_colormap.py --original_gt_folder=/home/ubuntu/Downloads/dataset/train_voc/SegmentationClassPNG --output_dir=/home/ubuntu/Downloads/dataset/train_voc/SegmentationClassRaw
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ python3 remove_gt_colormap.py --original_gt_folder=/home/ubuntu/Downloads/dataset/val_voc/SegmentationClassPNG --output_dir=/home/ubuntu/Downloads/dataset/val_voc/SegmentationClassRaw
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ python3 remove_gt_colormap.py --original_gt_folder=/home/ubuntu/Downloads/dataset/trainval_voc/SegmentationClassPNG --output_dir=/home/ubuntu/Downloads/dataset/trainval_voc/SegmentationClassRaw
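After remove_gt_colormap.py, each PNG in SegmentationClassRaw should hold plain class indices. A quick check, sketched under the assumption that valid values are 0 to num_classes - 1 plus the ignore label (defaults of 3 and 255 here, matching the `_MYDATA_INFORMATION` descriptor in this post), can rule out label values the model was not configured for. `unexpected_labels` and `check_folder` are hypothetical helpers, and Pillow is assumed to be installed:

```python
# Sketch: report pixel values in raw label PNGs that are neither a class id
# (0 .. num_classes - 1) nor the ignore label.
from pathlib import Path
from typing import Iterable, Set


def unexpected_labels(pixels: Iterable[int], num_classes: int, ignore_label: int) -> Set[int]:
    """Return the distinct pixel values that fall outside the allowed set."""
    return {p for p in set(pixels) if p != ignore_label and not 0 <= p < num_classes}


def check_folder(folder: str, num_classes: int = 3, ignore_label: int = 255) -> None:
    """Print every PNG in folder that contains unexpected label values."""
    from PIL import Image  # Pillow is only needed for this helper

    for png in sorted(Path(folder).glob("*.png")):
        image = Image.open(png)
        if image.mode not in ("L", "P"):
            print(f"{png.name}: expected a single-channel image, got mode {image.mode}")
            continue
        bad = unexpected_labels(image.getdata(), num_classes, ignore_label)
        if bad:
            print(f"{png.name}: unexpected label values {sorted(bad)}")
```

For example, `check_folder("/home/ubuntu/Downloads/dataset/train_voc/SegmentationClassRaw")` prints nothing if every file passes.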
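find . -name "*.jpg" > ../trainlist/train.txt
find . -name "*.jpg" > ../vallist/val.txt
find . -name "*.jpg" > ../trainvallist/trainval.txt
Then I used find-and-replace to edit each txt file so that it contains only a list of file names, without the extension or the folder path.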
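The find plus find-and-replace step can also be done in a single pass. A minimal sketch, assuming the images sit directly in one folder with a lowercase .jpg extension; `write_image_list` is a hypothetical helper:

```python
# Sketch: write a list file with one image name per line, without the
# folder path or the .jpg extension.
from pathlib import Path


def write_image_list(image_folder: str, list_file: str) -> int:
    """Write the sorted stems of all .jpg files in image_folder; return how many."""
    stems = sorted(p.stem for p in Path(image_folder).glob("*.jpg"))
    out = Path(list_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(stem + "\n" for stem in stems))
    return len(stems)
```

For the train split in this post, that would be `write_image_list("/home/ubuntu/Downloads/dataset/train_voc/JPEGImages", "/home/ubuntu/Downloads/dataset/train_voc/trainlist/train.txt")`.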
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ python3 build_voc2012_data.py --image_folder=/home/ubuntu/Downloads/dataset/train_voc/JPEGImages --semantic_segmentation_folder=/home/ubuntu/Downloads/dataset/train_voc/SegmentationClassRaw --list_folder=/home/ubuntu/Downloads/dataset/train_voc/trainlist --image_format="jpg" --output_dir=/home/ubuntu/models/research/deeplab/datasets/datasetData
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ python3 build_voc2012_data.py --image_folder=/home/ubuntu/Downloads/dataset/trainval_voc/JPEGImages --semantic_segmentation_folder=/home/ubuntu/Downloads/dataset/trainval_voc/SegmentationClassRaw --list_folder=/home/ubuntu/Downloads/dataset/trainval_voc/trainvallist --image_format="jpg" --output_dir=/home/ubuntu/models/research/deeplab/datasets/datasetData
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ python3 build_voc2012_data.py --image_folder=/home/ubuntu/Downloads/dataset/val_voc/JPEGImages --semantic_segmentation_folder=/home/ubuntu/Downloads/dataset/val_voc/SegmentationClassRaw --list_folder=/home/ubuntu/Downloads/dataset/val_voc/vallist --image_format="jpg" --output_dir=/home/ubuntu/models/research/deeplab/datasets/datasetData
In /home/ubuntu/models/research/deeplab/datasets/data_generator.py, I added:
_MYDATA_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 869,  # number of training images
        'trainval': 532,  # number of training images
        'val': 140,  # number of test images
    },
    num_classes=3,  # ignore+background+Arrow = 3
    ignore_label=255,
)
At line 112:
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'mydata': _MYDATA_INFORMATION,  # register my own dataset
}
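Instead of typing the split sizes by hand, they can be counted from the list files that build_voc2012_data.py read, which keeps the recorded sizes consistent with those files. A minimal sketch, assuming one image name per non-empty line; `count_split_sizes` is a hypothetical helper:

```python
# Sketch: count the entries in each split's list file so splits_to_sizes
# reflects the files that were converted to TFRecords.
from pathlib import Path
from typing import Dict


def count_split_sizes(list_files: Dict[str, str]) -> Dict[str, int]:
    """Return {split_name: number of non-empty lines in its list file}."""
    sizes = {}
    for split, path in list_files.items():
        lines = Path(path).read_text().splitlines()
        sizes[split] = sum(1 for line in lines if line.strip())
    return sizes
```

With the paths used in this post, the argument would be `{'train': '/home/ubuntu/Downloads/dataset/train_voc/trainlist/train.txt', 'trainval': '/home/ubuntu/Downloads/dataset/trainval_voc/trainvallist/trainval.txt', 'val': '/home/ubuntu/Downloads/dataset/val_voc/vallist/val.txt'}`.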
In /home/ubuntu/models/research/deeplab/utils/train_utils.py, I changed
# Variables that will not be restored.
exclude_list = ['global_step']
to
exclude_list = ['global_step', 'logits']
if not initialize_last_layer:
    exclude_list.extend(last_layers)
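To my understanding, deeplab's train_utils.py passes `exclude_list` to TF-Slim's `get_variables_to_restore(exclude=...)`, which skips every variable whose name matches an excluded scope from the start of the name. Adding 'logits' is therefore meant to keep the checkpoint's logits weights from being restored into a model with a different number of classes. A pure-Python sketch of that filtering; `variables_to_restore` is a hypothetical helper and the variable names are illustrative, not read from the real checkpoint:

```python
# Sketch of scope-based exclusion: a variable is skipped when its name matches
# one of the excluded scopes at the start (re.match), as TF-Slim's scope
# filtering is assumed to do.
import re
from typing import List


def variables_to_restore(names: List[str], exclude: List[str]) -> List[str]:
    """Keep the variable names that no excluded scope matches at the beginning."""
    return [n for n in names if not any(re.match(scope, n) for scope in exclude)]


if __name__ == "__main__":
    # Illustrative names only.
    names = ["global_step", "MobilenetV2/Conv/weights", "logits/semantic/weights"]
    print(variables_to_restore(names, ["global_step"]))
    # ['MobilenetV2/Conv/weights', 'logits/semantic/weights']
    print(variables_to_restore(names, ["global_step", "logits"]))
    # ['MobilenetV2/Conv/weights']
```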
In /home/ubuntu/models/research/deeplab/train.py, I set:
flags.DEFINE_boolean('initialize_last_layer', False, 'Initialize the last layer.')
flags.DEFINE_boolean('last_layers_contain_logits_only', True, 'Only consider logits as last layers or not.')
In /home/ubuntu/models/research/deeplab/utils/get_dataset_colormap.py:
_DATASET_NAME = 'mydata'  # added here; must match the registered dataset name
def create_dataset_name_label_colormap():
    return np.asarray([
        [165, 42, 42],
        [0, 192, 0],
        [196, 196, 196],
    ])
elif dataset == _DATASET_NAME:  # added here
    return create_dataset_name_label_colormap()
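Each row of the colormap is the RGB color for one class index, so it needs at least as many rows as there are classes. A pure-Python sketch of that lookup, assuming the three colors above; to my understanding deeplab's `label_to_color_image` does the equivalent with NumPy indexing and rejects values that are too large. `label_to_rgb` is a hypothetical helper:

```python
# Sketch: color a 2-D label map with the 3-entry colormap from this post,
# rejecting class indices that have no colormap row.
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
COLORMAP: List[Color] = [(165, 42, 42), (0, 192, 0), (196, 196, 196)]


def label_to_rgb(label: Sequence[Sequence[int]],
                 colormap: Sequence[Color] = COLORMAP) -> List[List[Color]]:
    """Replace every class index in label with its RGB color."""
    for row in label:
        for value in row:
            if not 0 <= value < len(colormap):
                raise ValueError(f"label value {value} has no colormap entry")
    return [[colormap[value] for value in row] for row in label]
```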
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ wget -nd -c http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
(tf1.15) ubuntu@ubuntu:~/models/research/deeplab/datasets$ tar -zxvf deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
(tf1.15) ubuntu@ubuntu:~/models/research$ CUDA_VISIBLE_DEVICES=0 python3 deeplab/train.py --logtostderr --num_clones=2 --training_number_of_steps=3000 --train_split="train" --model_variant="mobilenet_v2" --output_stride=8 --fine_tune_batch_norm=true --label_weights={0,0.1,10} --train_batch_size=2 --train_crop_size="481,641" --dataset="mydata" --tf_initial_checkpoint='/home/ubuntu/models/research/deeplab/datasets/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000' --train_logdir='/home/ubuntu/models/research/deeplab/datasets/result' --dataset_dir='/home/ubuntu/models/research/deeplab/datasets/datasetData'
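Some flag values in this command can be cross-checked against each other before launching. A hypothetical pre-flight helper in which every rule is an assumption, not a check DeepLab performs: clones should not outnumber the GPUs exposed by CUDA_VISIBLE_DEVICES, the batch should split evenly across clones, each crop dimension is assumed to follow the commonly recommended `output_stride * k + 1` form, and `label_weights` is assumed to have one entry per class:

```python
# Hypothetical pre-flight check for the train.py flags above.
# The rules are assumptions, not validations performed by DeepLab itself.
from typing import List, Sequence


def check_train_flags(visible_gpus: int, num_clones: int, train_batch_size: int,
                      crop_size: Sequence[int], output_stride: int,
                      num_classes: int, label_weights: Sequence[float]) -> List[str]:
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    if num_clones > visible_gpus:
        problems.append(f"num_clones={num_clones} but only {visible_gpus} GPU(s) visible")
    if train_batch_size % num_clones != 0:
        problems.append("train_batch_size is not divisible by num_clones")
    for size in crop_size:
        if (size - 1) % output_stride != 0:
            problems.append(f"crop size {size} is not output_stride * k + 1")
    if len(label_weights) != num_classes:
        problems.append(f"{len(label_weights)} label weights for {num_classes} classes")
    return problems


if __name__ == "__main__":
    # Values copied from the command above (CUDA_VISIBLE_DEVICES=0 exposes one GPU).
    print(check_train_flags(visible_gpus=1, num_clones=2, train_batch_size=2,
                            crop_size=(481, 641), output_stride=8,
                            num_classes=3, label_weights=(0, 0.1, 10)))
    # ['num_clones=2 but only 1 GPU(s) visible']
```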
But I encounter the following error:
INFO:tensorflow:Recording summary at step 0.
I0910 21:44:20.737598 140448232363776 supervisor.py:1050] Recording summary at step 0.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, 2 root error(s) found.
  (0) Invalid argument: Loss is inf or nan. : Tensor had NaN values
     [[node CheckNumerics (defined at /home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[concat_projection/BatchNorm/gamma/sum_grads/_1241]]
  (1) Invalid argument: Loss is inf or nan. : Tensor had NaN values
     [[node CheckNumerics (defined at /home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'CheckNumerics':
File "deeplab/train.py", line 464, in <module>
tf.app.run()
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "deeplab/train.py", line 398, in main
total_loss = tf.check_numerics(total_loss, 'Loss is inf or nan.')
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 1011, in check_numerics
"CheckNumerics", tensor=tensor, message=message, name=name)
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/ubuntu/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()