Error while training WSOD model

yekeren / Cap2Det

Implementation of our ICCV 2019 paper "Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection"

Apache License 2.0

Error while training WSOD model #18

Closed SubratoChakravorty closed 4 years ago

SubratoChakravorty commented 4 years ago

While training a WSOD model using train.sh, I get the following error:

Traceback (most recent call last):
  File "train/trainer_main.py", line 56, in <module>
    tf.app.run()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train/trainer_main.py", line 50, in main
    trainer.create_train_and_evaluate(pipeline_proto)
  File "/subrato-volume/Cap2Det/train/train/trainer.py", line 235, in create_train_and_evaluate
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/subrato-volume/Cap2Det/train/train/trainer.py", line 45, in _model_fn
    predictions = model.build_prediction(features)
  File "/subrato-volume/Cap2Det/models/cap2det_model.py", line 232, in build_prediction
    predictions = self._build_prediction(examples)
  File "/subrato-volume/Cap2Det/models/cap2det_model.py", line 174, in _build_prediction
    inputs, num_proposals, proposals, options.frcnn_options, is_training)
  File "/subrato-volume/Cap2Det/models/utils.py", line 183, in extract_frcnn_feature
    assignment_map={"/": "first_stage_feature_extraction/"})
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1940, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1947, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 297, in _init_from_checkpoint
    reader = load_checkpoint(ckpt_dir_or_file)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 66, in load_checkpoint
    return pywrap_tensorflow.NewCheckpointReader(filename)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 873, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 885, in __init__
    this = _pywrap_tensorflow_internal.new_CheckpointReader(filename)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on zoo/inception_v2_2016_08_28/inception_v2.ckpt: Not found: zoo/inception_v2_2016_08_28; No such file or directory

Please help.

yekeren commented 4 years ago

Please download it from https://github.com/tensorflow/models/tree/master/research/slim.
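Before starting training again, it may help to confirm that the checkpoint sits where the traceback expects it. The following is only a minimal sketch: it assumes the path from the error message above (zoo/inception_v2_2016_08_28/inception_v2.ckpt, relative to the directory training is launched from), and `checkpoint_present` is a hypothetical helper, not part of Cap2Det.

```python
import glob
import os


def checkpoint_present(ckpt_prefix):
    """Return True if a TF checkpoint exists for the given path or prefix.

    A V1-format checkpoint is a single file with exactly this name; a
    V2-format checkpoint is stored as `<prefix>.index` plus `<prefix>.data-*`.
    """
    if os.path.isfile(ckpt_prefix):
        return True
    has_index = os.path.isfile(ckpt_prefix + ".index")
    has_data = bool(glob.glob(glob.escape(ckpt_prefix) + ".data-*"))
    return has_index and has_data


if __name__ == "__main__":
    # Path copied from the error message; adjust it if your layout differs.
    path = "zoo/inception_v2_2016_08_28/inception_v2.ckpt"
    print(path, "found" if checkpoint_present(path) else "MISSING")
```

Run it from the same working directory as train.sh, since the path in the error is relative.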

SubratoChakravorty commented 4 years ago

Hi, I downloaded the model and put it in the zoo folder. Now I get another error, about a tensor that is missing from the checkpoint. The stack trace is below.

Traceback (most recent call last):
  File "train/trainer_main.py", line 56, in <module>
    tf.app.run()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train/trainer_main.py", line 50, in main
    trainer.create_train_and_evaluate(pipeline_proto)
  File "/subrato-volume/Cap2Det/train/train/trainer.py", line 235, in create_train_and_evaluate
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/subrato-volume/Cap2Det/train/train/trainer.py", line 45, in _model_fn
    predictions = model.build_prediction(features)
  File "/subrato-volume/Cap2Det/models/cap2det_model.py", line 232, in build_prediction
    predictions = self._build_prediction(examples)
  File "/subrato-volume/Cap2Det/models/cap2det_model.py", line 174, in _build_prediction
    inputs, num_proposals, proposals, options.frcnn_options, is_training)
  File "/subrato-volume/Cap2Det/models/utils.py", line 183, in extract_frcnn_feature
    assignment_map={"/": "first_stage_feature_extraction/"})
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1940, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1947, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 371, in _init_from_checkpoint
    tensor_name_in_ckpt, ckpt_dir_or_file
ValueError: Tensor InceptionV2/Conv2d_1a_7x7/BatchNorm/beta (InceptionV2/Conv2d_1a_7x7/BatchNorm/beta in /) is not found in zoo/inception_v2_2016_08_28/inception_v2.ckpt checkpoint

yekeren commented 4 years ago

A quick fix for the pre-trained checkpoint is to modify line 116 (L116) of "tensorflow_models/research/slim/nets/inception_v2.py". The fix disables batch norm for that layer, which should match the pre-trained model ``inception_v2_2016_08_28''.

102 if use_separable_conv:
103   # depthwise_multiplier here is different from depth_multiplier.
104   # depthwise_multiplier determines the output channels of the initial
105   # depthwise conv (see docs for tf.nn.separable_conv2d), while
106   # depth_multiplier controls the # channels of the subsequent 1x1
107   # convolution. Must have
108   #   in_channels * depthwise_multipler <= out_channels
109   # so that the separable convolution is not overparameterized.
110   depthwise_multiplier = min(int(depth(64) / 3), 8)
111   net = slim.separable_conv2d(
112       inputs, depth(64), [7, 7],
113       depth_multiplier=depthwise_multiplier,
114       stride=2,
115       padding='SAME',
116       normalizer_fn=None,  # A quick fix.
117       weights_initializer=trunc_normal(1.0),
118       scope=end_point)

SubratoChakravorty commented 4 years ago

I made the above change, i.e., added normalizer_fn=None to the function parameters, but I still get the same error. The default value of normalizer_fn is already None, so I don't understand how making this change does anything.

yekeren commented 4 years ago

The BatchNorm layer in slim can also be set through arg_scope. You could debug the inception_v2 file to make sure that batch_norm is disabled for the layer ``Conv2d_1a_7x7''. The fix works in my environment.
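This is why a signature default of None does not guarantee that batch norm is off: a surrounding arg_scope can supply a different default. The sketch below is a toy, pure-Python stand-in written only for illustration. It is not slim's implementation, and `arg_scope`, `with_scoped_defaults`, `conv`, and `batch_norm` here are hypothetical names. It does follow the same precedence, though: explicit call arguments beat scoped defaults, which beat signature defaults.

```python
import contextlib
import functools

_SCOPED_DEFAULTS = []  # stack of {function_name: {kwarg: value}}


@contextlib.contextmanager
def arg_scope(func, **defaults):
    """Toy arg_scope: supply default kwargs for `func` while the block is active."""
    _SCOPED_DEFAULTS.append({func.__name__: defaults})
    try:
        yield
    finally:
        _SCOPED_DEFAULTS.pop()


def with_scoped_defaults(func):
    """Let `func` pick up defaults from any enclosing arg_scope blocks."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        merged = {}
        for scope in _SCOPED_DEFAULTS:  # inner scopes override outer ones
            merged.update(scope.get(func.__name__, {}))
        merged.update(kwargs)  # explicit arguments always win
        return func(*args, **merged)
    return wrapper


def batch_norm(name):
    """Names of the variables a toy batch-norm layer would add."""
    return [name + "/BatchNorm/beta", name + "/BatchNorm/moving_mean"]


@with_scoped_defaults
def conv(name, normalizer_fn=None):
    """Return the variable names this toy layer would create."""
    variables = [name + "/weights"]
    if normalizer_fn is not None:
        variables += normalizer_fn(name)
    return variables


if __name__ == "__main__":
    with arg_scope(conv, normalizer_fn=batch_norm):
        # The scope turns batch norm on even though the signature default is None.
        print(conv("Conv2d_1a_7x7"))
        # Passing normalizer_fn=None explicitly overrides the scope.
        print(conv("Conv2d_1a_7x7", normalizer_fn=None))
```

In the toy, only the explicit normalizer_fn=None call avoids creating the BatchNorm variables, which mirrors why the explicit argument at L116 can matter.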

yekeren commented 4 years ago

I made the above changes i.e., added normalizer_fn = None to the function parameters. But I still get the same error. The default value of ''' normalizer_fn''' is already None. So, I don't understand making the above change does anything.

Are you sure that separable_conv2d is called with ``normalizer_fn=None''? Did you check that the inception model goes into the same branch?

SubratoChakravorty commented 4 years ago

Yeah, I looked and I think I found the issue. It's going through inception_utils.py where normalizer_fn is not None. I changed it and it worked for one conv layer and failed in the next one. I think I can fix it, I will let you know. Thanks for the help.

SubratoChakravorty commented 4 years ago

Hi Keren,

I didn't notice it earlier, but adding normalizer_fn=None to the separable_conv layer solves the issue for that layer; I then get the same error for Conv2d_2b_1x1. Do you have a fix for this?

Traceback (most recent call last):
  File "train/trainer_main.py", line 56, in <module>
    tf.app.run()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train/trainer_main.py", line 50, in main
    trainer.create_train_and_evaluate(pipeline_proto)
  File "/subrato-volume/Cap2Det/train/train/trainer.py", line 235, in create_train_and_evaluate
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/subrato-volume/Cap2Det/train/train/trainer.py", line 45, in _model_fn
    predictions = model.build_prediction(features)
  File "/subrato-volume/Cap2Det/models/cap2det_model.py", line 232, in build_prediction
    predictions = self._build_prediction(examples)
  File "/subrato-volume/Cap2Det/models/cap2det_model.py", line 174, in _build_prediction
    inputs, num_proposals, proposals, options.frcnn_options, is_training)
  File "/subrato-volume/Cap2Det/models/utils.py", line 183, in extract_frcnn_feature
    assignment_map={"/": "first_stage_feature_extraction/"})
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1940, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1947, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "/subrato-volume/tf15env/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 371, in _init_from_checkpoint
    tensor_name_in_ckpt, ckpt_dir_or_file
ValueError: Tensor InceptionV2/Conv2d_2b_1x1/BatchNorm/gamma (InceptionV2/Conv2d_2b_1x1/BatchNorm/gamma in /) is not found in zoo/inception_v2_2016_08_28/inception_v2.ckpt checkpoint

yekeren commented 4 years ago

Please try batch_norm_scale=False in the following file: https://github.com/tensorflow/models/blob/f3600cd1f755090168ea44d85d47161fafb37bc8/research/object_detection/models/faster_rcnn_inception_v2_feature_extractor.py#L132 . You could use the ``inspect_checkpoint'' tool to check the variables in the checkpoint file.
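To see at a glance which variables the model expects but the checkpoint lacks, the names can also be compared programmatically. This is a rough sketch rather than the project's own tooling: `missing_in_checkpoint` and `list_checkpoint_names` are hypothetical helpers, the second one relies on `tf.train.list_variables`, and the names in the usage example are made up for illustration (they are not the actual contents of inception_v2.ckpt).

```python
def missing_in_checkpoint(expected_names, checkpoint_names):
    """Return the expected variable names that the checkpoint does not contain."""
    available = set(checkpoint_names)
    return sorted(name for name in expected_names if name not in available)


def list_checkpoint_names(ckpt_path):
    """List the variable names stored in a checkpoint (requires TensorFlow)."""
    # Imported lazily so that the pure-Python helper above works without TF.
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(ckpt_path)]


if __name__ == "__main__":
    # Illustrative names only; in practice pass list_checkpoint_names(path).
    in_checkpoint = [
        "InceptionV2/Conv2d_2b_1x1/weights",
        "InceptionV2/Conv2d_2b_1x1/BatchNorm/beta",
    ]
    expected = in_checkpoint + ["InceptionV2/Conv2d_2b_1x1/BatchNorm/gamma"]
    for name in missing_in_checkpoint(expected, in_checkpoint):
        print("not in checkpoint:", name)
```

Any name reported as missing points at a layer whose batch-norm settings differ from the ones the checkpoint was saved with.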

SubratoChakravorty commented 4 years ago

Hi Keren, thanks a lot for the suggestions. I was able to initialize from the checkpoint; I had to set 'batch_norm_scale=False' in two places.

ziyanyang commented 3 years ago

Hi,

Could you summarize what you did to initialize using the checkpoint? I made modifications such as adding normalizer_fn = None but it does not work at all.

Best, Ziyan

yekeren commented 3 years ago

Please refer to "https://github.com/yekeren/Cap2Det/blob/cap2det/install-env.sh". I have forked a tensorflow/models branch (see "git checkout cap2det") and made the modifications there. Please let me know if it does not work.

ziyanyang commented 3 years ago

I ran sh install-env.sh, and I do have a folder Cap2Det/tensorflow_models. I saw that you have already added the modification "normalizer_fn = None" in this folder, but it does not work for me.

ziyanyang commented 3 years ago

Traceback (most recent call last):
  File "train/trainer_main.py", line 61, in <module>
    tf.app.run()
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train/trainer_main.py", line 55, in main
    trainer.create_train_and_evaluate(pipeline_proto)
  File "/net/zf18/zy3cx/Cap2Det/train/trainer.py", line 239, in create_train_and_evaluate
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 640, in run
    getattr(self, task_to_run)()
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 645, in run_chief
    return self._start_distributed_training()
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 796, in _start_distributed_training
    saving_listeners=saving_listeners)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/net/zf18/zy3cx/Cap2Det/train/trainer.py", line 49, in _model_fn
    predictions = model.build_prediction(features)
  File "/net/zf18/zy3cx/Cap2Det/models/cap2det_model.py", line 232, in build_prediction
    predictions = self._build_prediction(examples)
  File "/net/zf18/zy3cx/Cap2Det/models/cap2det_model.py", line 174, in _build_prediction
    inputs, num_proposals, proposals, options.frcnn_options, is_training)
  File "/net/zf18/zy3cx/Cap2Det/models/utils.py", line 184, in extract_frcnn_feature
    assignment_map={"/": "first_stage_feature_extraction/"})
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1940, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1947, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "/zf18/zy3cx/ENTER/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 371, in _init_from_checkpoint
    tensor_name_in_ckpt, ckpt_dir_or_file
ValueError: Tensor InceptionV2/Conv2d_1a_7x7/BatchNorm/beta (InceptionV2/Conv2d_1a_7x7/BatchNorm/beta in /) is not found in zoo/inception_v2_2016_08_28/inception_v2.ckpt checkpoint

This is the error message I got.

yekeren commented 3 years ago

I would suggest that you either:

follow the solution in this issue,
or re-run the install-env.sh shell script.

The error message should not appear again.

ziyanyang commented 3 years ago

I re-ran the shell script, but nothing changed. I tried to follow the solution in this issue but am confused about what I should do. As I understand it, I need to: (1) add "normalizer_fn=None" in lib/python3.6/site-packages/tensorflow/models/research/slim/nets/inception_v2.py, and (2) set batch_norm_scale=False everywhere it appears in lib/python3.6/site-packages/tensorflow/models/research/object_detection/models/faster_rcnn_inception_v2_feature_extractor.py.

Is that correct?

Best, Ziyan

yekeren commented 3 years ago

I think you are using your local copy of the TensorFlow models installed at "lib/python3.6/site-packages/tensorflow/models", so your understanding is correct. However, if you follow the setup steps ("cd cap2det && sh install-env.sh"; please read the script), the downloaded "cap2det/tensorflow_models" repo should already have the issue resolved.

yekeren commented 3 years ago

Please refer to the training script: https://github.com/yekeren/Cap2Det/blob/727b3025f666e2053b3bbf94cf18f9ab56fb1599/train_cap2det.sh#L21. Your code should use the copy downloaded at "cap2det/tensorflow_models", not the locally installed "lib/python3.6/site-packages/tensorflow/models/research".
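For anyone hitting the same mix-up, here is a minimal sketch of a check that reports which PYTHONPATH entry would supply a package such as object_detection. The helper name first_pythonpath_match is hypothetical and not part of Cap2Det. It only inspects PYTHONPATH, not site-packages or .pth files, so a miss is inconclusive rather than proof that the package is absent.

```shell
# Print the first PYTHONPATH entry that contains the given package directory.
# Hypothetical helper: only PYTHONPATH is inspected, nothing else on sys.path.
first_pythonpath_match() {
  pkg="$1"
  old_ifs="$IFS"
  IFS=':'                      # split PYTHONPATH on colons
  for dir in ${PYTHONPATH:-}; do
    [ -n "$dir" ] || continue  # skip empty entries such as "a::b"
    if [ -d "$dir/$pkg" ]; then
      IFS="$old_ifs"
      printf '%s\n' "$dir"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}

# Example: report where object_detection would come from, if anywhere.
first_pythonpath_match object_detection || echo "object_detection not found on PYTHONPATH"
```

If the printed directory is not under "cap2det/tensorflow_models", another copy is probably being picked up first.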

ziyanyang commented 3 years ago

Then I got this: train_cap2det.sh: line 21: PYTHONPATH: unbound variable

yekeren commented 3 years ago

You could modify the file to suit your environment, e.g., initialize PYTHONPATH to an empty string, or run export PYTHONPATH="$(pwd)" first.
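A minimal sketch of both options, assuming the script runs with "set -u" (which is what makes bash report "unbound variable") and that the downloaded models live under tensorflow_models/research in the current directory. The MODELS_DIR name and the research/slim layout are assumptions for illustration, not copied from train_cap2det.sh.

```shell
set -u  # assumption: unset variables are treated as errors, as the message suggests

# Option 1: give PYTHONPATH an empty default so later references do not fail.
export PYTHONPATH="${PYTHONPATH:-}"

# Option 2: prepend the downloaded models so they are searched first.
# MODELS_DIR is a hypothetical name; adjust it to your checkout.
MODELS_DIR="$(pwd)/tensorflow_models/research"

# ${PYTHONPATH:+:$PYTHONPATH} adds ":<old value>" only when the old value is
# non-empty, which avoids a stray trailing colon.
export PYTHONPATH="$MODELS_DIR:$MODELS_DIR/slim${PYTHONPATH:+:$PYTHONPATH}"

echo "$PYTHONPATH"
```

Putting the downloaded directories first means they are searched before any other copy that is already listed on PYTHONPATH.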

ziyanyang commented 3 years ago

I exported the paths before running train_cap2det.sh, and it seems to work. Thank you for your help!