UnicodeDecodeError in popped up while running Tensorflow to create a model

hinano21 commented 1 year ago

Reference video: https://www.youtube.com/watch?v=8ktcGQ-XreQ

Error message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 94: invalid start byte

Source code: `python model_main_tf2.py --pipeline_config_path==ssd_efficientdet_d0_512x512_coco17_tpu-8.config --model_dir==training --alsologtostderr`
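For context on the error message, here is a minimal sketch of why byte 0x8e trips the UTF-8 decoder. It assumes (nothing in this thread confirms it) that the offending bytes came from text encoded in cp932 (Shift_JIS), a common legacy encoding on Japanese Windows; the two-byte sample `raw` is purely illustrative and not taken from the failing data.

```python
# Illustrative two-byte sample; NOT taken from the actual failing file.
raw = b"\x8e\x77"

reason = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x8e has the bit pattern 10xxxxxx, which UTF-8 reserves for
    # continuation bytes, so it can never begin a character.
    reason = exc.reason
    print(f"utf-8 failed at offset {exc.start}: {reason}")

# Assumption: the bytes were produced in cp932, where 0x81-0x9f are
# lead bytes of two-byte characters, so the same data decodes cleanly.
text = raw.decode("cp932")
print(f"cp932 decoded {len(raw)} bytes into {len(text)} character(s)")
```

If this assumption holds, the thing to look for is which file, path, or message contains non-UTF-8 bytes near position 94, rather than a problem in the model itself.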

What I tried

(1) I found an article that suggested swapping the positions of input_path and label_map_path in ssd_efficientdet_d0_512x512_coco17_tpu-8.config, so I tried that.

before:

    train_input_reader: {
      label_map_path: "label_map.pbtxt"
      tf_record_input_reader {
        input_path: "train.record"
      }
    }

    eval_config: {
      metrics_set: "coco_detection_metrics"
      use_moving_averages: false
      batch_size: 1;
    }

    eval_input_reader: {
      label_map_path: "label_map.pbtxt"
      shuffle: false
      num_epochs: 1
      tf_record_input_reader {
        input_path: "test.record"
      }
    }

after:

    train_input_reader: {
      input_path: "train.record"
      tf_record_input_reader {
        label_map_path: "label_map.pbtxt"
      }
    }

    eval_config: {
      metrics_set: "coco_detection_metrics"
      use_moving_averages: false
      batch_size: 1;
    }

    eval_input_reader: {
      input_path: "test.record"
      shuffle: false
      num_epochs: 1
      tf_record_input_reader {
        label_map_path: "label_map.pbtxt"
      }
    }
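In the Object Detection API's `input_reader.proto`, `label_map_path` is a field of the outer input reader block, while `input_path` belongs inside `tf_record_input_reader`. That is the layout of the "before" version. The rough sketch below checks the nesting with plain string handling; it is not a real protobuf parser, and the names `enclosing_blocks` and `EXPECTED_PARENT` are made up for illustration.

```python
import re

# Where each field is expected to live (hypothetical helper table).
EXPECTED_PARENT = {
    "label_map_path": "train_input_reader",
    "input_path": "tf_record_input_reader",
}

def enclosing_blocks(config_text):
    """Map each scalar field name to the block that directly contains it."""
    # Blank out quoted values so their contents are never read as fields.
    text = re.sub(r'"[^"]*"', '""', config_text)
    # `name {` or `name: {` opens a block, `}` closes one, `name:` is a field.
    token = re.compile(r"(\w+)\s*:?\s*\{|(\})|(\w+)\s*:")
    stack, found = [], {}
    for match in token.finditer(text):
        block, close, field = match.groups()
        if block:
            stack.append(block)
        elif close:
            stack.pop()
        else:
            found[field] = stack[-1] if stack else None
    return found

before = ('train_input_reader: { label_map_path: "label_map.pbtxt" '
          'tf_record_input_reader { input_path: "train.record" } }')
after = ('train_input_reader: { input_path: "train.record" '
         'tf_record_input_reader { label_map_path: "label_map.pbtxt" } }')

before_ok = enclosing_blocks(before) == EXPECTED_PARENT
after_ok = enclosing_blocks(after) == EXPECTED_PARENT
print("before layout matches:", before_ok)
print("after layout matches:", after_ok)
```

Under that reading, the swapped "after" layout puts both fields in blocks that do not define them, so reverting to the "before" layout is the safer starting point.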

(2) I thought the error occurred because the character encoding of the CSV file differed from the encoding (utf-8) the program used to read it, so I looked for the corresponding code.

I found the following in movielens.py, so I concluded that the character encoding was OK.

laxmareddyp commented 1 year ago

Hi @hinano21,

We could not follow the reference video you attached here. However, the following gist covers everything needed to train any model you want; please make sure that the TFRecords are created properly. Please also check the following tutorial on how to create TFRecords.

hinano21 commented 1 year ago

Just to make sure, is the test.record file supposed to look like this? I am not sure, but maybe it is because of my desktop language settings: my default language and PyCharm language are both Japanese.
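One quick way to see what the language settings actually change is to print the encodings Python picks up from the operating system. This is only a diagnostic sketch using the standard library; the values depend on the machine, so no particular output is assumed.

```python
import locale
import sys

# Encoding that text-mode open() normally falls back to when no
# `encoding=` argument is given; on Japanese Windows this is often cp932.
preferred = locale.getpreferredencoding(False)

# Encoding used to convert file names between str and bytes.
filesystem = sys.getfilesystemencoding()

print("preferred encoding: ", preferred)
print("filesystem encoding:", filesystem)
```

If the preferred encoding is not UTF-8, any text file read without an explicit `encoding="utf-8"` may be decoded differently than on the machine used in the tutorial.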

laxmareddyp commented 1 year ago

Hi @hinano21,

Yes, a TFRecord file stores a sequence of binary records, so it is not readable as text. For more details, could you please check how to read TFRecord files.
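To illustrate the binary layout, here is a minimal pure-Python sketch that walks the record framing: each record is an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. The sketch skips CRC verification, and `iter_tfrecord_payloads` and `frame` are hypothetical helper names. In practice the file would normally be read with `tf.data.TFRecordDataset`.

```python
import io
import struct

def iter_tfrecord_payloads(stream):
    """Yield the raw payload of each record in a TFRecord byte stream.

    Sketch only: the two CRC fields are skipped, not verified.
    """
    while True:
        header = stream.read(12)      # 8-byte length + 4-byte CRC of length
        if not header:
            return                    # clean end of file
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)       # 4-byte CRC of the payload
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload

def frame(payload):
    """Wrap a payload in TFRecord framing with zeroed (fake) CRCs."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4

# In-memory stand-in for a .record file; real payloads are usually
# serialized tf.train.Example messages, not short strings like these.
fake_file = io.BytesIO(frame(b"first") + frame(b"\x8e\x77 second"))
records = list(iter_tfrecord_payloads(fake_file))
print(len(records), "records:", records)
```

Because the length prefixes and payloads are arbitrary bytes, a text editor will show them as unreadable characters regardless of the desktop language.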

Thanks

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

hinano21 commented 1 year ago

I'm sorry. I have another project and have not gotten around to this. When the TFRecord is created in the sample, it looks like this picture. However, this is what it looks like in my case. Does this mean that something is not working?

laxmareddyp commented 1 year ago

Hi @hinano21,

From the above screenshot, we are unable to understand the exact problem you are facing. Could you please provide a code snippet or Colab that reproduces the issue, so that we can resolve it faster?

Thanks!

hinano21 commented 1 year ago

Here is the file I modified

generate_tfrecord.py

    def class_text_to_int(row_label):
        # Map each class name to the integer id used in label_map.pbtxt.
        if row_label == 'nine':
            return 1
        elif row_label == 'ten':
            return 2
        elif row_label == 'jack':
            return 3
        elif row_label == 'queen':
            return 4
        elif row_label == 'king':
            return 5
        elif row_label == 'ace':
            return 6
        else:
            # Unknown label: no matching class id.
            return None

label_map.pbtxt

    item { id: 1 name: 'nine' }
    item { id: 2 name: 'ten' }
    item { id: 3 name: 'jack' }
    item { id: 4 name: 'queen' }
    item { id: 5 name: 'king' }
    item { id: 6 name: 'ace' }
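The ids returned by `class_text_to_int` have to agree with the ids in label_map.pbtxt. The sketch below checks that with a small regex reader; it is not a protobuf parser, it assumes `id` comes before `name` in each item as in the file above, and `parse_label_map` is a hypothetical helper. The dictionary-based `class_text_to_int` is only a compact equivalent of the posted function.

```python
import re

def class_text_to_int(row_label):
    """Compact equivalent of the posted if/elif chain."""
    mapping = {"nine": 1, "ten": 2, "jack": 3,
               "queen": 4, "king": 5, "ace": 6}
    return mapping.get(row_label)   # None for unknown labels

LABEL_MAP_TEXT = """
item { id: 1 name: 'nine' }
item { id: 2 name: 'ten' }
item { id: 3 name: 'jack' }
item { id: 4 name: 'queen' }
item { id: 5 name: 'king' }
item { id: 6 name: 'ace' }
"""

def parse_label_map(text):
    """Return {name: id} from label-map text (regex sketch only)."""
    pattern = re.compile(
        r"item\s*\{\s*id:\s*(\d+)\s*name:\s*['\"]([^'\"]+)['\"]\s*\}")
    return {name: int(item_id) for item_id, name in pattern.findall(text)}

label_map = parse_label_map(LABEL_MAP_TEXT)

# Collect every class whose id differs between the two sources.
mismatches = {
    name: (item_id, class_text_to_int(name))
    for name, item_id in label_map.items()
    if class_text_to_int(name) != item_id
}
print("classes in label map:", len(label_map))
print("mismatches:", mismatches)
```

An empty `mismatches` dictionary means the two files agree; any CSV row whose label is missing from the mapping would still get `None` as its class id.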

ssd_efficientdet_d0_512x512_coco17_tpu-8.config

    model {
      ssd {
        inplace_batchnorm_update: true
        freeze_batchnorm: false
        num_classes: 6
        add_background_class: false
        box_coder {
          faster_rcnn_box_coder {
            y_scale: 10.0
            x_scale: 10.0
            height_scale: 5.0
            width_scale: 5.0
          }
        }
        matcher {
          argmax_matcher {
            matched_threshold: 0.5
            unmatched_threshold: 0.5
            ignore_thresholds: false
            negatives_lower_than_unmatched: true
            force_match_for_each_row: true
            use_matmul_gather: true
          }
        }
        similarity_calculator {
          iou_similarity {
          }
        }
        encode_background_as_zeros: true
        anchor_generator {
          multiscale_anchor_generator {
            min_level: 3
            max_level: 7
            anchor_scale: 4.0
            aspect_ratios: [1.0, 2.0, 0.5]
            scales_per_octave: 3
          }
        }
        image_resizer {
          keep_aspect_ratio_resizer {
            min_dimension: 512
            max_dimension: 512
            pad_to_max_dimension: true
          }
        }
        box_predictor {
          weight_shared_convolutional_box_predictor {
            depth: 64
            class_prediction_bias_init: -4.6
            conv_hyperparams {
              force_use_bias: true
              activation: SWISH
              regularizer {
                l2_regularizer {
                  weight: 0.00004
                }
              }
              initializer {
                random_normal_initializer {
                  stddev: 0.01
                  mean: 0.0
                }
              }
              batch_norm {
                scale: true
                decay: 0.99
                epsilon: 0.001
              }
            }
            num_layers_before_predictor: 3
            kernel_size: 3
            use_depthwise: true
          }
        }
        feature_extractor {
          type: 'ssd_efficientnet-b0_bifpn_keras'
          bifpn {
            min_level: 3
            max_level: 7
            num_iterations: 3
            num_filters: 64
          }
          conv_hyperparams {
            force_use_bias: true
            activation: SWISH
            regularizer {
              l2_regularizer {
                weight: 0.00004
              }
            }
            initializer {
              truncated_normal_initializer {
                stddev: 0.03
                mean: 0.0
              }
            }
            batch_norm {
              scale: true,
              decay: 0.99,
              epsilon: 0.001,
            }
          }
        }
        loss {
          classification_loss {
            weighted_sigmoid_focal {
              alpha: 0.25
              gamma: 1.5
            }
          }
          localization_loss {
            weighted_smooth_l1 {
            }
          }
          classification_weight: 1.0
          localization_weight: 1.0
        }
        normalize_loss_by_num_matches: true
        normalize_loc_loss_by_codesize: true
        post_processing {
          batch_non_max_suppression {
            score_threshold: 1e-8
            iou_threshold: 0.5
            max_detections_per_class: 100
            max_total_detections: 100
          }
          score_converter: SIGMOID
        }
      }
    }

    train_config: {
      fine_tune_checkpoint: "models/research/object_detection/efficientdet_d0_coco17_tpu-32/checkpoint/ckpt-0"
      fine_tune_checkpoint_version: V2
      fine_tune_checkpoint_type: "detection"
      batch_size: 2
      sync_replicas: true
      startup_delay_steps: 0
      replicas_to_aggregate: 8
      use_bfloat16: true
      num_steps: 300000
      data_augmentation_options {
        random_horizontal_flip {
        }
      }
      data_augmentation_options {
        random_scale_crop_and_pad_to_square {
          output_size: 512
          scale_min: 0.1
          scale_max: 2.0
        }
      }
      optimizer {
        momentum_optimizer: {
          learning_rate: {
            cosine_decay_learning_rate {
              learning_rate_base: 8e-2
              total_steps: 300000
              warmup_learning_rate: .001
              warmup_steps: 2500
            }
          }
          momentum_optimizer_value: 0.9
        }
        use_moving_average: false
      }
      max_number_of_boxes: 100
      unpad_groundtruth_tensors: false
    }

    train_input_reader: {
      input_path: "train.record"
      shuffle: false
      num_epochs: 1
      tf_record_input_reader {
        label_map_path: "label_map.pbtxt"
      }
    }

    eval_config: {
      metrics_set: "coco_detection_metrics"
      use_moving_averages: false
      batch_size: 1;
    }

    eval_input_reader: {
      input_path: "test.record"
      shuffle: false
      num_epochs: 1
      tf_record_input_reader {
        label_map_path: "label_map.pbtxt"
      }
    }

hinano21 commented 1 year ago

I created a Git repository with my own code. Could you please take a look at it: https://github.com/hinano21/models

google-ml-butler[bot] commented 1 year ago

Closing as stale. Please reopen if you'd like to work on this further.

hinano21 commented 1 year ago

I gave up on this problem and instead used the tutorial at the following URL to get object detection working: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/ Thanks!

tensorflow / models

UnicodeDecodeError in popped up while running Tensorflow to create a model #10935