tensorflow / models

Models and examples built with TensorFlow

Training on AI platform is not using GPUs #9832

Open · sniper0110 opened this issue 3 years ago


Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

I am training some models (SSD for object detection and Mask R-CNN for segmentation) on AI Platform. Training runs without errors, but it does not use the GPUs on AI Platform, even though I request GPUs when I submit the training job. It is not a quota problem: I checked the GPU quota available to me, and I am requesting no more than that.

3. Steps to reproduce

Prepare the dataset and the config file, then submit the training job with this command:

gcloud ai-platform jobs submit training segmentation_maskrcnn_`date +%m_%d_%Y_%H_%M_%S` \
    --runtime-version 2.1 \
    --python-version 3.7 \
    --job-dir=gs://${MODEL_DIR} \
    --package-path ./object_detection \
    --module-name object_detection.model_main_tf2 \
    --region us-central1 \
    --scale-tier CUSTOM \
    --master-machine-type n1-highcpu-16 \
    --master-accelerator count=2,type=nvidia-tesla-v100 \
    -- \
    --model_dir=gs://${MODEL_DIR} \
    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
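For scripted resubmission, the same job could be assembled from Python. This is a sketch, not part of the Object Detection API: the helper names `build_submit_command` and `format_command` are hypothetical, it assumes `gcloud` is on `PATH`, and it takes as arguments the bucket paths that the shell version reads from `${MODEL_DIR}` and `${PIPELINE_CONFIG_PATH}`. Passing an argument list to `subprocess` avoids shell quoting, and the job name uses the same timestamp format as `date +%m_%d_%Y_%H_%M_%S`.

```python
import datetime
import shlex


def build_submit_command(model_dir, pipeline_config_path, now=None):
    """Return the gcloud argv for the training job shown above (hypothetical helper)."""
    if now is None:
        now = datetime.datetime.now()
    # Same format as the shell expression `date +%m_%d_%Y_%H_%M_%S`.
    job_name = "segmentation_maskrcnn_" + now.strftime("%m_%d_%Y_%H_%M_%S")
    return [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_name,
        "--runtime-version", "2.1",
        "--python-version", "3.7",
        f"--job-dir=gs://{model_dir}",
        "--package-path", "./object_detection",
        "--module-name", "object_detection.model_main_tf2",
        "--region", "us-central1",
        "--scale-tier", "CUSTOM",
        "--master-machine-type", "n1-highcpu-16",
        "--master-accelerator", "count=2,type=nvidia-tesla-v100",
        # Arguments after "--" are forwarded to the training module itself.
        "--",
        f"--model_dir=gs://{model_dir}",
        f"--pipeline_config_path=gs://{pipeline_config_path}",
    ]


def format_command(argv):
    """Render an argv list as a copy-pasteable shell command for logging."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `subprocess.run(build_submit_command(os.environ["MODEL_DIR"], os.environ["PIPELINE_CONFIG_PATH"]), check=True)` would submit the job. With a timestamp of 2021-03-22 00:32:10 the generated job name is `segmentation_maskrcnn_03_22_2021_00_32_10`, which matches the job ID in the logs below.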

The job starts and finishes successfully, but it does not use the GPUs. The GPU utilization chart for the training job (screenshot not preserved) shows utilization at 0% for the entire run.
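One way to narrow this down is to log, at the start of the training module, what TensorFlow itself can see. The snippet below is a minimal sketch: the function name `describe_gpu_setup` and its placement (for example near the top of the entry point in `model_main_tf2.py`) are assumptions. It records the TensorFlow version, whether the installed wheel was built with CUDA, and which GPUs TensorFlow detects; `tf.config.list_physical_devices` requires TensorFlow 2.1 or later. If `gpus` comes back empty on a machine with two V100s attached, TensorFlow is running on CPU regardless of the requested accelerators.

```python
import logging


def describe_gpu_setup():
    """Report what TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tf_version": tf.__version__,
        # False here means the installed wheel cannot use CUDA GPUs at all.
        "built_with_cuda": bool(tf.test.is_built_with_cuda()),
        "gpus": [gpu.name for gpu in gpus],
    }


info = describe_gpu_setup()
# WARNING level so the line is visible with default logging settings.
logging.warning("GPU setup: %s", info)
```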

4. Expected behavior

I expected training to run on the GPUs.

5. Additional context

The full logs from my training job (exported as CSV, newest entries first):

textPayload,insertId,resource.type,resource.labels.task_name,resource.labels.project_id,resource.labels.job_id,timestamp,severity,"labels.""ml.googleapis.com/endpoint""","labels.""ml.googleapis.com/trial_id""","labels.""compute.googleapis.com/zone""","labels.""ml.googleapis.com/job_id/log_area""","labels.""compute.googleapis.com/resource_id""","labels.""compute.googleapis.com/resource_name""",logName,receiveTimestamp,jsonPayload.lineno,jsonPayload.created,jsonPayload.pathname,jsonPayload.message,jsonPayload.levelname
Job completed successfully.,nbbjj8c6m2,ml_job,service,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T04:42:33.650897386Z,INFO,,,,,,,projects/object-detection-using-tf2/logs/ml.googleapis.com%2Fsegmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T04:42:34.748884813Z,,,,,
,45tzlog1c7fase,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T04:39:34.455579042Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T04:39:37.089133260Z,1012,1616387974.455579,/runcloudml.py,Task completed successfully.,INFO
,45tzlog1c7fasd,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T04:39:34.454813003Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T04:39:37.089133260Z,1010,1616387974.454813,/runcloudml.py,Clean up finished.,INFO
,45tzlog1c7fasc,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T04:39:34.454482793Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T04:39:37.089133260Z,1007,1616387974.4544828,/runcloudml.py,Module completed; cleaning up.,INFO
,5txefng1qkl69a,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T04:38:41.415646553Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T04:38:43.966917168Z,328,1616387921.4156466,/runcloudml.py,I0322 04:38:41.411229 140231166109440 model_lib_v2.py:652] Step 1000 per-step time 26.693s loss=0.166,ERROR
,5txefng1qkl699,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T04:38:41.411229847Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T04:38:43.966917168Z,652,1616387921.4112298,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 1000 per-step time 26.693s loss=0.166,INFO
,bnsbrwg1gvxnk5,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T03:55:54.539618967Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T03:55:57.337411333Z,328,1616385354.539619,/runcloudml.py,I0322 03:55:54.531261 140231166109440 model_lib_v2.py:652] Step 900 per-step time 24.444s loss=0.493,ERROR
,bnsbrwg1gvxnk4,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T03:55:54.531261920Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T03:55:57.337411333Z,652,1616385354.531262,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 900 per-step time 24.444s loss=0.493,INFO
,166ibvkg1ezjenm,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T03:17:01.596560478Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T03:17:03.743686727Z,328,1616383021.5965605,/runcloudml.py,I0322 03:17:01.545832 140231166109440 model_lib_v2.py:652] Step 800 per-step time 21.946s loss=0.337,ERROR
,166ibvkg1ezjenl,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T03:17:01.545832633Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T03:17:03.743686727Z,652,1616383021.5458326,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 800 per-step time 21.946s loss=0.337,INFO
,1ohpz8vg1fjwa59,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T02:42:07.387903212Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T02:42:10.164307869Z,328,1616380927.3879032,/runcloudml.py,I0322 02:42:07.385901 140231166109440 model_lib_v2.py:652] Step 700 per-step time 19.679s loss=1.361,ERROR
,1ohpz8vg1fjwa58,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T02:42:07.385901927Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T02:42:10.164307869Z,652,1616380927.385902,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 700 per-step time 19.679s loss=1.361,INFO
,14931mg1mrxmov,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T02:11:19.965640543Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T02:11:21.633761126Z,328,1616379079.9656405,/runcloudml.py,I0322 02:11:19.963637 140231166109440 model_lib_v2.py:652] Step 600 per-step time 17.090s loss=0.533,ERROR
,14931mg1mrxmou,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T02:11:19.963637589Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T02:11:21.633761126Z,652,1616379079.9636376,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 600 per-step time 17.090s loss=0.533,INFO
,10d3pgdg1d8hzeb,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T01:44:56.973894834Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T01:44:58.155623266Z,328,1616377496.9738948,/runcloudml.py,I0322 01:44:56.971558 140231166109440 model_lib_v2.py:652] Step 500 per-step time 14.812s loss=1.388,ERROR
,10d3pgdg1d8hzea,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T01:44:56.971558809Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T01:44:58.155623266Z,652,1616377496.9715588,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 500 per-step time 14.812s loss=1.388,INFO
,1s3vy2bg1l6u9xb,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T01:20:14.449005842Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T01:20:16.720077768Z,328,1616376014.4490058,/runcloudml.py,I0322 01:20:14.446853 140231166109440 model_lib_v2.py:652] Step 400 per-step time 15.014s loss=1.696,ERROR
,1s3vy2bg1l6u9xa,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T01:20:14.446853875Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T01:20:16.720077768Z,652,1616376014.4468539,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 400 per-step time 15.014s loss=1.696,INFO
,bnznx5g1dvserf,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T00:55:20.598791599Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T00:55:23.243402785Z,328,1616374520.5987916,/runcloudml.py,I0322 00:55:20.596668 140231166109440 model_lib_v2.py:652] Step 300 per-step time 14.905s loss=2.389,ERROR
,bnznx5g1dvsere,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T00:55:20.596668719Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T00:55:23.243402785Z,652,1616374520.5966687,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 300 per-step time 14.905s loss=2.389,INFO
,1fc2knhg1die7pk,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T00:30:44.413701056Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T00:30:47.710498085Z,328,1616373044.413701,/runcloudml.py,I0322 00:30:44.411485 140231166109440 model_lib_v2.py:652] Step 200 per-step time 14.713s loss=3.534,ERROR
,1fc2knhg1die7pj,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T00:30:44.411485909Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T00:30:47.710498085Z,652,1616373044.411486,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 200 per-step time 14.713s loss=3.534,INFO
,rhdbo1g187n8k5,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T00:06:11.853744268Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T00:06:14.284628754Z,328,1616371571.8537443,/runcloudml.py,I0322 00:06:11.841253 140231166109440 model_lib_v2.py:652] Step 100 per-step time 14.762s loss=3.772,ERROR
,rhdbo1g187n8k4,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-22T00:06:11.841253994Z,INFO,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-22T00:06:14.284628754Z,652,1616371571.841254,/root/.local/lib/python3.7/site-packages/object_detection/model_lib_v2.py,Step 100 per-step time 14.762s loss=3.772,INFO
,1q5w52tg176dav7,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:41:27.010094165Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:41:29.865113886Z,230,1616370087.0100942,tensorflow/core/kernels/data/shuffle_dataset_op.cc,Shuffle buffer filled.,INFO
,1n3rxvrg1biedfl,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:41:21.342122315Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:41:23.796221561Z,177,1616370081.3421223,tensorflow/core/kernels/data/shuffle_dataset_op.cc,Filling up shuffle buffer (this may take a while): 1871 of 2048,INFO
,jf98w8g18i2jlj,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:41:11.342118500Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:41:13.732086262Z,177,1616370071.3421185,tensorflow/core/kernels/data/shuffle_dataset_op.cc,Filling up shuffle buffer (this may take a while): 1553 of 2048,INFO
,1o7mz20g1kir1ws,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:41:01.260507821Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:41:03.650909634Z,177,1616370061.2605078,tensorflow/core/kernels/data/shuffle_dataset_op.cc,Filling up shuffle buffer (this may take a while): 1251 of 2048,INFO
,bnsfmxg1d17foj,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:40:51.313074110Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:40:53.546610659Z,177,1616370051.313074,tensorflow/core/kernels/data/shuffle_dataset_op.cc,Filling up shuffle buffer (this may take a while): 937 of 2048,INFO
,1qzns8vg1k138uv,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:40:41.270817755Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:40:43.491405134Z,177,1616370041.2708178,tensorflow/core/kernels/data/shuffle_dataset_op.cc,Filling up shuffle buffer (this may take a while): 631 of 2048,INFO
,14ik3wug1da8lpv,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:40:31.268834829Z,INFO,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:40:33.402817680Z,177,1616370031.2688348,tensorflow/core/kernels/data/shuffle_dataset_op.cc,Filling up shuffle buffer (this may take a while): 307 of 2048,INFO
,1hjwrdrg1b6y659,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:50.730029344Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:52.319852782Z,328,1616369990.7300293,/runcloudml.py,Use fn_output_signature instead,ERROR
,1hjwrdrg1b6y658,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:50.729914187Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:52.319852782Z,328,1616369990.7299142,/runcloudml.py,Instructions for updating:,ERROR
,1hjwrdrg1b6y657,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:50.729580640Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:52.319852782Z,328,1616369990.7295806,/runcloudml.py,W0321 23:39:50.728997 140222654293760 deprecation.py:537] From /root/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py:605: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.,ERROR
,1hjwrdrg1b6y656,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:50.728997467Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:52.319852782Z,537,1616369990.7289975,/root/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py,"From /root/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py:605: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead",WARNING
,12kr8b5g18a5qeu,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.393443821Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3934438,/runcloudml.py,"W0321 23:39:16.393274 140231166109440 util.py:169] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.",ERROR
,12kr8b5g18a5qet,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.393274306Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,169,1616369956.3932743,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,"A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.",WARNING
,12kr8b5g18a5qes,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.393269299Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3932693,/runcloudml.py,W0321 23:39:16.393120 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.bias,ERROR
,12kr8b5g18a5qer,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.393127201Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3931272,/runcloudml.py,W0321 23:39:16.392988 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.kernel,ERROR
,12kr8b5g18a5qeq,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.393120288Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3931203,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.bias,WARNING
,12kr8b5g18a5qep,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392990349Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3929904,/runcloudml.py,W0321 23:39:16.392864 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.bias,ERROR
,12kr8b5g18a5qeo,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392988443Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3929884,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.kernel,WARNING
,12kr8b5g18a5qen,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392874956Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.392875,/runcloudml.py,W0321 23:39:16.392718 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.kernel,ERROR
,12kr8b5g18a5qem,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392864941Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.392865,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.bias,WARNING
,12kr8b5g18a5qel,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392726897Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.392727,/runcloudml.py,W0321 23:39:16.392573 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.9.bias,ERROR
,12kr8b5g18a5qek,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392718791Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3927188,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.kernel,WARNING
,12kr8b5g18a5qej,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392579077Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.392579,/runcloudml.py,W0321 23:39:16.392436 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.9.kernel,ERROR
,12kr8b5g18a5qei,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392573594Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3925736,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.9.bias,WARNING
,12kr8b5g18a5qeh,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392443418Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3924434,/runcloudml.py,W0321 23:39:16.392292 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.6.bias,ERROR
,12kr8b5g18a5qeg,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392436981Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.392437,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.9.kernel,WARNING
,12kr8b5g18a5qef,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392303466Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3923035,/runcloudml.py,W0321 23:39:16.392165 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.6.kernel,ERROR
,12kr8b5g18a5qee,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392292021Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.392292,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.6.bias,WARNING
,12kr8b5g18a5qed,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392172097Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.392172,/runcloudml.py,W0321 23:39:16.392038 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.3.bias,ERROR
,12kr8b5g18a5qec,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392165422Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3921654,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.6.kernel,WARNING
,12kr8b5g18a5qeb,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392043113Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.392043,/runcloudml.py,W0321 23:39:16.391913 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.3.kernel,ERROR
,12kr8b5g18a5qea,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.392038821Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3920388,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.3.bias,WARNING
,12kr8b5g18a5qe9,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391925096Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.391925,/runcloudml.py,W0321 23:39:16.391762 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.0.bias,ERROR
,12kr8b5g18a5qe8,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391913652Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3919137,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.3.kernel,WARNING
,12kr8b5g18a5qe7,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391782999Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.391783,/runcloudml.py,W0321 23:39:16.391636 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.0.kernel,ERROR
,12kr8b5g18a5qe6,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391762255Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3917623,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.0.bias,WARNING
,12kr8b5g18a5qe5,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391647576Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3916476,/runcloudml.py,W0321 23:39:16.391492 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0,ERROR
,12kr8b5g18a5qe4,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391636370Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3916364,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.0.kernel,WARNING
,12kr8b5g18a5qe3,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391503571Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3915036,/runcloudml.py,W0321 23:39:16.391344 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0,ERROR
,12kr8b5g18a5qe2,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391492842Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3914928,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0,WARNING
,12kr8b5g18a5qe1,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391352891Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.391353,/runcloudml.py,W0321 23:39:16.391208 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.9,ERROR
,12kr8b5g18a5qe0,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391344547Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3913445,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0,WARNING
,12kr8b5g18a5qdz,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391216277Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3912163,/runcloudml.py,W0321 23:39:16.391031 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.8,ERROR
,12kr8b5g18a5qdy,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391208409Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3912084,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.9,WARNING
,12kr8b5g18a5qdx,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391038656Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3910387,/runcloudml.py,W0321 23:39:16.390881 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.7,ERROR
,12kr8b5g18a5qdw,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.391031742Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3910317,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.8,WARNING
,12kr8b5g18a5qdv,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390887498Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3908875,/runcloudml.py,W0321 23:39:16.390685 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.6,ERROR
,12kr8b5g18a5qdu,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390881298Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3908813,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.7,WARNING
,12kr8b5g18a5qdt,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390691517Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3906915,/runcloudml.py,W0321 23:39:16.390511 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.5,ERROR
,12kr8b5g18a5qds,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390685558Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3906856,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.6,WARNING
,12kr8b5g18a5qdr,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390530824Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3905308,/runcloudml.py,W0321 23:39:16.390305 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.4,ERROR
,12kr8b5g18a5qdq,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390511989Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.390512,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.5,WARNING
,12kr8b5g18a5qdo,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390347479Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3903475,/runcloudml.py,W0321 23:39:16.390026 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.3,ERROR
,12kr8b5g18a5qdp,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390305280Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3903053,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.4,WARNING
,12kr8b5g18a5qdn,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390196083Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.390196,/runcloudml.py,W0321 23:39:16.389908 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.2,ERROR
,12kr8b5g18a5qdl,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390076159Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3900762,/runcloudml.py,W0321 23:39:16.389710 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.1,ERROR
,12kr8b5g18a5qdm,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.390026092Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.390026,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.3,WARNING
,12kr8b5g18a5qdk,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389951228Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3899512,/runcloudml.py,W0321 23:39:16.389565 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.0,ERROR
,12kr8b5g18a5qdj,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389908313Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3899083,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.2,WARNING
,12kr8b5g18a5qdh,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389710425Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3897104,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.1,WARNING
,12kr8b5g18a5qdi,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389659642Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3896596,/runcloudml.py,W0321 23:39:16.389420 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._class_prediction_head._class_predictor_layers.1.bias,ERROR
,12kr8b5g18a5qdg,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389565467Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3895655,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers.0,WARNING
,12kr8b5g18a5qdf,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389427661Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3894277,/runcloudml.py,W0321 23:39:16.389240 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._class_prediction_head._class_predictor_layers.1.kernel,ERROR
,12kr8b5g18a5qde,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389420985Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.389421,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._class_prediction_head._class_predictor_layers.1.bias,WARNING
,12kr8b5g18a5qdd,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389244555Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3892446,/runcloudml.py,W0321 23:39:16.389116 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._box_prediction_head._box_encoder_layers.1.bias,ERROR
,12kr8b5g18a5qdc,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389240503Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3892405,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._class_prediction_head._class_predictor_layers.1.kernel,WARNING
,12kr8b5g18a5qdb,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389121054Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.389121,/runcloudml.py,W0321 23:39:16.388967 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._box_prediction_head._box_encoder_layers.1.kernel,ERROR
,12kr8b5g18a5qda,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.389116764Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3891168,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._box_prediction_head._box_encoder_layers.1.bias,WARNING
,12kr8b5g18a5qd9,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388974428Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3889744,/runcloudml.py,W0321 23:39:16.388767 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers,ERROR
,12kr8b5g18a5qd8,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388967989Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.388968,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._box_prediction_head._box_encoder_layers.1.kernel,WARNING
,12kr8b5g18a5qd7,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388773918Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.388774,/runcloudml.py,W0321 23:39:16.388589 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers,ERROR
,12kr8b5g18a5qd6,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388767956Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.388768,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers,WARNING
,12kr8b5g18a5qd5,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388593673Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.3885937,/runcloudml.py,W0321 23:39:16.388411 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers,ERROR
,12kr8b5g18a5qd4,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388589142Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.3885891,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._first_stage_box_predictor._prediction_heads.box_encodings.0._box_encoder_layers,WARNING
,12kr8b5g18a5qd3,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388432025Z,ERROR,,,us-central1-c,root,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,328,1616369956.388432,/runcloudml.py,W0321 23:39:16.388255 140231166109440 util.py:161] Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._class_prediction_head._class_predictor_layers.2,ERROR
,12kr8b5g18a5qd2,ml_job,master-replica-0,object-detection-using-tf2,segmentation_maskrcnn_03_22_2021_00_32_10,2021-03-21T23:39:16.388411998Z,WARNING,,,us-central1-c,tensorflow,8476132014993074576,cmle-training-17993994700587876019,projects/object-detection-using-tf2/logs/master-replica-0,2021-03-21T23:39:19.259481722Z,161,1616369956.388412,/root/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py,Unresolved object in checkpoint: (root).model._mask_rcnn_box_predictor._third_stage_heads.mask_predictions._mask_predictor_layers,WARNING
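The log export above shows each checkpoint message twice. One copy is a `WARNING` from the `tensorflow` logger. The other is an `ERROR` row whose `pathname` is `/runcloudml.py`, and it appears to be the same line re-logged; its `W0321 ...` prefix is the absl/glog marker for a warning. Below is a minimal sketch for collapsing such an export into distinct messages. It assumes the column names from the CSV header row (`textPayload`, `jsonPayload.message`). The `unique_messages` helper and the file name passed on the command line are hypothetical.

```python
import csv
import io
import re
from collections import Counter

# absl/glog prefix, e.g. "W0321 23:39:16.391762 140231166109440 util.py:161] "
GLOG_PREFIX = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+ +\d+ \S+:\d+\] ")


def unique_messages(csv_text):
    """Count distinct messages, folding glog-prefixed re-emitted copies into the originals."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Structured entries carry the text in jsonPayload.message; plain ones in textPayload.
        message = row.get("jsonPayload.message") or row.get("textPayload") or ""
        counts[GLOG_PREFIX.sub("", message)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python dedup_logs.py downloaded-logs.csv
    if len(sys.argv) > 1:
        with open(sys.argv[1], newline="") as f:
            for message, n in unique_messages(f.read()).most_common(20):
                print(n, message)
```

Each checkpoint warning should then appear once, with a count of 2. This makes it easier to see that the `ERROR` severity on these rows does not by itself indicate a separate failure.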

6. System information

Srikeshram commented 3 years ago

Check whether TensorFlow detects the GPU, using the Python code in the link. If it does not, the cause may be a conflict between the CUDA version and the TensorFlow version on Google Cloud. Waiting for your reply.
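Below is a minimal sketch of this kind of check. It assumes TensorFlow 2.x. The `gpu_report` helper name is hypothetical, and this is not the code from the link. The script reports the installed TensorFlow version, whether that build was compiled with CUDA, and which GPUs TensorFlow can see.

```python
import logging


def gpu_report():
    """Summarise TensorFlow's view of the GPUs; safe to call even if TensorFlow is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow_version": None, "built_with_cuda": None, "gpus": []}
    # The experimental alias exists across TF 2.x, including the 2.1 runtime used here.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {
        "tensorflow_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    report = gpu_report()
    logging.info("TensorFlow GPU report: %s", report)
    if not report["gpus"]:
        logging.warning("No GPU visible to TensorFlow; ops will be placed on the CPU.")
```

An empty `gpus` list on a machine with V100s attached would be consistent with the version-conflict explanation, but it would not prove it.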

sniper0110 commented 3 years ago

How do I check this when I don't have access to the code used on AI Platform for the Object Detection API? I think it runs in a Docker image. If it were my own code or Docker image, I would know how to check, but since it isn't, I am finding it difficult to verify anything.
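The job is submitted with `--package-path ./object_detection` and `--module-name object_detection.model_main_tf2`. That means the `object_detection` package itself is built from the local directory and uploaded with the job, even though the runtime image is managed by Google. The trainer module can therefore be edited to log diagnostics. Below is a minimal sketch of one such diagnostic, which logs the driver's view of the GPUs at startup. It assumes `nvidia-smi` is on the `PATH` inside the training container, which is not confirmed in this thread, and the `nvidia_smi_report` helper is hypothetical.

```python
import logging
import shutil
import subprocess


def nvidia_smi_report():
    """Return nvidia-smi output, or a short note explaining why it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "nvidia-smi not found on PATH"
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        return "nvidia-smi timed out"
    # A non-zero exit usually means the driver could not talk to the GPUs.
    return result.stdout if result.returncode == 0 else result.stderr


# Hypothetical placement: call near the top of main() in model_main_tf2.py.
logging.basicConfig(level=logging.INFO)
logging.info("nvidia-smi:\n%s", nvidia_smi_report())
```

If `nvidia-smi` lists the V100s but TensorFlow reports none, the problem is more likely on the TensorFlow/CUDA side than in the machine allocation.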

meet-seth commented 3 years ago

I am facing a similar issue. The command I ran is the same as the one mentioned above:

gcloud ai-platform jobs submit training segmentation_maskrcnn_`date +%m_%d_%Y_%H_%M_%S` \
    --runtime-version 2.1 \
    --python-version 3.7 \
    --job-dir=gs://${MODEL_DIR} \
    --package-path ./object_detection \
    --module-name object_detection.model_main_tf2 \
    --region us-central1 \
    --scale-tier CUSTOM \
    --master-machine-type n1-highcpu-16 \
    --master-accelerator count=2,type=nvidia-tesla-v100 \
    -- \
    --model_dir=gs://${MODEL_DIR} \
    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}

Here are two screenshots: Screenshot from 2021-06-14 18-00-02, Screenshot from 2021-06-14 18-00-16

GPU utilization is 0%, and training takes 5 hours instead of the usual 2 hours.

PS: Does this mean I am being charged for the GPUs even though their utilization is 0%?
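Both commands in this thread pin `--runtime-version 2.1`. One way to test the CUDA/TensorFlow conflict suggested earlier is to check, from inside the job, whether the TensorFlow that actually gets imported matches that runtime version. A different release could end up installed if the packaged `object_detection` dependencies bring in their own TensorFlow, though that is an assumption and is not confirmed here. Below is a minimal sketch; the helper names are hypothetical.

```python
def major_minor(version):
    """Turn a version string such as '2.4.1' or '2.1' into a (major, minor) tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def matches_runtime(tf_version, runtime_version):
    """True when TensorFlow's major.minor equals the requested AI Platform runtime version."""
    return major_minor(tf_version) == major_minor(runtime_version)


if __name__ == "__main__":
    import logging

    logging.basicConfig(level=logging.INFO)
    RUNTIME_VERSION = "2.1"  # value passed to --runtime-version in the commands above
    try:
        import tensorflow as tf
    except ImportError:
        logging.warning("TensorFlow is not importable in this environment")
    else:
        if matches_runtime(tf.__version__, RUNTIME_VERSION):
            logging.info("TensorFlow %s matches runtime %s", tf.__version__, RUNTIME_VERSION)
        else:
            logging.warning(
                "TensorFlow %s differs from runtime %s; the image's CUDA libraries may not match",
                tf.__version__, RUNTIME_VERSION)
```

This sketch compares only major.minor. That simplification assumes patch releases within a runtime version share the same CUDA requirements.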

meet-seth commented 3 years ago

Any updates?

zig-zaggity commented 3 years ago

I am experiencing the same issue. Has anyone found a solution?