Closed: shishir-reddy closed this issue 2 years ago
Failed to get matching files on /opt/models/wgs/model.ckpt: UNIMPLEMENTED: File system scheme '[local]' not implemented (file: '/opt/models/wgs/model.ckpt')
Could you try saving the model in the cloud (the path should start with gs://)? It looks like the model is not accessible from the TPU host.
Sure, I am trying to use the default WGS model. After hosting the model on the cloud, how do I point deepvariant to it through the Docker solution?
This is what I see in the local docker container's models directory when running the image:
root@8368b35e9c34:/# ls /opt/models/wgs/
model.ckpt.data-00000-of-00001 model.ckpt.index model.ckpt.input_shape model.ckpt.meta
I am using the google/deepvariant:1.3.0 docker image. The same error occurs for me with the GPU version. Is there a different model expected for the TPU implementation?
When you do ls /opt/models/wgs/
you see the local content of the mounted directory, which is probably not accessible from the TPU host. Although we don't officially support running on TPU, there is a case study for an older version that shows how to run training on TPU here
In particular, there is a link with instructions on how to make a storage bucket accessible from Docker.
Thanks, this makes perfect sense! I did not realize that hosting the model in Google Storage was necessary for the TPU Node.
I am still having an issue pointing deepvariant to the model hosted in the cloud.
I have tried using a model in the deepvariant bucket with the following command and model: gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt.data-00000-of-00001
docker run \
-v `pwd`:`pwd` -w `pwd` \
google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--call_variants_extra_args use_tpu=true,tpu_name="variantcaller-node1",tpu_zone="europe-west4-a" \
--customized_model "gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt.data-00000-of-00001" \
--model_type=WGS \
--ref="input/data/${REF}" \
--reads="input/data/${BAM}" \
--output_vcf="output/${OUTPUT_VCF}" \
--output_gvcf="output/${OUTPUT_GVCF}" \
--regions chr20 \
--num_shards=$(nproc) \
--intermediate_results_dir /output/intermediate_results_dir
But I get the following error:
I0527 20:42:08.331003 139757477517120 run_deepvariant.py:341] Creating a directory for intermediate results in /output/intermediate_results_dir
Traceback (most recent call last):
File "/opt/deepvariant/bin/run_deepvariant.py", line 493, in <module>
app.run(main)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/opt/deepvariant/bin/run_deepvariant.py", line 467, in main
commands_logfiles = create_all_commands_and_logfiles(intermediate_results_dir)
File "/opt/deepvariant/bin/run_deepvariant.py", line 382, in create_all_commands_and_logfiles
check_flags()
File "/opt/deepvariant/bin/run_deepvariant.py", line 357, in check_flags
raise RuntimeError('The model files {}* do not exist. Potentially '
RuntimeError: The model files gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt.data-00000-of-00001* do not exist. Potentially relevant issue: https://github.com/google/deepvariant/blob/r1.3/docs/FAQ.md#why-cant-it-find-one-of-the-input-files-eg-could-not-open
I also get the same error when hosting the model (renamed model.ckpt) in my personal GS bucket -- I have made the storage bucket read accessible to all users so the TPU should have access:
docker run \
-v `pwd`:`pwd` -w `pwd` \
google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--call_variants_extra_args use_tpu=true,tpu_name="variantcaller-node1",tpu_zone="europe-west4-a" \
--customized_model "gs://tpu-bwb/analysis-files/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt" \
--model_type=WGS \
--ref="input/data/${REF}" \
--reads="input/data/${BAM}" \
--output_vcf="output/${OUTPUT_VCF}" \
--output_gvcf="output/${OUTPUT_GVCF}" \
--regions chr20 \
--num_shards=$(nproc) \
--intermediate_results_dir /output/intermediate_results_dir
I0527 21:26:03.381308 140127359940416 run_deepvariant.py:341] Creating a directory for intermediate results in /output/intermediate_results_dir
Traceback (most recent call last):
File "/opt/deepvariant/bin/run_deepvariant.py", line 493, in <module>
app.run(main)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/opt/deepvariant/bin/run_deepvariant.py", line 467, in main
commands_logfiles = create_all_commands_and_logfiles(intermediate_results_dir)
File "/opt/deepvariant/bin/run_deepvariant.py", line 382, in create_all_commands_and_logfiles
check_flags()
File "/opt/deepvariant/bin/run_deepvariant.py", line 357, in check_flags
raise RuntimeError('The model files {}* do not exist. Potentially '
RuntimeError: The model files gs://tpu-bwb/analysis-files/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt* do not exist. Potentially relevant issue: https://github.com/google/deepvariant/blob/r1.3/docs/FAQ.md#why-cant-it-find-one-of-the-input-files-eg-could-not-open
However, if I shorten the model name in the deepvariant bucket (model.ckpt.data-00000-of-00001 -> model.ckpt), the file is found and processing continues until the previous error is met because the checkpoint file does not actually exist under the name model.ckpt in the deepvariant bucket.
docker run \
-v `pwd`:`pwd` -w `pwd` \
google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--call_variants_extra_args use_tpu=true,tpu_name="variantcaller-node1",tpu_zone="europe-west4-a" \
--customized_model "gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt" \
--model_type=WGS \
--ref="input/data/${REF}" \
--reads="input/data/${BAM}" \
--output_vcf="output/${OUTPUT_VCF}" \
--output_gvcf="output/${OUTPUT_GVCF}" \
--regions chr20 \
--num_shards=$(nproc) \
--intermediate_results_dir /output/intermediate_results_dir
INFO:tensorflow:Done calling model_fn.
I0527 21:33:10.817516 139926144051008 estimator.py:1164] Done calling model_fn.
INFO:tensorflow:TPU job name tpu_worker
I0527 21:33:11.115715 139926144051008 tpu_estimator.py:514] TPU job name tpu_worker
INFO:tensorflow:Graph was finalized.
I0527 21:33:11.664746 139926144051008 monitored_session.py:247] Graph was finalized.
INFO:tensorflow:Restoring parameters from gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt
I0527 21:33:11.801618 139926144051008 saver.py:1298] Restoring parameters from gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt
INFO:tensorflow:prediction_loop marked as finished
I0527 21:33:13.662127 139926144051008 error_handling.py:115] prediction_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0527 21:33:13.662372 139926144051008 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
return fn(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.NotFoundError: From /job:tpu_worker/replica:0/task:0:
Unsuccessful TensorSliceReader constructor: Failed to find any matching files for gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt
[[{{node save_1/RestoreV2}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 1303, in restore
sess.run(self.saver_def.restore_op_name,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 967, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1190, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1368, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: From /job:tpu_worker/replica:0/task:0:
Unsuccessful TensorSliceReader constructor: Failed to find any matching files for gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt
[[node save_1/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py:623) ]]
Original stack trace for 'save_1/RestoreV2':
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 493, in <module>
tf.compat.v1.app.run()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/absl_py/absl/app.py", line 299, in run
_run_main(main, args)
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/absl_py/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 474, in main
call_variants(
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 433, in call_variants
prediction = next(predictions)
File "usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3142, in predict
for result in super(TPUEstimator, self).predict(
File "usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 623, in predict
with tf.compat.v1.train.MonitoredSession(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1035, in __init__
super(MonitoredSession, self).__init__(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 750, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1232, in __init__
_WrappedSession.__init__(self, self._create_session())
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1237, in _create_session
return self._sess_creator.create_session()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 903, in create_session
self.tf_sess = self._session_creator.create_session()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 661, in create_session
self._scaffold.finalize()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 236, in finalize
self._saver = training_saver._get_saver_or_default() # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 607, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 836, in __init__
self.build()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 848, in build
self._build(self._filename, build_save=True, build_restore=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 876, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 509, in _build_internal
restore_op = self._AddShardedRestoreOps(filename_tensor, per_device,
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 383, in _AddShardedRestoreOps
self._AddRestoreOps(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 335, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 583, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1490, in restore_v2
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3557, in _create_op_internal
ret = Operation(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 69, in get_tensor
return CheckpointReader.CheckpointReader_GetTensor(
RuntimeError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 1314, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 1632, in object_graph_key_mapping
object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 74, in get_tensor
error_translator(e)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 35, in error_translator
raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 493, in <module>
tf.compat.v1.app.run()
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/tmp/Bazel.runfiles_2gnuyvf0/runfiles/absl_py/absl/app.py", line 299, in run
_run_main(main, args)
File "/tmp/Bazel.runfiles_2gnuyvf0/runfiles/absl_py/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 474, in main
call_variants(
File "/tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 433, in call_variants
prediction = next(predictions)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3153, in predict
rendezvous.raise_errors()
File "/usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
six.reraise(typ, value, traceback)
File "/tmp/Bazel.runfiles_2gnuyvf0/runfiles/six_archive/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3142, in predict
for result in super(TPUEstimator, self).predict(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 623, in predict
with tf.compat.v1.train.MonitoredSession(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1035, in __init__
super(MonitoredSession, self).__init__(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 750, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1232, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1237, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 903, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 662, in create_session
return self._get_session_manager().prepare_session(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/session_manager.py", line 314, in prepare_session
sess, is_loaded_from_checkpoint = self._restore_checkpoint(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/session_manager.py", line 233, in _restore_checkpoint
_restore_checkpoint_and_maybe_run_saved_model_initializers(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/session_manager.py", line 71, in _restore_checkpoint_and_maybe_run_saved_model_initializers
saver.restore(sess, path)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 1319, in restore
raise _wrap_restore_error_with_msg(
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
From /job:tpu_worker/replica:0/task:0:
Unsuccessful TensorSliceReader constructor: Failed to find any matching files for gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt
[[node save_1/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py:623) ]]
Original stack trace for 'save_1/RestoreV2':
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 493, in <module>
tf.compat.v1.app.run()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/absl_py/absl/app.py", line 299, in run
_run_main(main, args)
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/absl_py/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 474, in main
call_variants(
File "tmp/Bazel.runfiles_2gnuyvf0/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 433, in call_variants
prediction = next(predictions)
File "usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3142, in predict
for result in super(TPUEstimator, self).predict(
File "usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 623, in predict
with tf.compat.v1.train.MonitoredSession(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1035, in __init__
super(MonitoredSession, self).__init__(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 750, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1232, in __init__
_WrappedSession.__init__(self, self._create_session())
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1237, in _create_session
return self._sess_creator.create_session()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 903, in create_session
self.tf_sess = self._session_creator.create_session()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 661, in create_session
self._scaffold.finalize()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 236, in finalize
self._saver = training_saver._get_saver_or_default() # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 607, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 836, in __init__
self.build()
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 848, in build
self._build(self._filename, build_save=True, build_restore=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 876, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 509, in _build_internal
restore_op = self._AddShardedRestoreOps(filename_tensor, per_device,
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 383, in _AddShardedRestoreOps
self._AddRestoreOps(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 335, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 583, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1490, in restore_v2
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3557, in _create_op_internal
ret = Operation(
File "usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
Is there something simple I am missing here? Thanks for the support.
@shishir-reddy
Try just:
--customized_model "gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard/model.ckpt" \
Sorry, @akolesnikov pointed out that you tried both. I don't have an immediate answer to the second error then.
Hi, I just wanted to check in to see if there are any updates on this thread? Thanks!
Hi,
(model.ckpt.data-00000-of-00001 -> model.ckpt) is the right way to pass the model. May I ask you a more general question? What is the reason you want to run inference on TPU? In general it is not advisable, because during inference the TPU processes examples much faster than the infeed can supply them.
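To illustrate the naming point: the value passed to --customized_model is the checkpoint prefix, not the name of any single file listed under /opt/models/wgs/. The sketch below is only an illustration of that convention; ckpt_prefix is a hypothetical helper and not part of DeepVariant, and the suffixes it strips are the ones shown in the directory listing earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepVariant): turn the name of any file
# belonging to a checkpoint into the prefix expected by --customized_model.
ckpt_prefix() {
  local path="$1"
  case "$path" in
    *.data-[0-9]*-of-[0-9]*) printf '%s\n' "${path%.data-*}" ;;      # data shard
    *.index)                 printf '%s\n' "${path%.index}" ;;       # index file
    *.meta)                  printf '%s\n' "${path%.meta}" ;;        # graph metadata
    *.input_shape)           printf '%s\n' "${path%.input_shape}" ;; # file seen in the listing above
    *)                       printf '%s\n' "$path" ;;                # already a prefix
  esac
}

# Example: a shard name is reduced to the prefix.
ckpt_prefix "model.ckpt.data-00000-of-00001"
```

The example call prints model.ckpt, which matches the renaming described above.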
I am just benchmarking TPU usage on DeepVariant to see if there is a significant speedup as compared to GPU. There were supporting flags in the call_variants step, so I wanted to test with TPU. If TPU is not recommended for inference, then I will switch over to training and try from there, thanks!
Is there a solution to the second error that occurs when renaming (model.ckpt.data-00000-of-00001 -> model.ckpt), or is this not supported for TPU usage?
Unfortunately, we don't officially support running on TPU at the moment. The way you ran it with the short model name looks correct. It could be an access control issue (the TPU host may have no read access to the bucket containing the model).
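One rough way to narrow this down is to list the bucket and confirm that a complete checkpoint is visible under the prefix. The sketch below assumes gsutil is installed and authenticated; has_checkpoint is a hypothetical helper, not a DeepVariant or gsutil command. Note that it only shows what is readable from the machine where it is run. The TPU host uses its own service account, so a successful check here does not prove the TPU can read the bucket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an object listing on stdin (one path per line) and
# succeed only if the index file and at least one data shard exist for PREFIX.
has_checkpoint() {
  local prefix="$1" line found_index=0 found_data=0
  while IFS= read -r line; do
    case "$line" in
      "${prefix}.index")  found_index=1 ;;
      "${prefix}.data-"*) found_data=1 ;;
    esac
  done
  [ "$found_index" -eq 1 ] && [ "$found_data" -eq 1 ]
}

# Usage (assumes gsutil is available; the bucket path is the one from this thread):
#   dir="gs://deepvariant/models/DeepVariant/1.3.0/DeepVariant-inception_v3-1.3.0+data-wgs_standard"
#   gsutil ls "${dir}/" | has_checkpoint "${dir}/model.ckpt" && echo "checkpoint visible"
```

If the listing itself fails with a permission error, that points at access control rather than at the model path.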
I am trying to use deepvariant to call variants using a TPU Node v3-8, but I am running into a persistent issue.
Here is the command I am using:
However, I am seeing the following error in the call variants step.
This same command works fine without using TPUs on this system, and it looks like the TPU node is being recognized by deepvariant. Is there something I'm missing for call_variants?