Your raw_data_metadata should describe your raw_data as it actually is, not how it's intended to be. Your options are:
(1) keep the metadata matching the raw data and do the type conversion inside your preprocessing_fn, or
(2) convert the raw data itself to the intended types before it enters the pipeline, so that the metadata you want is also the metadata that is true.
Option (1) is usually preferred, as this logic will be applied both at preprocessing for training and at inference, which will help avoid training/inference skew.
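For concreteness, a minimal sketch of option (1), assuming a single raw feature that arrives as a string but should be numeric downstream (the feature name "Temperature" is only illustrative, not from the question):

import tensorflow as tf


def preprocessing_fn(inputs):
    outputs = dict(inputs)
    # The raw feature is a string, exactly as the metadata describes it; it is
    # parsed to float here, and the same transform graph is applied at serving
    # time, which is what avoids training/inference skew.
    outputs["Temperature"] = tf.strings.to_number(inputs["Temperature"], tf.float32)
    return outputs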
Hi Zoyahav,
Thank you for reaching out, this is highly appreciated!
Below is my failing attempt at creating such a preprocessing_fn, but first, a higher-level question: am I approaching this problem in the wrong way?
My main goal is to change how a column should be interpreted. Two possible scenarios:
1. A column contains numbers stored as strings (e.g. ["3.2", "1.5"]), and so the tensorflow schema interprets it as strings. But the column represents temperatures, so I need to transform these to numbers.
2. A column contains numbers (e.g. [3.2, 1.5]), and so the tensorflow schema interprets it as numbers. In reality, these numbers represent hierarchical categories of animals (3.2 => dog.dalmatian), and so I need to make tensorflow interpret them as strings.
What is the best way to do this using tensorflow, so that the data is formatted correctly in ExampleGen, SchemaGen, Trainer etc.?
If you think the best way involves changing types using tft_beam.AnalyzeAndTransformDataset(preprocessing_fn), then I would very much appreciate any thoughts on why this new piece of code fails:
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import dataset_schema
import tensorflow as tf
import tempfile
import tensorflow_transform.beam as tft_beam
import pdb
import warnings
def test_tutorial_schema():
    cols = ["Ints", "Ints to Strings", "Strings to Ints", "Strings"]
    raw_data = [
        {cols[0]: 1, cols[1]: 1, cols[2]: "3", cols[3]: "hello"},
        {cols[0]: 2, cols[1]: 2, cols[2]: "4", cols[3]: "world"},
    ]
    original_types = [tf.float32, tf.float32, tf.string, tf.string]
    intended_types = [tf.float32, tf.string, tf.float32, tf.string]
    # Todo: generate these automatically based on intended_types
    true_types = [tf.float32, bytes, tf.float32, bytes]
    wrong_types = [bytes, tf.float32, bytes, tf.float32]
    # Todo: make this independent of length of cols and original_types. For loop?
    raw_data_metadata = dataset_metadata.DatasetMetadata(
        dataset_schema.from_feature_spec(
            {
                cols[0]: tf.io.FixedLenFeature([], original_types[0]),
                cols[1]: tf.io.FixedLenFeature([], original_types[1]),
                cols[2]: tf.io.FixedLenFeature([], original_types[2]),
                cols[3]: tf.io.FixedLenFeature([], original_types[3]),
            }
        )
    )
    # Generate a preprocessing_fn having intended_types internalized
    preprocessing_fn = generate_preprocessing_fn(intended_types)
    # Run the preprocessing_fn
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        transformed_dataset, transform_fn = (
            raw_data,
            raw_data_metadata,
        ) | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn)
    transformed_data, transformed_metadata = transformed_dataset
    for col, typ, wrong in zip(cols, true_types, wrong_types):
        warnings.warn(
            f"\n\n: : {transformed_data[0][col]}: {type(transformed_data[0][col])}"
        )
        assert type(transformed_data[0][col]) == typ
        assert type(transformed_data[1][col]) == typ
        assert not type(transformed_data[0][col]) == wrong
        assert not type(transformed_data[1][col]) == wrong


def generate_preprocessing_fn(intended_types):
    """Return function which can alter column types."""

    def preprocessing_fn(inputs):
        """Preprocess input columns into transformed columns."""
        for num, col_name in enumerate(inputs):
            current_type = inputs[col_name].dtype
            if not current_type == intended_types[num]:
                warnings.warn(
                    f"\n\n: : num: {num}, "
                    f"\n: : col_name: {col_name}"
                    f"\n: : current_type: {current_type},"
                    f"\n: : intended_types[num]: {intended_types[num]}"
                    f"\n: : inputs[col_name][0]: {inputs[col_name][0]}"
                )
                if intended_types[num] == tf.string:
                    inputs[col_name] = tf.strings.as_string(inputs[col_name])
                else:
                    inputs[col_name] = tf.strings.to_number(
                        inputs[col_name], intended_types[num]
                    )
        return inputs

    return preprocessing_fn
Giving the following error:
self = <tensorflow.python.client.session.Session object at 0x7fb030202290>
handle = None
fetches = [<tf.Tensor 'transform/inputs/inputs/Ints_copy:0' shape=(None,) dtype=float32>, <tf.Tensor 'transform/AsString:0' shap...hape=(None,) dtype=float32>, <tf.Tensor 'transform/inputs/inputs/F_Strings_to_Ints_copy:0' shape=(None,) dtype=string>]
feed_dict = {<tf.Tensor 'transform/inputs/Ints:0' shape=(None,) dtype=float32>: [1], <tf.Tensor 'transform/inputs/F_Ints_to_String...(None,) dtype=string>: ['hello'], <tf.Tensor 'transform/inputs/F_Strings_to_Ints:0' shape=(None,) dtype=string>: ['3']}
options = None, run_metadata = None
def _run(self, handle, fetches, feed_dict, options, run_metadata):
"""Perform either run or partial_run, depending the presence of `handle`."""
def _feed_fn(feed, feed_val):
for tensor_type, _, feed_fn, _ in _REGISTERED_EXPANSIONS:
if isinstance(feed, tensor_type):
return feed_fn(feed, feed_val)
raise TypeError('Feed argument %r has invalid type %r' %
(feed, type(feed)))
# Check session.
if self._closed:
raise RuntimeError('Attempted to use a closed Session.')
if self.graph.version == 0:
raise RuntimeError('The Session graph is empty. Add operations to the '
'graph before calling run().')
# Create request.
feed_dict_tensor = {}
feed_map = {}
# Validate and process feed_dict.
feed_handles = {}
if feed_dict:
feed_dict = nest.flatten_dict_items(feed_dict)
for feed, feed_val in feed_dict.items():
for subfeed, subfeed_val in _feed_fn(feed, feed_val):
try:
subfeed_t = self.graph.as_graph_element(
subfeed, allow_tensor=True, allow_operation=False)
except Exception as e:
raise TypeError('Cannot interpret feed_dict key as Tensor: ' +
e.args[0])
if isinstance(subfeed_val, ops.Tensor):
raise TypeError('The value of a feed cannot be a tf.Tensor object. '
'Acceptable feed values include Python scalars, '
'strings, lists, numpy ndarrays, or TensorHandles. '
'For reference, the tensor object was ' +
str(feed_val) + ' which was passed to the '
'feed with key ' + str(feed) + '.')
subfeed_dtype = subfeed_t.dtype.as_numpy_dtype
if isinstance(subfeed_val, int) and _convert_to_numpy_obj(
subfeed_dtype, subfeed_val) != subfeed_val:
raise TypeError(
'Type of feed value ' + str(subfeed_val) + ' with type ' +
str(type(subfeed_val)) +
' is not compatible with Tensor type ' + str(subfeed_dtype) +
'. Try explicitly setting the type of the feed tensor'
' to a larger type (e.g. int64).')
is_tensor_handle_feed = isinstance(subfeed_val,
session_ops.TensorHandle)
if is_tensor_handle_feed:
np_val = subfeed_val.to_numpy_array()
feed_handles[subfeed_t.ref()] = subfeed_val
else:
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
if (not is_tensor_handle_feed and
not subfeed_t.get_shape().is_compatible_with(np_val.shape)):
raise ValueError(
'Cannot feed value of shape %r for Tensor %r, '
'which has shape %r' %
(np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
if not self.graph.is_feedable(subfeed_t):
raise ValueError('Tensor %s may not be fed.' % subfeed_t)
feed_dict_tensor[subfeed_t.ref()] = np_val
feed_map[compat.as_bytes(subfeed_t.name)] = (subfeed_t, subfeed_val)
# Create a fetch handler to take care of the structure of fetches.
fetch_handler = _FetchHandler(
self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
# Run request and get response.
# We need to keep the returned movers alive for the following _do_run().
# These movers are no longer needed when _do_run() completes, and
# are deleted when `movers` goes out of scope when this _run() ends.
# TODO(yuanbyu, keveman): Revisit whether we should just treat feeding
# of a handle from a different device as an error.
_ = self._update_with_movers(feed_dict_tensor, feed_map)
final_fetches = fetch_handler.fetches()
final_targets = fetch_handler.targets()
# We only want to really perform the run if fetches or targets are provided,
# or if the call is a partial run that specifies feeds.
if final_fetches or final_targets or (handle and feed_dict_tensor):
results = self._do_run(handle, final_targets, final_fetches,
> feed_dict_tensor, options, run_metadata)
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/client/session.py:1181:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <tensorflow.python.client.session.Session object at 0x7fb030202290>
handle = None, target_list = []
fetch_list = [<tf.Tensor 'transform/inputs/inputs/Ints_copy:0' shape=(None,) dtype=float32>, <tf.Tensor 'transform/AsString:0' shap...hape=(None,) dtype=float32>, <tf.Tensor 'transform/inputs/inputs/F_Strings_to_Ints_copy:0' shape=(None,) dtype=string>]
feed_dict = {<Reference wrapping <tf.Tensor 'transform/inputs/Ints:0' shape=(None,) dtype=float32>>: array([1.], dtype=float32), <...ce wrapping <tf.Tensor 'transform/inputs/F_Strings_to_Ints:0' shape=(None,) dtype=string>>: array(['3'], dtype=object)}
options = None, run_metadata = None
def _do_run(self, handle, target_list, fetch_list, feed_dict, options,
run_metadata):
"""Runs a step based on the given fetches and feeds.
Args:
handle: a handle for partial_run. None if this is just a call to run().
target_list: A list of operations to be run, but not fetched.
fetch_list: A list of tensors to be fetched.
feed_dict: A dictionary that maps tensors to numpy ndarrays.
options: A (pointer to a) [`RunOptions`] protocol buffer, or None
run_metadata: A (pointer to a) [`RunMetadata`] protocol buffer, or None
Returns:
A list of numpy ndarrays, corresponding to the elements of
`fetch_list`. If the ith element of `fetch_list` contains the
name of an operation, the first Tensor output of that operation
will be returned for that element.
Raises:
tf.errors.OpError: Or one of its subclasses on error.
"""
# pylint: disable=protected-access
feeds = dict((t.deref()._as_tf_output(), v) for t, v in feed_dict.items())
fetches = [t._as_tf_output() for t in fetch_list]
targets = [op._c_op for op in target_list]
# pylint: enable=protected-access
def _run_fn(feed_dict, fetch_list, target_list, options, run_metadata):
# Ensure any changes to the graph are reflected in the runtime.
self._extend_graph()
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
def _prun_fn(handle, feed_dict, fetch_list):
if target_list:
raise RuntimeError('partial_run() requires empty target_list.')
return self._call_tf_sessionprun(handle, feed_dict, fetch_list)
if handle is None:
return self._do_call(_run_fn, feeds, fetches, targets, options,
> run_metadata)
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/client/session.py:1359:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <tensorflow.python.client.session.Session object at 0x7fb030202290>
fn = <function BaseSession._do_run.<locals>._run_fn at 0x7fb0301ecb90>
args = ({<tensorflow.python._pywrap_tf_session.TF_Output object at 0x7fb0302029f0>: array([1.], dtype=float32), <tensorflow.p... object at 0x7fb03020c130>, <tensorflow.python._pywrap_tf_session.TF_Output object at 0x7fb030208370>], [], None, None)
message = 'StringToNumberOp could not correctly convert string: hello\n\t [[node transform/StringToNumber (defined at /site-packages/tensorflow_transform/saved/saved_transform_io.py:274) ]]'
m = <re.Match object; span=(61, 96), match='[[{{node transform/StringToNumber}}'>
def _do_call(self, fn, *args):
try:
return fn(*args)
except errors.OpError as e:
message = compat.as_text(e.message)
m = BaseSession._NODEDEF_NAME_RE.search(message)
node_def = None
op = None
if m is not None:
node_name = m.group(3)
try:
op = self._graph.get_operation_by_name(node_name)
node_def = op.node_def
except KeyError:
pass
message = error_interpolation.interpolate(message, self._graph)
if 'only supports NHWC tensor format' in message:
message += ('\nA possible workaround: Try disabling Grappler optimizer'
'\nby modifying the config for creating the session eg.'
'\nsession_config.graph_options.rewrite_options.'
'disable_meta_optimizer = True')
> raise type(e)(node_def, op, message)
E tensorflow.python.framework.errors_impl.InvalidArgumentError: StringToNumberOp could not correctly convert string: hello
E [[node transform/StringToNumber (defined at /site-packages/tensorflow_transform/saved/saved_transform_io.py:274) ]]
E
E Original stack trace for 'transform/StringToNumber':
E File "/threading.py", line 890, in _bootstrap
E self._bootstrap_inner()
E File "/threading.py", line 926, in _bootstrap_inner
E self.run()
E File "/site-packages/apache_beam/utils/thread_pool_executor.py", line 70, in run
E self._work_item.run()
E File "/site-packages/apache_beam/utils/thread_pool_executor.py", line 44, in run
E self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 1048, in execute
E part_map, expected_outputs, fired_timers, expected_output_timers)
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 947, in process_bundle
E result_future = self._worker_handler.control_conn.push(process_bundle_req)
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/worker_handlers.py", line 349, in push
E response = self.worker.do_instruction(request)
E File "/site-packages/apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction
E getattr(request, request_type), request.instruction_id)
E File "/site-packages/apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle
E bundle_processor.process_bundle(instruction_id))
E File "/site-packages/apache_beam/runners/worker/bundle_processor.py", line 972, in process_bundle
E element.data)
E File "/site-packages/apache_beam/runners/worker/bundle_processor.py", line 218, in process_encoded
E self.output(decoded_value)
E File "/site-packages/tensorflow_transform/beam/impl.py", line 416, in process
E lambda: self._make_graph_state(saved_model_dir))
E File "/site-packages/tfx_bsl/beam/shared.py", line 221, in acquire
E return _shared_map.acquire(self._key, constructor_fn)
E File "/site-packages/tfx_bsl/beam/shared.py", line 184, in acquire
E result = control_block.acquire(constructor_fn)
E File "/site-packages/tfx_bsl/beam/shared.py", line 87, in acquire
E result = constructor_fn()
E File "/site-packages/tensorflow_transform/beam/impl.py", line 416, in <lambda>
E lambda: self._make_graph_state(saved_model_dir))
E File "/site-packages/tensorflow_transform/beam/impl.py", line 385, in _make_graph_state
E self._exclude_outputs, self._tf_config)
E File "/site-packages/tensorflow_transform/beam/impl.py", line 209, in __init__
E saved_model_dir, {}))
E File "/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 429, in partially_apply_saved_transform_internal
E saved_model_dir, logical_input_map, tensor_replacement_map)
E File "/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 274, in _partially_apply_saved_transform_impl
E input_map=input_map)
E File "/site-packages/tensorflow/python/training/saver.py", line 1462, in import_meta_graph
E **kwargs)[0]
E File "/site-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
E **kwargs))
E File "/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
E return_elements=return_elements)
E File "/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
E return func(*args, **kwargs)
E File "/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
E producer_op_list=producer_op_list)
E File "/site-packages/tensorflow/python/framework/importer.py", line 513, in _import_graph_def_internal
E _ProcessNewOps(graph)
E File "/site-packages/tensorflow/python/framework/importer.py", line 243, in _ProcessNewOps
E for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
E File "/site-packages/tensorflow/python/framework/ops.py", line 3624, in _add_new_tf_operations
E for c_op in c_api_util.new_tf_operations(self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 3624, in <listcomp>
E for c_op in c_api_util.new_tf_operations(self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 3510, in _create_op_from_tf_operation
E ret = Operation(c_op, self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
E self._traceback = tf_stack.extract_stack()
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/client/session.py:1384: InvalidArgumentError
During handling of the above exception, another exception occurred:
> ???
apache_beam/runners/common.py:961:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
apache_beam/runners/common.py:726:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
apache_beam/runners/common.py:807:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
apache_beam/runners/common.py:1095:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <tensorflow_transform.beam.impl._RunMetaGraphDoFn object at 0x7fb0301edbd0>
batch = [{'Ints': 1, 'Ints to Strings': 1, 'Strings': 'hello', 'Strings to Ints': '3'}]
saved_model_dir = '/tmp/tmpk509uook/tftransform_tmp/e744b1b491a04251b1ad4e5e458b4d3c'
def process(self, batch, saved_model_dir):
"""Runs the given graph to realize the output `Tensor` or `SparseTensor`s.
Runs the graph in a TF session for computing the output values of the
`Tensor` or `SparseTensor`s, given an input row of data (input `Tensor` or
`SparseTensor`s).
Args:
batch: the batch of elements being processed by the DoFn
saved_model_dir: Directory containing saved model.
Yields:
A representation of output features as a dict mapping keys (logical column
names) to values.
"""
if self._graph_state is None:
# If available, acquire will return a cached _GraphState, since calling
# _make_graph_state is expensive.
self._graph_state = self._shared_graph_state_handle.acquire(
lambda: self._make_graph_state(saved_model_dir))
# This should remain true throughout the lifetime of this DoFn, regardless
# of whether or not self._graph_state was cached.
assert self._graph_state.saved_model_dir == saved_model_dir
> yield self._handle_batch(batch)
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:422:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <tensorflow_transform.beam.impl._RunMetaGraphDoFn object at 0x7fb0301edbd0>
batch = [{'Ints': 1, 'Ints to Strings': 1, 'Strings': 'hello', 'Strings to Ints': '3'}]
def _handle_batch(self, batch):
self._update_metrics(batch)
# Remove passthrough keys from the input data to make sure preprocessing_fn
# won't see them. Making a copy of batch because mutating PCollection
# elements is not allowed.
# No need to remove (and cannot remove) the passthrough columns if
# tfxio is used:
# 1) The TensorAdapter expects the RecordBatch to be of the same schema as
# statically determined by the TFXIO implementation the yields the
# TensorAdapter.
# 2) It's not possible to leak passthrough columns through TensorAdapter
# because they are not going to be converted to Tensors.
passthrough_data = None
if self._passthrough_keys and not self._use_tfxio:
batch = [copy.copy(x) for x in batch]
passthrough_data = {
key: [instance.pop(key) for instance in batch
] for key in self._passthrough_keys
}
feed_list = self._make_feed_list(batch)
try:
outputs_list = self._graph_state.callable_get_outputs(*feed_list)
except Exception as e:
raise ValueError(
"""An error occured while trying to apply the transformation: "{}".
Batch instances: {},
Fetching the values for the following Tensor keys: {}.""".format(
> str(e), batch, self._graph_state.outputs_tensor_keys))
E ValueError: An error occured while trying to apply the transformation: "StringToNumberOp could not correctly convert string: hello
E [[node transform/StringToNumber (defined at /site-packages/tensorflow_transform/saved/saved_transform_io.py:274) ]]
E
E Original stack trace for 'transform/StringToNumber':
E File "/threading.py", line 890, in _bootstrap
E self._bootstrap_inner()
E File "/threading.py", line 926, in _bootstrap_inner
E self.run()
E File "/site-packages/apache_beam/utils/thread_pool_executor.py", line 70, in run
E self._work_item.run()
E File "/site-packages/apache_beam/utils/thread_pool_executor.py", line 44, in run
E self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 1048, in execute
E part_map, expected_outputs, fired_timers, expected_output_timers)
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 947, in process_bundle
E result_future = self._worker_handler.control_conn.push(process_bundle_req)
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/worker_handlers.py", line 349, in push
E response = self.worker.do_instruction(request)
E File "/site-packages/apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction
E getattr(request, request_type), request.instruction_id)
E File "/site-packages/apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle
E bundle_processor.process_bundle(instruction_id))
E File "/site-packages/apache_beam/runners/worker/bundle_processor.py", line 972, in process_bundle
E element.data)
E File "/site-packages/apache_beam/runners/worker/bundle_processor.py", line 218, in process_encoded
E self.output(decoded_value)
E File "/site-packages/tensorflow_transform/beam/impl.py", line 416, in process
E lambda: self._make_graph_state(saved_model_dir))
E File "/site-packages/tfx_bsl/beam/shared.py", line 221, in acquire
E return _shared_map.acquire(self._key, constructor_fn)
E File "/site-packages/tfx_bsl/beam/shared.py", line 184, in acquire
E result = control_block.acquire(constructor_fn)
E File "/site-packages/tfx_bsl/beam/shared.py", line 87, in acquire
E result = constructor_fn()
E File "/site-packages/tensorflow_transform/beam/impl.py", line 416, in <lambda>
E lambda: self._make_graph_state(saved_model_dir))
E File "/site-packages/tensorflow_transform/beam/impl.py", line 385, in _make_graph_state
E self._exclude_outputs, self._tf_config)
E File "/site-packages/tensorflow_transform/beam/impl.py", line 209, in __init__
E saved_model_dir, {}))
E File "/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 429, in partially_apply_saved_transform_internal
E saved_model_dir, logical_input_map, tensor_replacement_map)
E File "/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 274, in _partially_apply_saved_transform_impl
E input_map=input_map)
E File "/site-packages/tensorflow/python/training/saver.py", line 1462, in import_meta_graph
E **kwargs)[0]
E File "/site-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
E **kwargs))
E File "/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
E return_elements=return_elements)
E File "/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
E return func(*args, **kwargs)
E File "/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
E producer_op_list=producer_op_list)
E File "/site-packages/tensorflow/python/framework/importer.py", line 513, in _import_graph_def_internal
E _ProcessNewOps(graph)
E File "/site-packages/tensorflow/python/framework/importer.py", line 243, in _ProcessNewOps
E for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
E File "/site-packages/tensorflow/python/framework/ops.py", line 3624, in _add_new_tf_operations
E for c_op in c_api_util.new_tf_operations(self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 3624, in <listcomp>
E for c_op in c_api_util.new_tf_operations(self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 3510, in _create_op_from_tf_operation
E ret = Operation(c_op, self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
E self._traceback = tf_stack.extract_stack()
E ".
E Batch instances: [{'Ints': 1, 'Ints to Strings': 1, 'Strings to Ints': '3', 'Strings': 'hello'}],
E Fetching the values for the following Tensor keys: ['Ints', 'Ints to Strings', 'Strings', 'Strings to Ints'].
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:367: ValueError
During handling of the above exception, another exception occurred:
def test_tutorial_schema():
cols = ["Ints", "Ints to Strings", "Strings to Ints", "Strings"]
raw_data = [
{cols[0]: 1, cols[1]: 1, cols[2]: "3", cols[3]: "hello"},
{cols[0]: 2, cols[1]: 2, cols[2]: "4", cols[3]: "world"},
]
original_types = [tf.float32, tf.float32, tf.string, tf.string]
intended_types = [tf.float32, tf.string, tf.float32, tf.string]
# Todo: generate these automatically based on intended_types
true_types = [tf.float32, bytes, tf.float32, bytes]
wrong_types = [bytes, tf.float32, bytes, tf.float32]
# Todo: make this independendt of length of cols and original_types. For loop?
raw_data_metadata = dataset_metadata.DatasetMetadata(
dataset_schema.from_feature_spec(
{
cols[0]: tf.io.FixedLenFeature([], original_types[0]),
cols[1]: tf.io.FixedLenFeature([], original_types[1]),
cols[2]: tf.io.FixedLenFeature([], original_types[2]),
cols[3]: tf.io.FixedLenFeature([], original_types[3]),
}
)
)
# Generate a preprocessing fn having intended_types internalized
preprocessing_fn = generate_preprocessing_fn(intended_types)
# Run the preprocessing_fn which
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
transformed_dataset, transform_fn = (
raw_data,
raw_data_metadata,
> ) | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn)
tests/unit/data_import/test_test.py:45:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py:567: in __ror__
p.run().wait_until_finish()
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/pipeline.py:497: in run
self._options).run(False)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/pipeline.py:510: in run
return self.runner.run_pipeline(self, self._options)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py:130: in run_pipeline
return runner.run_pipeline(pipeline, options)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:179: in run_pipeline
pipeline.to_runner_api(default_environment=self._default_environment))
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:189: in run_via_runner_api
return self.run_stages(stage_context, stages)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:335: in run_stages
bundle_context_manager,
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:545: in _run_stage
expected_timer_output)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:1051: in process_bundle
for result, split_result in executor.map(execute, part_inputs):
../miniconda/envs/tfx/lib/python3.7/concurrent/futures/_base.py:598: in result_iterator
yield fs.pop().result()
../miniconda/envs/tfx/lib/python3.7/concurrent/futures/_base.py:435: in result
return self.__get_result()
../miniconda/envs/tfx/lib/python3.7/concurrent/futures/_base.py:384: in __get_result
raise self._exception
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/utils/thread_pool_executor.py:44: in run
self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:1048: in execute
part_map, expected_outputs, fired_timers, expected_output_timers)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:947: in process_bundle
result_future = self._worker_handler.control_conn.push(process_bundle_req)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/worker_handlers.py:349: in push
response = self.worker.do_instruction(request)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py:471: in do_instruction
getattr(request, request_type), request.instruction_id)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py:506: in process_bundle
bundle_processor.process_bundle(instruction_id))
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py:972: in process_bundle
element.data)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py:218: in process_encoded
self.output(decoded_value)
apache_beam/runners/worker/operations.py:330: in apache_beam.runners.worker.operations.Operation.output
???
apache_beam/runners/worker/operations.py:332: in apache_beam.runners.worker.operations.Operation.output
???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:1030: in apache_beam.runners.common.DoFnRunner._reraise_augmented
???
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:553: in apache_beam.runners.common.SimpleInvoker.invoke_process
???
apache_beam/runners/common.py:1122: in apache_beam.runners.common._OutputProcessor.process_outputs
???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:1030: in apache_beam.runners.common.DoFnRunner._reraise_augmented
???
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:553: in apache_beam.runners.common.SimpleInvoker.invoke_process
???
apache_beam/runners/common.py:1122: in apache_beam.runners.common._OutputProcessor.process_outputs
???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:1030: in apache_beam.runners.common.DoFnRunner._reraise_augmented
???
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:553: in apache_beam.runners.common.SimpleInvoker.invoke_process
???
apache_beam/runners/common.py:1122: in apache_beam.runners.common._OutputProcessor.process_outputs
???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:1045: in apache_beam.runners.common.DoFnRunner._reraise_augmented
???
../miniconda/envs/tfx/lib/python3.7/site-packages/future/utils/__init__.py:446: in raise_with_traceback
raise exc.with_traceback(traceback)
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
???
apache_beam/runners/common.py:726: in apache_beam.runners.common.PerWindowInvoker.invoke_process
???
apache_beam/runners/common.py:807: in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
???
apache_beam/runners/common.py:1095: in apache_beam.runners.common._OutputProcessor.process_outputs
???
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:422: in process
yield self._handle_batch(batch)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <tensorflow_transform.beam.impl._RunMetaGraphDoFn object at 0x7fb0301edbd0>
batch = [{'Ints': 1, 'Ints to Strings': 1, 'Strings': 'hello', 'Strings to Ints': '3'}]
def _handle_batch(self, batch):
self._update_metrics(batch)
# Remove passthrough keys from the input data to make sure preprocessing_fn
# won't see them. Making a copy of batch because mutating PCollection
# elements is not allowed.
# No need to remove (and cannot remove) the passthrough columns if
# tfxio is used:
# 1) The TensorAdapter expects the RecordBatch to be of the same schema as
# statically determined by the TFXIO implementation the yields the
# TensorAdapter.
# 2) It's not possible to leak passthrough columns through TensorAdapter
# because they are not going to be converted to Tensors.
passthrough_data = None
if self._passthrough_keys and not self._use_tfxio:
batch = [copy.copy(x) for x in batch]
passthrough_data = {
key: [instance.pop(key) for instance in batch
] for key in self._passthrough_keys
}
feed_list = self._make_feed_list(batch)
try:
outputs_list = self._graph_state.callable_get_outputs(*feed_list)
except Exception as e:
raise ValueError(
"""An error occured while trying to apply the transformation: "{}".
Batch instances: {},
Fetching the values for the following Tensor keys: {}.""".format(
> str(e), batch, self._graph_state.outputs_tensor_keys))
E ValueError: An error occured while trying to apply the transformation: "StringToNumberOp could not correctly convert string: hello
E [[node transform/StringToNumber (defined at /site-packages/tensorflow_transform/saved/saved_transform_io.py:274) ]]
E
E Original stack trace for 'transform/StringToNumber':
E File "/threading.py", line 890, in _bootstrap
E self._bootstrap_inner()
E File "/threading.py", line 926, in _bootstrap_inner
E self.run()
E File "/site-packages/apache_beam/utils/thread_pool_executor.py", line 70, in run
E self._work_item.run()
E File "/site-packages/apache_beam/utils/thread_pool_executor.py", line 44, in run
E self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 1048, in execute
E part_map, expected_outputs, fired_timers, expected_output_timers)
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 947, in process_bundle
E result_future = self._worker_handler.control_conn.push(process_bundle_req)
E File "/site-packages/apache_beam/runners/portability/fn_api_runner/worker_handlers.py", line 349, in push
E response = self.worker.do_instruction(request)
E File "/site-packages/apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction
E getattr(request, request_type), request.instruction_id)
E File "/site-packages/apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle
E bundle_processor.process_bundle(instruction_id))
E File "/site-packages/apache_beam/runners/worker/bundle_processor.py", line 972, in process_bundle
E element.data)
E File "/site-packages/apache_beam/runners/worker/bundle_processor.py", line 218, in process_encoded
E self.output(decoded_value)
E File "/site-packages/tensorflow_transform/beam/impl.py", line 416, in process
E lambda: self._make_graph_state(saved_model_dir))
E File "/site-packages/tfx_bsl/beam/shared.py", line 221, in acquire
E return _shared_map.acquire(self._key, constructor_fn)
E File "/site-packages/tfx_bsl/beam/shared.py", line 184, in acquire
E result = control_block.acquire(constructor_fn)
E File "/site-packages/tfx_bsl/beam/shared.py", line 87, in acquire
E result = constructor_fn()
E File "/site-packages/tensorflow_transform/beam/impl.py", line 416, in <lambda>
E lambda: self._make_graph_state(saved_model_dir))
E File "/site-packages/tensorflow_transform/beam/impl.py", line 385, in _make_graph_state
E self._exclude_outputs, self._tf_config)
E File "/site-packages/tensorflow_transform/beam/impl.py", line 209, in __init__
E saved_model_dir, {}))
E File "/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 429, in partially_apply_saved_transform_internal
E saved_model_dir, logical_input_map, tensor_replacement_map)
E File "/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 274, in _partially_apply_saved_transform_impl
E input_map=input_map)
E File "/site-packages/tensorflow/python/training/saver.py", line 1462, in import_meta_graph
E **kwargs)[0]
E File "/site-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
E **kwargs))
E File "/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
E return_elements=return_elements)
E File "/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
E return func(*args, **kwargs)
E File "/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
E producer_op_list=producer_op_list)
E File "/site-packages/tensorflow/python/framework/importer.py", line 513, in _import_graph_def_internal
E _ProcessNewOps(graph)
E File "/site-packages/tensorflow/python/framework/importer.py", line 243, in _ProcessNewOps
E for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
E File "/site-packages/tensorflow/python/framework/ops.py", line 3624, in _add_new_tf_operations
E for c_op in c_api_util.new_tf_operations(self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 3624, in <listcomp>
E for c_op in c_api_util.new_tf_operations(self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 3510, in _create_op_from_tf_operation
E ret = Operation(c_op, self)
E File "/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
E self._traceback = tf_stack.extract_stack()
E ".
E Batch instances: [{'Ints': 1, 'Ints to Strings': 1, 'Strings to Ints': '3', 'Strings': 'hello'}],
E Fetching the values for the following Tensor keys: ['Ints', 'Ints to Strings', 'Strings', 'Strings to Ints']. [while running 'AnalyzeAndTransformDataset/TransformDataset/Transform']
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:367: ValueError
----------------------- Captured stderr call -----------------------
WARNING:tensorflow:From /home/vagrant/tf-template/tests/unit/data_import/test_test.py:32: from_feature_spec (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
from_feature_spec is a deprecated, use schema_utils.schema_from_feature_spec
WARNING:tensorflow:Tensorflow version (2.3.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.3.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
2020-08-19 17:25:45.899690: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-19 17:25:45.903087: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-19 17:25:45.903116: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (vagrant): /proc/driver/nvidia/version does not exist
2020-08-19 17:25:45.903383: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-19 17:25:45.929021: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2592000000 Hz
2020-08-19 17:25:45.929327: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55853e2cd3c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-19 17:25:45.929347: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /home/vagrant/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /home/vagrant/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:220: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:Tensorflow version (2.3.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
========================= warnings summary =========================
tests/unit/data_import/test_test.py::test_tutorial_schema
/home/vagrant/tf-template/tests/unit/data_import/test_test.py:70: UserWarning:
: : num: 1,
: : col_name: Ints to Strings
: : current_type: <dtype: 'float32'>,
: : intended_types[num]: <dtype: 'string'>
: : inputs[col_name][0]: Tensor("strided_slice:0", shape=(), dtype=float32)
f"\n\n: : num: {num}, "
tests/unit/data_import/test_test.py::test_tutorial_schema
/home/vagrant/tf-template/tests/unit/data_import/test_test.py:70: UserWarning:
: : num: 2,
: : col_name: Strings
: : current_type: <dtype: 'string'>,
: : intended_types[num]: <dtype: 'float32'>
: : inputs[col_name][0]: Tensor("strided_slice_1:0", shape=(), dtype=string)
f"\n\n: : num: {num}, "
-- Docs: https://docs.pytest.org/en/latest/warnings.html
The very long error above boils down to:
InvalidArgumentError: StringToNumberOp could not correctly convert string: hello
And we can also see the following snippet that helps us understand the issue:
E Batch instances: [{'Ints': 1, 'Ints to Strings': 1, 'Strings to Ints': '3', 'Strings': 'hello'}],
E Fetching the values for the following Tensor keys: ['Ints', 'Ints to Strings', 'Strings', 'Strings to Ints']. [while running 'AnalyzeAndTransformDataset/TransformDataset/Transform']
Strings has the value "hello" which can't be converted to a number. The issue with your code is that you're relying on the ordering of dictionary keys to be consistent with a hardcoded list:
for num, col_name in enumerate(inputs):
    current_type = inputs[col_name].dtype
    if not current_type == intended_types[num]:
It can be fixed by looking up the intended dtype by column name instead of by iteration order:
def generate_preprocessing_fn(intended_types, cols):
    """Return function which can alter column types."""

    def preprocessing_fn(inputs):
        """Preprocess input columns into transformed columns."""
        for num, col_name in enumerate(inputs):
            col_index = cols.index(col_name)
            ...
            if not current_type == intended_types[col_index]:
            ...
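For completeness, a hedged sketch of the factory with those elided lines filled in - same logic as the original, just looking up the intended dtype through cols rather than relying on dict iteration order (assumes tensorflow is imported as tf, as in the test above):

import tensorflow as tf


def generate_preprocessing_fn(intended_types, cols):
    """Return function which can alter column types."""

    def preprocessing_fn(inputs):
        """Preprocess input columns into transformed columns."""
        for col_name in inputs:
            # Look up the intended dtype by column name, not by dict order.
            col_index = cols.index(col_name)
            current_type = inputs[col_name].dtype
            if current_type != intended_types[col_index]:
                if intended_types[col_index] == tf.string:
                    inputs[col_name] = tf.strings.as_string(inputs[col_name])
                else:
                    inputs[col_name] = tf.strings.to_number(
                        inputs[col_name], intended_types[col_index]
                    )
        return inputs

    return preprocessing_fn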
Regarding:
What is the best way to do this using tensorflow, so that the data is formatted correctly in exampleGen, schemaGen, Trainer etc?
Transform (AnalyzeAndTransformDataset) is a good way to convert your data because the conversion will happen for training, but also for inference and analysis. Converting your data with Transform will only apply to components that are downstream from Transform (Trainer, Pusher, ModelValidator, etc.). This conversion wouldn't apply to ExampleGen or SchemaGen, because those run before Transform.
Hi Zoyahav,
Again, thank you so much for helping me out!
Your fixes made the test I had provided work smoothly, and your guidance is very highly appreciated!
Since then, I have tried to alter my test so that it consumes example_gen objects instead of dictionaries containing lists. After all, my end goal is a reusable pipeline based on TFX (components like CsvExampleGen) with the ability to change the type of each column as I wish.
I have not been able to do this using AnalyzeAndTransformDataset; however, tfx.Transform has brought me close to the goal.
The attached notebook shows the code and where it crashes. The necessary data and files are attached but are also visible in the notebook.
Could you give me a hint on how to make this work, and whether I am headed in the right direction in my choice of tfx.Transform?
Best regards, Tord
In case it is useful to other readers (and search engines), below is the notebook attached above, just in .md
format. Below that follows the error message.
Interactive pipeline to convert column types
This notebook uses tensorflow extended components to change column types of a data file.
Import third party packages
import os
import pprint
from tfx.components import CsvExampleGen
from tfx.utils.dsl_utils import external_input
from tfx.components import SchemaGen
from tfx.components import StatisticsGen
from tfx.components import Transform
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
import tensorflow as tf
from pathlib import Path
from tf_template.utils import paths
context = InteractiveContext()
# Set up pretty print
pp = pprint.PrettyPrinter()

# Path to data file
root_path = paths.get_root_path()
paths.get_root_path() does the following, which does not seem to work in notebooks:
def get_root_path():
    return Path(__file__).absolute().parents[2]

raw_path = root_path / "data" / "raw"
csv_file_path = raw_path / "fruit_list" / "fruit_list.csv"
The data file behind "csv_file_path" has the following contents:
Quantity,Cost,Fruit,Weight
1,100,Banana,2.4
1,150,Banana,3.5
0,300,Kiwi,20.23
4,200,Apple,1.4
0,100,Antiapple,-1.4
5,500,Pinapple,255

# Path to transform component
transformation_path = str(root_path / "tf_template" / "data_transform" / "change_types.py")
The file behind "transformation_path" has the following contents:
import tensorflow as tf
import warnings


def preprocessing_fn(inputs):
    column_names = ["Quantity", "Cost", "Fruit", "Weight"]
    original_types = [tf.float32, tf.float32, tf.string, tf.float32]
    intended_types = [tf.float32, tf.string, tf.string, tf.string]
    outputs = {}
    for col_name in inputs:
        current_type = inputs[col_name].dtype
        col_index = column_names.index(col_name)
        current_intended_type = intended_types[col_index]
        if not current_type == current_intended_type:
            warnings.warn(f"col_name: {col_name}")
            warnings.warn(f"inputs[col_name]: {inputs[col_name]}")
            warnings.warn(f"current_type: {current_type}")
            warnings.warn(f"current_intended_type: {current_intended_type}")
            input_col = tf.sparse.to_dense(inputs[col_name])
            if current_intended_type == tf.string:
                outputs[col_name] = tf.strings.as_string(input_col)
            else:
                outputs[col_name] = tf.strings.to_number(
                    input_col, current_intended_type
                )
    return outputs

# Specify original and intended column types
column_names = ["Quantity", "Cost", "Fruit", "Weight"]
original_types = [tf.float32, tf.float32, tf.string, tf.float32]
intended_types = [tf.float32, tf.string, tf.string, tf.string]
# Load data file using CsvExampleGen.
csv_files_directory = csv_file_path.parent
examples = external_input(csv_files_directory)
example_gen = CsvExampleGen(input=examples)
context.run(example_gen)
Print contents of the first 3 (or less) rows in the training split of example_gen
# Get the URI of the output artifact representing the training examples, which is a directory
train_uri = os.path.join(example_gen.outputs['examples'].get()[0].uri, 'train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
    serialized_example = tfrecord.numpy()
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    pp.pprint(example)

# Generate StatisticsGen
statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])
context.run(statistics_gen)

# Generate SchemaGen
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs["statistics"], infer_feature_shape=False
)
context.run(schema_gen)

# Adjust formats using Transform
transform = Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file=transformation_path,
)
context.run(transform)

column_names = ["Quantity", "Cost", "Fruit", "Weight"]
original_types = [tf.float32, tf.float32, tf.string, tf.float32]
intended_types = [tf.float32, tf.string, tf.string, tf.string]

change_types_path = str(paths.get_root_path() / "tf_template" / "data_transform" / "change_types.py")

transform = Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file=change_types_path)
context.run(transform)
transform.outputs
And the error message which results from the line context.run(transform) is as follows:
WARNING:tensorflow:From /home/vagrant/miniconda/envs/tfx/lib/python3.7/site-packages/tfx/components/transform/executor.py:511: Schema (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
Schema is a deprecated, use schema_utils.schema_from_feature_spec to create a `Schema`
user_module:20: UserWarning: col_name: Cost
user_module:21: UserWarning: inputs[col_name]: SparseTensor(indices=Tensor("Cost/indices:0", shape=(None, 2), dtype=int64), values=Tensor("Cost/values:0", shape=(None,), dtype=int64), dense_shape=Tensor("Cost/shape:0", shape=(2,), dtype=int64))
user_module:22: UserWarning: current_type: <dtype: 'int64'>
user_module:23: UserWarning: current_intended_type: <dtype: 'string'>
user_module:20: UserWarning: col_name: Quantity
user_module:21: UserWarning: inputs[col_name]: SparseTensor(indices=Tensor("Quantity/indices:0", shape=(None, 2), dtype=int64), values=Tensor("Quantity/values:0", shape=(None,), dtype=int64), dense_shape=Tensor("Quantity/shape:0", shape=(2,), dtype=int64))
user_module:23: UserWarning: current_intended_type: <dtype: 'float32'>
--------------------------------------
ValueErrorTraceback (most recent call last)
~/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(op_type_name, name, **keywords)
469 as_ref=input_arg.is_ref,
--> 470 preferred_dtype=default_dtype)
471 except TypeError as err:
~/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
1474 "Tensor conversion requested dtype %s for Tensor with dtype %s: %r" %
-> 1475 (dtype.name, value.dtype.name, value))
1476 return value
ValueError: Tensor conversion requested dtype string for Tensor with dtype int64: <tf.Tensor 'SparseToDense_1:0' shape=(None, None) dtype=int64>
During handling of the above exception, another exception occurred:
TypeErrorTraceback (most recent call last)
<ipython-input-12-d5365d0f0017> in <module>
5 module_file=transformation_path,
6 )
----> 7 context.run(transform)
~/miniconda/envs/tfx/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run_if_ipython(*args, **kwargs)
64 # __IPYTHON__ variable is set by IPython, see
65 # https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/reference.html#embedding-ipython.
---> 66 return fn(*args, **kwargs)
67 else:
68 absl.logging.warning(
~/miniconda/envs/tfx/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
166 component, pipeline_info, driver_args, metadata_connection,
167 beam_pipeline_args, additional_pipeline_args)
--> 168 execution_id = launcher.launch().execution_id
169
170 return execution_result.ExecutionResult(
~/miniconda/envs/tfx/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py in launch(self)
203 execution_decision.input_dict,
204 execution_decision.output_dict,
--> 205 execution_decision.exec_properties)
206
207 absl.logging.info('Running publisher for %s',
~/miniconda/envs/tfx/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
65 executor_context) # type: ignore
66
---> 67 executor.Do(input_dict, output_dict, exec_properties)
~/miniconda/envs/tfx/lib/python3.7/site-packages/tfx/components/transform/executor.py in Do(self, input_dict, output_dict, exec_properties)
389 label_outputs[labels.CACHE_OUTPUT_PATH_LABEL] = cache_output
390 status_file = 'status_file' # Unused
--> 391 self.Transform(label_inputs, label_outputs, status_file)
392 absl.logging.debug('Cleaning up temp path %s on executor success',
393 temp_path)
~/miniconda/envs/tfx/lib/python3.7/site-packages/tfx/components/transform/executor.py in Transform(***failed resolving arguments***)
924 # order to fail faster if it fails.
925 analyze_input_columns = tft.get_analyze_input_columns(
--> 926 preprocessing_fn, feature_spec_or_typespecs)
927
928 if not compute_statistics and not materialize_output_paths:
~/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/inspect_preprocessing_fn.py in get_analyze_input_columns(preprocessing_fn, specs)
56 input_signature = impl_helper.batched_placeholders_from_specs(
57 specs)
---> 58 _ = preprocessing_fn(input_signature.copy())
59
60 tensor_sinks = graph.get_collection(analyzer_nodes.TENSOR_REPLACEMENTS)
~/tf-template/tf_template/data_transform/change_types.py in preprocessing_fn(inputs)
27 else:
28 outputs[col_name] = tf.strings.to_number(
---> 29 input_col, current_intended_type
30 )
31
~/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a
~/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/ops/string_ops.py in string_to_number(input, out_type, name)
477 A `Tensor` of type `out_type`.
478 """
--> 479 return gen_parsing_ops.string_to_number(input, out_type, name)
480
481
~/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/ops/gen_parsing_ops.py in string_to_number(string_tensor, out_type, name)
2336 _, _, _op, _outputs = _op_def_library._apply_op_helper(
2337 "StringToNumber", string_tensor=string_tensor, out_type=out_type,
-> 2338 name=name)
2339 _result = _outputs[:]
2340 if _execute.must_record_gradient():
~/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(op_type_name, name, **keywords)
491 if input_arg.type != types_pb2.DT_INVALID:
492 raise TypeError("%s expected type of %s." %
--> 493 (prefix, dtypes.as_dtype(input_arg.type).name))
494 else:
495 # Update the maps with the default, if needed.
TypeError: Input 'string_tensor' of 'StringToNumber' Op has type int64 that does not match expected type of string.
Hi Tord, I'm not able to open/run colabs in zips; ideally I'd have a hosted colab I could look at. Regardless, this is still the same issue as before: tf.strings.to_number in your preprocessing_fn is being passed a Tensor with dtype int64, which is incompatible. This shouldn't be happening at all, since it looks like your preprocessing_fn only tries to convert from float32 to string:
original_types = [tf.float32, tf.float32, tf.string, tf.float32]
intended_types = [tf.float32, tf.string, tf.string, tf.string]
I think the issue here is that SchemaGen is inferring your float32 features as int64, and then, since float32 != int64, your preprocessing_fn ends up calling tf.strings.to_number on an int64 tensor, which is what fails.
You could either manually update your schema to accurately reflect your data, or update your preprocessing_fn to be consistent with the schema.
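For the first of those options, here is a minimal sketch of what manually correcting the inferred schema could look like, assuming TFDV (tensorflow_data_validation) is installed; the feature name and file paths are made up for illustration:

import tensorflow_data_validation as tfdv
from tensorflow_metadata.proto.v0 import schema_pb2

# Load the schema that SchemaGen inferred (hypothetical path).
schema = tfdv.load_schema_text('path/to/schema.pbtxt')
# Force a feature that was inferred as INT to be treated as FLOAT.
tfdv.get_feature(schema, 'Temperature').type = schema_pb2.FLOAT
tfdv.write_schema_text(schema, 'path/to/schema_fixed.pbtxt')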
Regardless, the way the preprocessing_fn is written is a bit brittle for an automated pipeline such as TFX; I would at least change it to something like:
import tensorflow as tf

def preprocessing_fn(inputs):
    output_dtypes = {
        "Quantity": tf.float32,
        "Cost": tf.string,
        "Fruit": tf.string,
        "Weight": tf.string,
    }
    outputs = {}
    for col_name, col_value in inputs.items():
        current_dtype = col_value.dtype
        output_dtype = output_dtypes.get(col_name)
        if output_dtype is not None and current_dtype != output_dtype:
            if isinstance(col_value, tf.SparseTensor):
                col_value = tf.sparse.to_dense(col_value)
            if output_dtype == tf.string:
                outputs[col_name] = tf.strings.as_string(col_value)
            elif current_dtype == tf.string:
                outputs[col_name] = tf.strings.to_number(col_value, output_dtype)
            else:
                # Both input and output dtypes are numerical, or compatible
                # with numerical, so a plain cast is enough.
                outputs[col_name] = tf.cast(col_value, output_dtype)
        else:
            # Either current_dtype == output_dtype, or col_name is not in
            # output_dtypes - pass it through unchanged.
            outputs[col_name] = col_value
    return outputs
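As a quick sanity check (just a sketch, not part of the pipeline), you could call the function eagerly on a toy batch and confirm the resulting dtypes; the feature names are the ones from the mapping above, the values are made up:

import tensorflow as tf

batch = {
    "Quantity": tf.constant([2.0, 5.0]),
    "Cost": tf.constant([1.5, 3.0]),
    "Fruit": tf.constant([3.2, 1.5]),
    "Weight": tf.constant(["10.5", "7.2"]),
}
# Quantity stays float32; Cost and Fruit become string; Weight is already string.
print({name: t.dtype for name, t in preprocessing_fn(batch).items()})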
Once again, thank you so much! Your neat version of preprocessing_fn has been adopted into my notebook. It has enabled me to move a few more steps, but now I am struggling with an error message related to column(s) having an invalid shape:
Feature Fruit (Tensor("SparseToDense:0", shape=(None, None), dtype=string)) had invalid shape (None, None) for FixedLenFeature: apart from the batch dimension, all dimensions must have known size
PS: I have included warnings in my code in order to get more helpful debugging information along the way. PPS: I see you have provided support to someone with a similar issue earlier, but none of the proposed fixes seem to work for my code.
The notebook is now uploaded and openly accessible on Colab. It is simplified so that it generates all the data and module files it needs to run. I made a successful run with it by selecting "Connect to hosted runtime" and then "Runtime" -> "Run All". https://colab.research.google.com/drive/1skLbDOCmf9ATmM1fB91nBQZvFIrKznqG?usp=sharing
Any help on this would be highly appreciated. Feel free to modify my Colab notebook - it should be set up for anyone to edit!
I can't seem to save changes in the colab, but I was able to fix the issue by replacing:
col_values = tf.sparse.to_dense(col_values)
with
col_values = tft.sparse_tensor_to_dense_with_shape(
    col_values,
    shape=(None, 1),
    default_value='' if current_dtype is tf.string else 0)
Unfortunately, tf.sparse.to_dense still drops the shape information sometimes, so we have this helper method in TFT to retain it. With this change Transform finished successfully.
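For context, this is roughly how that fix slots into the loop of the preprocessing_fn sketched earlier (the variable there is named col_value, and tft is assumed to be imported as import tensorflow_transform as tft):

if isinstance(col_value, tf.SparseTensor):
    # tf.sparse.to_dense can lose the static shape, which later breaks
    # FixedLenFeature outputs, so use the TFT helper that keeps it as (None, 1).
    col_value = tft.sparse_tensor_to_dense_with_shape(
        col_value,
        shape=(None, 1),
        default_value='' if current_dtype is tf.string else 0)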
Thanks, the code now works! Is there already an active issue to fix this in TensorFlow? My suggested fix would be that if
tf.sparse.to_dense(col_values)
gives the above error message, TensorFlow could automatically call
col_values = tft.sparse_tensor_to_dense_with_shape(
    col_values,
    shape=(None, 1),
    default_value='' if current_dtype is tf.string else 0)
rather than throwing the error. Perhaps it would still emit the message, but as a warning instead?
This is not possible due to the order of dependencies (TFT depends on TF, not the other way around). TF has a tracking issue for this and hopefully it will be fixed at some point; currently the workaround is to call tf.sparse.to_dense, followed by calling set_shape() on the result with the expected shape (in your case, [None, 1]).
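A sketch of how that pure-TF workaround would look in the same spot of the preprocessing_fn (again using the variable names from the snippet above):

if isinstance(col_value, tf.SparseTensor):
    col_value = tf.sparse.to_dense(
        col_value, default_value='' if current_dtype is tf.string else 0)
    # to_dense may come back with shape (None, None); pin the static shape
    # so downstream FixedLenFeature outputs are valid.
    col_value.set_shape([None, 1])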
@tordbb Closing this issue as it is resolved. Please feel free to reopen if this still persists. Thanks!
I am using the following test-code to try to define the column Stringed Ints as strings. It contains ints, but I should be able to make tensorflow treat these as strings. When running test_tutorial_schema(), I get the error E tensorflow.python.framework.errors_impl.InternalError: Unsupported object type int. See the full log at the bottom. Am I doing something wrong, or is there really no way to control how a data column should be interpreted by TFX?
(The visible bottom part of the) error message:
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/client/session.py:958:
self = <tensorflow.python.client.session.Session object at 0x7f6dfd0362d0> handle = None fetches = [<tf.Tensor 'transform/inputs/inputs/Intscopy:0' shape=(None,) dtype=float32>, <tf.Tensor 'transform/inputs/inputs/F...s_copy:0' shape=(None,) dtype=string>, <tf.Tensor 'transform/inputs/inputs/Strings_copy:0' shape=(None,) dtype=string>] feed_dict = {<tf.Tensor 'transform/inputs/Ints:0' shape=(None,) dtype=float32>: [1], <tf.Tensor 'transform/inputs/F_Stringed_Ints:0' shape=(None,) dtype=string>: [1], <tf.Tensor 'transform/inputs/Strings:0' shape=(None,) dtype=string>: ['hello']} options = None, run_metadata = None
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/client/session.py:1181:
self = <tensorflow.python.client.session.Session object at 0x7f6dfd0362d0> handle = None, target_list = [] fetch_list = [<tf.Tensor 'transform/inputs/inputs/Intscopy:0' shape=(None,) dtype=float32>, <tf.Tensor 'transform/inputs/inputs/F...s_copy:0' shape=(None,) dtype=string>, <tf.Tensor 'transform/inputs/inputs/Strings_copy:0' shape=(None,) dtype=string>] feed_dict = {<Reference wrapping <tf.Tensor 'transform/inputs/Ints:0' shape=(None,) dtype=float32>>: array([1.], dtype=float32), <...eference wrapping <tf.Tensor 'transform/inputs/Strings:0' shape=(None,) dtype=string>>: array(['hello'], dtype=object)} options = None, run_metadata = None
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/client/session.py:1359:
self = <tensorflow.python.client.session.Session object at 0x7f6dfd0362d0> fn = <function BaseSession._do_run.._run_fn at 0x7f6dfcfc5050>
args = ({<tensorflow.python._pywrap_tf_session.TF_Output object at 0x7f6dfd0369f0>: array([1.], dtype=float32), <tensorflow.p... object at 0x7f6dfd036ef0>, <tensorflow.python._pywrap_tf_session.TF_Output object at 0x7f6dfcfbe0b0>], [], None, None)
message = 'Unsupported object type int', m = None
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/client/session.py:1384: InternalError
During handling of the above exception, another exception occurred:
apache_beam/runners/common.py:961:
apache_beam/runners/common.py:726:
apache_beam/runners/common.py:807:
apache_beam/runners/common.py:1095:
self = <tensorflow_transform.beam.impl._RunMetaGraphDoFn object at 0x7f6dfd026110> batch = [{'Ints': 1, 'Stringed Ints': 1, 'Strings': 'hello'}] saved_model_dir = '/tmp/tmp6_61mrmc/tftransform_tmp/80c7f3af53944ba083f8cc99bb67f9d5'
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:422:
self = <tensorflow_transform.beam.impl._RunMetaGraphDoFn object at 0x7f6dfd026110> batch = [{'Ints': 1, 'Stringed Ints': 1, 'Strings': 'hello'}]
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:367: ValueError
During handling of the above exception, another exception occurred:
tests/unit/data_import/test_test.py:31:
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py:567: in __ror__
    p.run().wait_until_finish()
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/pipeline.py:497: in run
    self._options).run(False)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/pipeline.py:510: in run
    return self.runner.run_pipeline(self, self._options)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py:130: in run_pipeline
    return runner.run_pipeline(pipeline, options)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:179: in run_pipeline
    pipeline.to_runner_api(default_environment=self._default_environment))
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:189: in run_via_runner_api
    return self.run_stages(stage_context, stages)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:335: in run_stages
    bundle_context_manager,
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:545: in _run_stage
    expected_timer_output)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:1051: in process_bundle
    for result, split_result in executor.map(execute, part_inputs):
../miniconda/envs/tfx/lib/python3.7/concurrent/futures/_base.py:598: in result_iterator
    yield fs.pop().result()
../miniconda/envs/tfx/lib/python3.7/concurrent/futures/_base.py:435: in result
    return self.__get_result()
../miniconda/envs/tfx/lib/python3.7/concurrent/futures/_base.py:384: in __get_result
    raise self._exception
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/utils/thread_pool_executor.py:44: in run
    self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:1048: in execute
    part_map, expected_outputs, fired_timers, expected_output_timers)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:947: in process_bundle
    result_future = self._worker_handler.control_conn.push(process_bundle_req)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/worker_handlers.py:349: in push
    response = self.worker.do_instruction(request)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py:471: in do_instruction
    getattr(request, request_type), request.instruction_id)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py:506: in process_bundle
    bundle_processor.process_bundle(instruction_id))
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py:972: in process_bundle
    element.data)
../miniconda/envs/tfx/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py:218: in process_encoded
    self.output(decoded_value)
apache_beam/runners/worker/operations.py:330: in apache_beam.runners.worker.operations.Operation.output
    ???
apache_beam/runners/worker/operations.py:332: in apache_beam.runners.worker.operations.Operation.output
    ???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
    ???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:1030: in apache_beam.runners.common.DoFnRunner._reraise_augmented
    ???
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:553: in apache_beam.runners.common.SimpleInvoker.invoke_process
    ???
apache_beam/runners/common.py:1122: in apache_beam.runners.common._OutputProcessor.process_outputs
    ???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
    ???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:1030: in apache_beam.runners.common.DoFnRunner._reraise_augmented
    ???
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:553: in apache_beam.runners.common.SimpleInvoker.invoke_process
    ???
apache_beam/runners/common.py:1122: in apache_beam.runners.common._OutputProcessor.process_outputs
    ???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
    ???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:1030: in apache_beam.runners.common.DoFnRunner._reraise_augmented
    ???
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:553: in apache_beam.runners.common.SimpleInvoker.invoke_process
    ???
apache_beam/runners/common.py:1122: in apache_beam.runners.common._OutputProcessor.process_outputs
    ???
apache_beam/runners/worker/operations.py:195: in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
    ???
apache_beam/runners/worker/operations.py:670: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/worker/operations.py:671: in apache_beam.runners.worker.operations.DoOperation.process
    ???
apache_beam/runners/common.py:963: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:1045: in apache_beam.runners.common.DoFnRunner._reraise_augmented
    ???
../miniconda/envs/tfx/lib/python3.7/site-packages/future/utils/__init__.py:446: in raise_with_traceback
    raise exc.with_traceback(traceback)
apache_beam/runners/common.py:961: in apache_beam.runners.common.DoFnRunner.process
    ???
apache_beam/runners/common.py:726: in apache_beam.runners.common.PerWindowInvoker.invoke_process
    ???
apache_beam/runners/common.py:807: in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
    ???
apache_beam/runners/common.py:1095: in apache_beam.runners.common._OutputProcessor.process_outputs
    ???
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:422: in process
    yield self._handle_batch(batch)
self = <tensorflow_transform.beam.impl._RunMetaGraphDoFn object at 0x7f6dfd026110> batch = [{'Ints': 1, 'Stringed Ints': 1, 'Strings': 'hello'}]
../miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py:367: ValueError
------------------- Captured stderr call -------------------
WARNING:tensorflow:From /home/vagrant/tf-template/tests/unit/data_import/test_test.py:25: from_feature_spec (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
from_feature_spec is a deprecated, use schema_utils.schema_from_feature_spec
WARNING:tensorflow:Tensorflow version (2.3.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
WARNING:tensorflow:Tensorflow version (2.3.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
2020-08-19 06:37:54.207538: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-19 06:37:54.207574: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-19 06:37:54.207597: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (vagrant): /proc/driver/nvidia/version does not exist
2020-08-19 06:37:54.207780: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-19 06:37:54.212731: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2592000000 Hz
2020-08-19 06:37:54.212936: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5562962cd0e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-19 06:37:54.212953: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /home/vagrant/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /home/vagrant/miniconda/envs/tfx/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:220: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:Tensorflow version (2.3.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
==================== 1 failed in 2.06s ====================
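(For what it's worth, the feed_dict in the log above shows a Python int being fed into the string placeholder for Stringed Ints. Independently of the preprocessing_fn approach discussed above, that particular InternalError goes away if the raw values already match the type declared in the feature spec - a made-up example:)

raw_data = [
    {"Ints": 1.0, "Stringed Ints": "1", "Strings": "hello"},
    {"Ints": 2.0, "Stringed Ints": "2", "Strings": "world"},
]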