tensorflow / transform

Input pipeline framework
Apache License 2.0
985 stars · 215 forks

TFX Transform component receives "Error 413 (Request Entity Too Large)!!1" from Dataflow #242

Open mbernico opened 3 years ago

mbernico commented 3 years ago

Creating a TFX pipeline for a structured data model with 1,621 features, I receive this error from TFX 0.30.0 / TensorFlow Transform 0.30.0:

ERROR:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'tfx_util@gs://redacted/_wheels/tfx_user_code_Transform-0.0+9f052e692cc2c8a7d7411a095329ab307d215d22c7010cda7474824c1988ccc9-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
WARNING:tensorflow:From /home/jupyter/.local/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:266: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType]] instead.
WARNING:tensorflow:Tensorflow version (2.4.2) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.runners.portability.stager:The .whl package "/tmp/tmpqx90xuwj/tfx_user_code_Transform-0.0+9f052e692cc2c8a7d7411a095329ab307d215d22c7010cda7474824c1988ccc9-py3-none-any.whl" is provided in --extra_package. This functionality is not officially supported. Since wheel packages are binary distributions, this package must be binary-compatible with the worker environment (e.g. Python 2.7 running on an x64 Linux host).
WARNING:apache_beam.runners.portability.stager:The .whl package "/tmp/tmph8oewj3m/tfx_user_code_Transform-0.0+9f052e692cc2c8a7d7411a095329ab307d215d22c7010cda7474824c1988ccc9-py3-none-any.whl" is provided in --extra_package. This functionality is not officially supported. Since wheel packages are binary distributions, this package must be binary-compatible with the worker environment (e.g. Python 2.7 running on an x64 Linux host).
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 3.0584546420241447 seconds before retrying submit_job_description because we caught exception: BrokenPipeError: [Errno 32] Broken pipe
 Traceback for above exception (most recent call last):
  File "/home/jupyter/.local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
    return fun(*args, **kwargs)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 785, in submit_job_description
    response = self._client.projects_locations_jobs.Create(request)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 903, in Create
    config, request, global_params=global_params)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 729, in _RunMethod
    http, http_request, **opts)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apitools/base/py/http_wrapper.py", line 350, in MakeRequest
    check_response_func=check_response_func)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apitools/base/py/http_wrapper.py", line 400, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/opt/conda/lib/python3.7/site-packages/httplib2/__init__.py", line 1709, in request
    conn, authority, uri, request_uri, method, body, headers, redirections, cachekey,
  File "/opt/conda/lib/python3.7/site-packages/httplib2/__init__.py", line 1424, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/opt/conda/lib/python3.7/site-packages/httplib2/__init__.py", line 1347, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/opt/conda/lib/python3.7/http/client.py", line 1277, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1323, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1272, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1071, in _send_output
    self.send(chunk)
  File "/opt/conda/lib/python3.7/http/client.py", line 993, in send
    self.sock.sendall(data)
  File "/opt/conda/lib/python3.7/ssl.py", line 1034, in sendall
    v = self.send(byte_view[count:])
  File "/opt/conda/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)

---------------------------------------------------------------------------
HttpError                                 Traceback (most recent call last)
<ipython-input-39-efea3de47a8e> in <module>
----> 1 context.run(transform)

~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run_if_ipython(*args, **kwargs)
     66       # __IPYTHON__ variable is set by IPython, see
     67       # https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/reference.html#embedding-ipython.
---> 68       return fn(*args, **kwargs)
     69     else:
     70       absl.logging.warning(

~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
    186         telemetry_utils.LABEL_TFX_RUNNER: runner_label,
    187     }):
--> 188       execution_id = launcher.launch().execution_id
    189 
    190     return execution_result.ExecutionResult(

~/.local/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py in launch(self)
    207                          copy.deepcopy(execution_decision.input_dict),
    208                          execution_decision.output_dict,
--> 209                          copy.deepcopy(execution_decision.exec_properties))
    210 
    211     absl.logging.info('Running publisher for %s',

~/.local/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
     70     # output_dict can still be changed, specifically properties.
     71     executor.Do(
---> 72         copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))

~/.local/lib/python3.7/site-packages/tfx/components/transform/executor.py in Do(self, input_dict, output_dict, exec_properties)
    490       label_outputs[labels.CACHE_OUTPUT_PATH_LABEL] = cache_output
    491     status_file = 'status_file'  # Unused
--> 492     self.Transform(label_inputs, label_outputs, status_file)
    493     absl.logging.debug('Cleaning up temp path %s on executor success',
    494                        temp_path)

~/.local/lib/python3.7/site-packages/tfx/components/transform/executor.py in Transform(***failed resolving arguments***)
   1025                       output_cache_dir, compute_statistics,
   1026                       per_set_stats_output_paths, materialization_format,
-> 1027                       len(analyze_data_paths))
   1028   # TODO(b/122478841): Writes status to status file.
   1029 

~/.local/lib/python3.7/site-packages/tfx/components/transform/executor.py in _RunBeamImpl(self, analyze_data_list, transform_data_list, preprocessing_fn, stats_options_updater_fn, force_tf_compat_v1, input_dataset_metadata, transform_output_path, raw_examples_data_format, temp_path, input_cache_dir, output_cache_dir, compute_statistics, per_set_stats_output_paths, materialization_format, analyze_paths_count)
   1338                      Executor._RecordBatchToExamples)
   1339                  | 'Materialize[{}]'.format(infix) >> self._WriteExamples(
-> 1340                      materialization_format, dataset.materialize_output_path))
   1341 
   1342     return _Status.OK()

~/.local/lib/python3.7/site-packages/apache_beam/pipeline.py in __exit__(self, exc_type, exc_val, exc_tb)
    583     try:
    584       if not exc_type:
--> 585         self.result = self.run()
    586         self.result.wait_until_finish()
    587     finally:

~/.local/lib/python3.7/site-packages/apache_beam/pipeline.py in run(self, test_runner_api)
    538             self.to_runner_api(use_fake_coders=True),
    539             self.runner,
--> 540             self._options).run(False)
    541 
    542       if (self._options.view_as(TypeOptions).runtime_type_check and

~/.local/lib/python3.7/site-packages/apache_beam/pipeline.py in run(self, test_runner_api)
    562         finally:
    563           shutil.rmtree(tmpdir)
--> 564       return self.runner.run_pipeline(self, self._options)
    565     finally:
    566       shutil.rmtree(self.local_tempdir, ignore_errors=True)

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py in run_pipeline(self, pipeline, options)
    580     # raise an exception.
    581     result = DataflowPipelineResult(
--> 582         self.dataflow_client.create_job(self.job), self)
    583 
    584     # TODO(BEAM-4274): Circular import runners-metrics. Requires refactoring.

~/.local/lib/python3.7/site-packages/apache_beam/utils/retry.py in wrapper(*args, **kwargs)
    251       while True:
    252         try:
--> 253           return fun(*args, **kwargs)
    254         except Exception as exn:  # pylint: disable=broad-except
    255           if not retry_filter(exn):

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py in create_job(self, job)
    682 
    683     if not template_location:
--> 684       return self.submit_job_description(job)
    685 
    686     _LOGGER.info(

~/.local/lib/python3.7/site-packages/apache_beam/utils/retry.py in wrapper(*args, **kwargs)
    251       while True:
    252         try:
--> 253           return fun(*args, **kwargs)
    254         except Exception as exn:  # pylint: disable=broad-except
    255           if not retry_filter(exn):

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py in submit_job_description(self, job)
    783 
    784     try:
--> 785       response = self._client.projects_locations_jobs.Create(request)
    786     except exceptions.BadStatusCodeError as e:
    787       _LOGGER.error(

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py in Create(self, request, global_params)
    901       config = self.GetMethodConfig('Create')
    902       return self._RunMethod(
--> 903           config, request, global_params=global_params)
    904 
    905     Create.method_config = lambda: base_api.ApiMethodInfo(

~/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py in _RunMethod(self, method_config, request, global_params, upload, upload_config, download)
    729                 http, http_request, **opts)
    730 
--> 731         return self.ProcessHttpResponse(method_config, http_response, request)
    732 
    733     def ProcessHttpResponse(self, method_config, http_response, request=None):

~/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py in ProcessHttpResponse(self, method_config, http_response, request)
    735         return self.__client.ProcessResponse(
    736             method_config,
--> 737             self.__ProcessHttpResponse(method_config, http_response, request))

~/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py in __ProcessHttpResponse(self, method_config, http_response, request)
    602                                              http_client.NO_CONTENT):
    603             raise exceptions.HttpError.FromResponse(
--> 604                 http_response, method_config=method_config, request=request)
    605         if http_response.status_code == http_client.NO_CONTENT:
    606             # TODO(craigcitro): Find out why _replace doesn't seem to work

HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/redacted-dev-datascience/locations/us-central1/jobs?alt=json>: response: <{'content-type': 'text/html; charset=UTF-8', 'referrer-policy': 'no-referrer', 'content-length': '2477', 'date': 'Wed, 23 Jun 2021 19:48:48 GMT', 'connection': 'close', 'status': '413'}>, content <<!DOCTYPE html>
<html lang=en>
  <title>Error 413 (Request Entity Too Large)!!1</title>
  <p><b>413.</b> <ins>That's an error.</ins>
  <p>Your client issued a request that was too large.
 <ins>That's all we know.</ins>

InteractiveContext is the orchestrator, and each component runs on Cloud Dataflow.

The TFX preprocessing_fn is:

def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.
  Args:
    inputs: map from feature keys to raw not-yet-transformed features.
  Returns:
    Map from string feature key to transformed feature operations.
  """
  features = get_keys()
  absl.logging.debug(inputs.keys())

  outputs = {}

  for key in features['continuous']:
    outputs[key] = tft.scale_to_z_score(_convert_to_dense(inputs[key]))

  for key in features['vocab']:
    outputs[key] = tft.compute_and_apply_vocabulary(
        _convert_to_dense(inputs[key]),
        top_k=MAX_VOCAB_SIZE,
        num_oov_buckets=OOV_SIZE,
        vocab_filename=key)

  for key in features['identity']:
    outputs[key] = _convert_to_dense(inputs[key])

  return outputs
zhitaoli commented 3 years ago

Hi @mbernico

I see that you are using the DataflowRunner for this component. Can you share the beam_pipeline_args used (they might be set at either the pipeline level or the component level)?

Also, can you try adding --experiments=upload_graph to the beam_pipeline_args and let us know whether the issue disappears?
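
For reference, the flag is appended to whatever beam_pipeline_args list the pipeline already passes to Dataflow; a minimal sketch (the project/region/bucket values below are placeholders, not taken from this issue):

```python
# Hypothetical beam_pipeline_args for a Dataflow-backed TFX component;
# only --experiments=upload_graph is the suggested addition here.
beam_pipeline_args = [
    '--runner=DataflowRunner',
    '--project=my-gcp-project',          # placeholder
    '--region=us-central1',              # placeholder
    '--temp_location=gs://my-bucket/tmp',  # placeholder
    # Workaround for "413 Request Entity Too Large": stage the job graph
    # in GCS instead of embedding it in the CreateJob API request body.
    '--experiments=upload_graph',
]
```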

arghyaganguly commented 3 years ago

@mbernico, please follow up on @zhitaoli's comment.