tensorflow / fairness-indicators

Tensorflow's Fairness Evaluation and Visualization Toolkit
Apache License 2.0
343 stars, 80 forks

AttributeError in Facessd Fairness Indicators Example Colab.ipynb #71

Closed. AlessandroPierro closed this issue 4 years ago.

AlessandroPierro commented 4 years ago

System information

Describe the current behavior

I get the following error when running line 3 of cell 3 (the tfdv.generate_statistics_from_tfrecord call):

AttributeError: module 'tfx_bsl.coders.example_coder' has no attribute 'ExamplesToRecordBatchDecoder' [while running 'DecodeData/BatchSerializedExamplesToArrowTables/BatchDecodeExamples']

Standalone code to reproduce the issue

The error is easily reproduced by running the "Facessd Fairness Indicators Example Colab.ipynb" notebook on Colab.

Other info / logs

IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/apache_beam/runners/worker/sdk_worker.py in get(self, instruction_id, bundle_descriptor_id)
    311       # pop() is threadsafe
--> 312       processor = self.cached_bundle_processors[bundle_descriptor_id].pop()
    313     except IndexError:

IndexError: pop from empty list

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/apache_beam/runners/common.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.common.DoFnRunner._invoke_lifecycle_method()

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/common.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.common.DoFnInvoker.invoke_setup()

/usr/local/lib/python3.6/dist-packages/tensorflow_data_validation/utils/batch_util.py in setup(self)
    106   def setup(self):
--> 107     self._decoder = example_coder.ExamplesToRecordBatchDecoder()
    108 

AttributeError: module 'tfx_bsl.coders.example_coder' has no attribute 'ExamplesToRecordBatchDecoder'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-3-31ccd38caa04> in <module>()
      1 data_location = tf.keras.utils.get_file('lfw_dataset.tf', 'https://storage.googleapis.com/facessd_dataset/lfw_dataset.tfrecord')
      2 
----> 3 stats = tfdv.generate_statistics_from_tfrecord(data_location=data_location)
      4 tfdv.visualize_statistics(stats)

/usr/local/lib/python3.6/dist-packages/tensorflow_data_validation/utils/stats_gen_lib.py in generate_statistics_from_tfrecord(data_location, output_path, stats_options, pipeline_options, compression_type)
    113             shard_name_template='',
    114             coder=beam.coders.ProtoCoder(
--> 115                 statistics_pb2.DatasetFeatureStatisticsList)))
    116   return load_statistics(output_path)
    117 

/usr/local/lib/python3.6/dist-packages/apache_beam/pipeline.py in __exit__(self, exc_type, exc_val, exc_tb)
    501   def __exit__(self, exc_type, exc_val, exc_tb):
    502     if not exc_type:
--> 503       self.run().wait_until_finish()
    504 
    505   def visit(self, visitor):

/usr/local/lib/python3.6/dist-packages/apache_beam/pipeline.py in run(self, test_runner_api)
    481       return Pipeline.from_runner_api(
    482           self.to_runner_api(use_fake_coders=True), self.runner,
--> 483           self._options).run(False)
    484 
    485     if self._options.view_as(TypeOptions).runtime_type_check:

/usr/local/lib/python3.6/dist-packages/apache_beam/pipeline.py in run(self, test_runner_api)
    494       finally:
    495         shutil.rmtree(tmpdir)
--> 496     return self.runner.run_pipeline(self, self._options)
    497 
    498   def __enter__(self):

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/direct/direct_runner.py in run_pipeline(self, pipeline, options)
    128       runner = BundleBasedDirectRunner()
    129 
--> 130     return runner.run_pipeline(pipeline, options)
    131 
    132 

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in run_pipeline(self, pipeline, options)
    553 
    554     self._latest_run_result = self.run_via_runner_api(
--> 555         pipeline.to_runner_api(default_environment=self._default_environment))
    556     return self._latest_run_result
    557 

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in run_via_runner_api(self, pipeline_proto)
    563     # TODO(pabloem, BEAM-7514): Create a watermark manager (that has access to
    564     #   the teststream (if any), and all the stages).
--> 565     return self.run_stages(stage_context, stages)
    566 
    567   @contextlib.contextmanager

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in run_stages(self, stage_context, stages)
    704               stage,
    705               pcoll_buffers,
--> 706               stage_context.safe_coders)
    707           metrics_by_stage[stage.name] = stage_results.process_bundle.metrics
    708           monitoring_infos_by_stage[stage.name] = (

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in _run_stage(self, worker_handler_factory, pipeline_components, stage, pcoll_buffers, safe_coders)
   1071         cache_token_generator=cache_token_generator)
   1072 
-> 1073     result, splits = bundle_manager.process_bundle(data_input, data_output)
   1074 
   1075     def input_for(transform_id, input_id):

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in process_bundle(self, inputs, expected_outputs)
   2332 
   2333     with UnboundedThreadPoolExecutor() as executor:
-> 2334       for result, split_result in executor.map(execute, part_inputs):
   2335 
   2336         split_result_list += split_result

/usr/lib/python3.6/concurrent/futures/_base.py in result_iterator()
    584                     # Careful not to keep a reference to the popped future
    585                     if timeout is None:
--> 586                         yield fs.pop().result()
    587                     else:
    588                         yield fs.pop().result(end_time - time.monotonic())

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

/usr/local/lib/python3.6/dist-packages/apache_beam/utils/thread_pool_executor.py in run(self)
     42       # If the future wasn't cancelled, then attempt to execute it.
     43       try:
---> 44         self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
     45       except BaseException as exc:
     46         # Even though Python 2 futures library has #set_exection(),

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in execute(part_map)
   2329           self._registered,
   2330           cache_token_generator=self._cache_token_generator)
-> 2331       return bundle_manager.process_bundle(part_map, expected_outputs)
   2332 
   2333     with UnboundedThreadPoolExecutor() as executor:

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in process_bundle(self, inputs, expected_outputs)
   2243             process_bundle_descriptor_id=self._bundle_descriptor.id,
   2244             cache_tokens=[next(self._cache_token_generator)]))
-> 2245     result_future = self._worker_handler.control_conn.push(process_bundle_req)
   2246 
   2247     split_results = []  # type: List[beam_fn_api_pb2.ProcessBundleSplitResponse]

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/portability/fn_api_runner.py in push(self, request)
   1557       self._uid_counter += 1
   1558       request.instruction_id = 'control_%s' % self._uid_counter
-> 1559     response = self.worker.do_instruction(request)
   1560     return ControlFuture(request.instruction_id, response)
   1561 

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/worker/sdk_worker.py in do_instruction(self, request)
    413       # E.g. if register is set, this will call self.register(request.register))
    414       return getattr(self, request_type)(
--> 415           getattr(request, request_type), request.instruction_id)
    416     else:
    417       raise NotImplementedError

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/worker/sdk_worker.py in process_bundle(self, request, instruction_id)
    442     # type: (...) -> beam_fn_api_pb2.InstructionResponse
    443     bundle_processor = self.bundle_processor_cache.get(
--> 444         instruction_id, request.process_bundle_descriptor_id)
    445     try:
    446       with bundle_processor.state_handler.process_instruction_id(

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/worker/sdk_worker.py in get(self, instruction_id, bundle_descriptor_id)
    316           self.state_handler_factory.create_state_handler(
    317               self.fns[bundle_descriptor_id].state_api_service_descriptor),
--> 318           self.data_channel_factory)
    319     self.active_bundle_processors[
    320         instruction_id] = bundle_descriptor_id, processor

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/worker/bundle_processor.py in __init__(self, process_bundle_descriptor, state_handler, data_channel_factory)
    741     self.ops = self.create_execution_tree(self.process_bundle_descriptor)
    742     for op in self.ops.values():
--> 743       op.setup()
    744     self.splitting_lock = threading.Lock()
    745 

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/worker/operations.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.worker.operations.DoOperation.setup()

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/worker/operations.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.worker.operations.DoOperation.setup()

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/common.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.common.DoFnRunner.setup()

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/common.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.common.DoFnRunner._invoke_lifecycle_method()

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/common.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.common.DoFnRunner._reraise_augmented()

/usr/local/lib/python3.6/dist-packages/future/utils/__init__.py in raise_with_traceback(exc, traceback)
    417         if traceback == Ellipsis:
    418             _, _, traceback = sys.exc_info()
--> 419         raise exc.with_traceback(traceback)
    420 
    421 else:

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/common.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.common.DoFnRunner._invoke_lifecycle_method()

/usr/local/lib/python3.6/dist-packages/apache_beam/runners/common.cpython-36m-x86_64-linux-gnu.so in apache_beam.runners.common.DoFnInvoker.invoke_setup()

/usr/local/lib/python3.6/dist-packages/tensorflow_data_validation/utils/batch_util.py in setup(self)
    105 
    106   def setup(self):
--> 107     self._decoder = example_coder.ExamplesToRecordBatchDecoder()
    108 
    109   def process(self, batch: List[bytes]) -> Iterable[pa.Table]:

AttributeError: module 'tfx_bsl.coders.example_coder' has no attribute 'ExamplesToRecordBatchDecoder' [while running 'DecodeData/BatchSerializedExamplesToArrowTables/BatchDecodeExamples']
fhuanming commented 4 years ago

Hi, thank you for your feedback.

This issue is most likely caused by incompatible versions of tfdv and tfx-bsl. It should already be fixed. Would you mind rerunning the Colab to see whether the issue is gone?
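A quick way to confirm which versions a runtime actually has is to query the installed distributions and to check whether the attribute named in the traceback exists. The sketch below assumes Python 3.8+ for importlib.metadata (the Colab in this thread ran Python 3.6, where the importlib_metadata backport would be needed instead). The helper names installed_version and has_attribute are hypothetical and are not part of tfdv or tfx-bsl.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def has_attribute(module_name, attr):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # PyPI distribution names for tfdv and tfx-bsl.
    for dist in ("tensorflow-data-validation", "tfx-bsl"):
        print(dist, installed_version(dist))
    # False here reproduces the condition behind the AttributeError in this issue.
    print("ExamplesToRecordBatchDecoder present:",
          has_attribute("tfx_bsl.coders.example_coder", "ExamplesToRecordBatchDecoder"))
```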

Thanks!

kshivvy commented 4 years ago

I'm also hitting this issue while running a different notebook in a Docker image, but my tfdv and tfx-bsl versions are both 0.22.0. Is there a specific version they should be on?

fhuanming commented 4 years ago

Hi kshivvy,

Based on this table, tfdv 0.22.0 and tfx-bsl 0.22.0 are compatible with each other. I just tested the same notebook you shared above, and it works for me.
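As a rough first check before consulting the compatibility table, one can compare the major.minor release lines of the two packages. The rule below (a shared major.minor means a matching pair) is a hypothetical heuristic drawn from the 0.22.0/0.22.0 pairing mentioned here, not the official compatibility rule; the table remains authoritative.

```python
def release_line(version_string):
    """Return the (major, minor) pair of a version string such as '0.22.0'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def same_release_line(tfdv_version, tfx_bsl_version):
    """Hypothetical heuristic: True when both packages share the same major.minor."""
    return release_line(tfdv_version) == release_line(tfx_bsl_version)


# kshivvy's environment from this thread: both packages at 0.22.0.
print(same_release_line("0.22.0", "0.22.0"))  # True
print(same_release_line("0.22.0", "0.21.4"))  # False
```

With both packages at 0.22.0 the check passes, which points away from a version mismatch and toward a stale import in the running notebook.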

Did you install the tfdv and tfx-bsl packages inside the notebook? If so, you may need to restart the notebook runtime so that the newly installed packages are imported.
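A restart helps because Python caches imported modules in sys.modules: installing a new version with pip changes the files on disk, but a later import in the same process keeps returning the module that is already loaded. The self-contained sketch below simulates an upgrade with a throwaway module named fake_pkg_demo (a hypothetical name). importlib.reload picks up the change for this pure-Python file, but packages that ship compiled extensions, as tfx-bsl does, cannot be reloaded reliably, so restarting the runtime is the safer fix.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "fake_pkg_demo"  # hypothetical stand-in for an installed package


def demo_stale_import():
    """Return (first import, re-import after upgrade, reload after upgrade) versions."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # no .pyc, so reload() always re-reads the source
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / f"{MODULE_NAME}.py"
        module_path.write_text("VERSION = '0.21.0'\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(MODULE_NAME)
            first = module.VERSION
            # Simulate `pip install --upgrade` rewriting the package on disk.
            module_path.write_text("VERSION = '0.22.0'\n")
            # A second import is served from sys.modules; the file is not re-read.
            cached = importlib.import_module(MODULE_NAME).VERSION
            # reload() re-executes the module source (works for pure-Python files).
            reloaded = importlib.reload(module).VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
            sys.dont_write_bytecode = old_flag
    return first, cached, reloaded


print(demo_stale_import())  # ('0.21.0', '0.21.0', '0.22.0')
```

The plain re-import still reports the old version, which is the same situation as importing tfdv or tfx-bsl again in a notebook after upgrading them without a restart.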