tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

[GSoC] Add a dummy beam wrapper #1946

Open Conchylicultor opened 4 years ago

Conchylicultor commented 4 years ago

Currently, implementing a Beam dataset is quite painful because Beam is lazily imported, which makes it impossible to use beam.DoFn and similar symbols in the global scope. It would be nice to be able to implement Beam datasets without worrying about where Beam classes and functions are declared.

Replacing the beam module with a dummy no-op module when Beam is not installed would keep importing tensorflow_datasets from crashing, and would greatly improve usability:

from typing import Union

import tensorflow as tf

try:
  import apache_beam as beam
except ImportError:
  beam = DummyBeam  # Proposed no-op stand-in (to be implemented)

class SomeFn(beam.DoFn):
  pass

@beam.ptransform_fn
@beam.typehints.with_input_types(beam.Pipeline)
@beam.typehints.with_output_types(Union[tf.train.Example, bytes])
def some_ptransform_fn():
  pass

Once https://github.com/tensorflow/datasets/issues/1945 is done, the try/except could be replaced by:

with tfds.core.lazy_imports():
  import apache_beam as beam

class SomeFn(beam.DoFn):
  pass

@beam.ptransform_fn
def some_ptransform_fn():
  pass

Note: Not all of the Beam API has to be implemented. New methods can be added as they are needed.

The implementation could go into /core/utils/dummy_beam.py, and tests in /core/utils/dummy_beam_test.py.
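As a rough illustration of what such a module might contain, here is a minimal sketch covering only the names used in the snippets above (DoFn, Pipeline, ptransform_fn, typehints.with_input_types, typehints.with_output_types). The helper names (_passthrough_decorator, _passthrough_decorator_factory, _DummyTypehints) are hypothetical, and making every decorator a pure pass-through is an assumption about what "no-op" should mean here, not a settled design:

```python
"""Hypothetical sketch of a no-op stand-in for the `apache_beam` module."""


def _passthrough_decorator(fn):
  """Returns the decorated object unchanged."""
  return fn


def _passthrough_decorator_factory(*args, **kwargs):
  """Ignores its arguments and returns a pass-through decorator."""
  del args, kwargs  # Unused: type hints are simply dropped by the dummy.
  return _passthrough_decorator


class _DummyTypehints:
  """Stand-in for `beam.typehints` (only the two decorators used above)."""
  with_input_types = staticmethod(_passthrough_decorator_factory)
  with_output_types = staticmethod(_passthrough_decorator_factory)


class DummyBeam:
  """Stand-in for `apache_beam`; more names can be added as needed."""

  class DoFn:
    """Empty base class so `class SomeFn(beam.DoFn)` can be declared."""

  class Pipeline:
    """Empty placeholder so `beam.Pipeline` can be referenced."""

  ptransform_fn = staticmethod(_passthrough_decorator)
  typehints = _DummyTypehints


# Usage: the declarations from the example above now work at module level.
beam = DummyBeam


class SomeFn(beam.DoFn):
  pass


@beam.ptransform_fn
@beam.typehints.with_input_types(beam.Pipeline)
@beam.typehints.with_output_types(bytes)
def some_ptransform_fn():
  pass
```

With this sketch, the global-scope declarations import cleanly without Beam installed. It does nothing to stop a dataset from actually running against the dummy; whether the dummy should raise a clear error at that point is left open.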

Eshan-Agarwal commented 4 years ago

I'm working on it. Could you please help me with a few things I don't understand?

Do we have to implement a DummyBeam class in dummy_beam.py that can be used as shown in the code above? Or is the code above a hint for how to implement DummyBeam? Or is it meant to be used for testing once DummyBeam is implemented?

Sorry for the trouble.

Conchylicultor commented 4 years ago

> We have to implement DummyBeam class in dummy_beam.py

Yes. The code snippet above is an example of DummyBeam usage.