suphoff opened this issue 5 years ago
@suphoff These are definitely good ideas! We may still need to see whether they can be applied in TF's graph mode, but I think it should work in eager mode.
The metadata of those images (either a standalone image, one frame within a video/TIFF, or maybe a GIF animation) is very useful and could help extract images of interest. Maybe we could start building some individual ops to see how this could be applied?
@yongtang OK - let me prototype something next week and we can then just play around with it. Should work just fine in graph mode.
Wrapping reference-counted C++ objects in variants and passing them around in the TF graph works just fine.
$python
Python 3.6.6 (default, Sep 12 2018, 18:26:19)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> import tensorflow_io.image as image
>>> y = image.WebPImageSourceFromFile("tests/test_image/sample.webp")
>>> print (y)
Tensor("WebPImageSourceFromFile:0", shape=(), dtype=variant)
>>> s = image.ImageSourceToDebug(y)
>>> t = image.ImageSourceToDebug(y)
>>> with tf.Session() as sess:
... sess.run((s,t))
...
2019-02-25 20:18:17.438619: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-25 20:18:17.463571: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3393300000 Hz
2019-02-25 20:18:17.465653: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2b7e630 executing computations on platform Host. Devices:
2019-02-25 20:18:17.465681: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
(b'WebPImageSource140640784485360', b'WebPImageSource140640784485360')
y = image.WebPImageSourceFromFile("tests/test_image/sample.webp") - creates a new object of class WebPImageSource (derived from ImageSource) and returns it as a variant Tensor.
s = image.ImageSourceToDebug(y) - extracts a pointer to an ImageSource object from the input variant tensor and calls a virtual function to return a debug string.
Source for the ImageSourceToDebug op below:
class ImageSourceToDebugOp : public OpKernel {
 public:
  explicit ImageSourceToDebugOp(OpKernelConstruction* context)
      : OpKernel(context) {}

  void Compute(OpKernelContext* ctx) override {
    // Input is a scalar DT_VARIANT tensor holding an ImageSource handle.
    const Tensor& variant_tensor = ctx->input(0);
    OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(variant_tensor.shape()),
                errors::InvalidArgument("contents must be scalar, got shape ",
                                        variant_tensor.shape().DebugString()));

    // Unwrap the variant back into an ImageSource pointer.
    ImageSource* image = nullptr;
    OP_REQUIRES_OK(ctx, ImageSource::FromVariantTensor(variant_tensor, &image));

    // Emit the (virtual) debug string as a scalar string output.
    Tensor* output;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &output));
    output->scalar<string>()() = image->DebugString();
  }
};
and WebPImageSource:
class WebPImageSource : public ImageSource {
  ...
  // Virtual override used by ImageSourceToDebugOp above; the pointer value in
  // the string shows that both ops received the same underlying object.
  string DebugString() override {
    return string("WebPImageSource") + std::to_string((unsigned long long)(this));
  }
  ...
};
Suggestions for a few sample ImageSource operation signatures (APIs) are welcome.
I would be in favor of dropping FromFile (e.g., WebPImageSourceFromFile -> WebPImageSource); we could use either a protocol prefix (like file:// or s3://) or a hint passed as an arg if needed. Maybe just add an attribute to the op to determine how input(0) is used?
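To make that concrete, here is a rough sketch of what the surface could look like (the op name WebPImageSource, the source attr, and the s3 path are hypothetical, just to illustrate the protocol-prefix idea):

import tensorflow as tf
import tensorflow_io.image as image  # module layout as in the session above

# Hypothetical: one constructor op per format, with the backing store inferred
# from the URI scheme instead of separate *FromFile/*FromString ops.
src_local = image.WebPImageSource("file://tests/test_image/sample.webp")
src_s3    = image.WebPImageSource("s3://some-bucket/sample.webp")

# Hypothetical attr controlling how input(0) is interpreted (filename vs. raw
# bytes), instead of encoding that choice in the op name.
src_bytes = image.WebPImageSource(tf.read_file("tests/test_image/sample.webp"),
                                  source="contents")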
decode_image_op.c contains some detection logic for file formats that could be useful to implement a generic ImageSource.
@suphoff Yes, most file formats consist of a magic string at the beginning, and it is pretty easy to detect the format with a small read-ahead.
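For illustration, a minimal sketch of that kind of detection in plain Python (a real op would do the same check in C++ on the first few bytes of the stream; the signatures below are the standard magic bytes for each container):

def detect_image_format(header):
    """Guess the container format from the first bytes of a file."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return "unknown"

with open("tests/test_image/sample.webp", "rb") as f:
    print(detect_image_format(f.read(16)))  # -> "webp"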
... it really helps to keep the session open during variable testing :joy_cat:
@suphoff I have been playing with variant and resource for the past several weeks. In general, I agree with you that variant is more likely to be a fit. Variant is best suited for immutable data, since you can always reference it. Resource looks better suited for writers and for mutable (or non-replayable) data. Most read data is immutable; even a Kafka consumer is technically immutable, as you could optionally set the offset. There are some scenarios where resource might be needed. A writer is the obvious one, as a resource is essentially a file handle or file descriptor, so ordering can be maintained.
The challenge here is that we want to fit the variant into tf.data.Dataset, since tf.data.Dataset is needed to allow the data to be trained with tf.keras. I think I am almost there. I will create a PR very soon.
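As a hedged sketch of that direction (the decode op ImageSourceToTensor is a hypothetical name; only WebPImageSourceFromFile is from the prototype above):

import tensorflow as tf
import tensorflow_io.image as image

# Each element of the dataset becomes a scalar DT_VARIANT holding an
# immutable ImageSource handle; the file is parsed once per element.
files = tf.data.Dataset.list_files("tests/test_image/*.webp")
sources = files.map(image.WebPImageSourceFromFile)

# Hypothetical op: materialize pixels only where they are actually needed,
# so batching and shuffling move small handles instead of decoded frames.
images = sources.map(image.ImageSourceToTensor)

# images could then be batched and fed to tf.keras as usual.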
When looking through the current image operations and thinking about how to add functionality, I quickly came to imagine lots and lots of per-file-format special operations - and did not really like that picture.
Currently, all operations on image and video files (on the file system or as strings) parse the files, extract one (or more) pictures as 2-3 dimensional data tensors, and then throw away all parsing information.
This severely restricts usability - all possible operations would have to be defined per file format - and each operation starts from scratch from an unopened file.
If ImageSets (single pictures, multiple pictures (TIFF), videos (single or multi-channel)) were wrapped in a DT_VARIANT, they could become first-class objects in TF.
ImageSet Operations could then extract information from the ImageSet like cardinality, specific images, image sizes, ...
Example usage pattern in pseudo TF code:
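Something along these lines (op names such as ImageSetFromFile, ImageSetCardinality, ImageSetImageSizes, and ImageSetGetImage are placeholders to illustrate the idea, not existing ops, and the file path is made up):

import tensorflow as tf
import tensorflow_io.image as image

# Open the container once; the result is a scalar DT_VARIANT handle.
image_set = image.ImageSetFromFile("file://some_stack.tif")

# Query the container through ImageSet operations without re-parsing the file.
count = image.ImageSetCardinality(image_set)   # scalar int64
sizes = image.ImageSetImageSizes(image_set)    # [count, 2] (height, width)
frame = image.ImageSetGetImage(image_set, 0)   # [height, width, channels]

with tf.Session() as sess:
    n, hw, img = sess.run((count, sizes, frame))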
A similar argument could be made for representing single potential images as an ImageSource object. This way, metadata (focal length, resolution, ...) could be retrieved from the ImageSource object using special ImageSource operations, used in calculations, and then used as parameters to retrieve a cropped and scaled subset with another ImageSource operation.
Example usage:
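Again only a sketch; ImageSourceMetadata and ImageSourceDecodeAndCrop are hypothetical names used to illustrate the flow:

import tensorflow as tf
import tensorflow_io.image as image

src = image.WebPImageSourceFromFile("tests/test_image/sample.webp")

# Hypothetical: read metadata out of the handle without decoding any pixels.
height = image.ImageSourceMetadata(src, key="height")
width  = image.ImageSourceMetadata(src, key="width")

# Use the metadata in ordinary TF math, e.g. a centered square crop.
side = tf.minimum(height, width)
offset_y = (height - side) // 2
offset_x = (width - side) // 2

# Hypothetical: decode only the requested window, scaled to a target size.
patch = image.ImageSourceDecodeAndCrop(
    src, offset_y, offset_x, side, side, target_size=[224, 224])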
I believe ImageSource operations could also be made batch-friendly to avoid unnecessary copy operations when batching.
This may not be a realistic idea - but I wanted to bring it up, as this may be the ideal time to specify a new generic interface. (There are enough existing formats in the repository to verify the interface, but not so many as to make the task unmanageable.)
Thoughts?