suphoff opened this issue 5 years ago
@suphoff These are definitely good ideas! We may still need to see whether they can be applied in TF's graph mode, but I think it should work in eager mode.
The metadata of those images (either a standalone image, one frame within a video/TIFF, or maybe a GIF animation) is very useful and could help extract images of interest. Maybe we could start building some individual ops to see how this could be applied?
@yongtang OK - let me prototype something next week and we can then just play around with it. Should work just fine in graph mode.
Wrapping reference-counted C++ objects in variants and passing them around in the TF graph works just fine.
$python
Python 3.6.6 (default, Sep 12 2018, 18:26:19)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> import tensorflow_io.image as image
>>> y = image.WebPImageSourceFromFile("tests/test_image/sample.webp")
>>> print (y)
Tensor("WebPImageSourceFromFile:0", shape=(), dtype=variant)
>>> s = image.ImageSourceToDebug(y)
>>> t = image.ImageSourceToDebug(y)
>>> with tf.Session() as sess:
... sess.run((s,t))
...
2019-02-25 20:18:17.438619: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-25 20:18:17.463571: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3393300000 Hz
2019-02-25 20:18:17.465653: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2b7e630 executing computations on platform Host. Devices:
2019-02-25 20:18:17.465681: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
(b'WebPImageSource140640784485360', b'WebPImageSource140640784485360')
y = image.WebPImageSourceFromFile("tests/test_image/sample.webp") - creates a new object of class WebPImageSource (derived from ImageSource) and returns it as a variant Tensor.
s = image.ImageSourceToDebug(y) - extracts a pointer to an ImageSource object from the input variant tensor and calls a virtual function to return a debug string.
Source for the ImageSourceToDebug op below:
class ImageSourceToDebugOp : public OpKernel {
 public:
  explicit ImageSourceToDebugOp(OpKernelConstruction* context)
      : OpKernel(context) {}

  void Compute(OpKernelContext* ctx) override {
    // Input is a scalar DT_VARIANT tensor holding an ImageSource handle.
    const Tensor& variant_tensor = ctx->input(0);
    OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(variant_tensor.shape()),
                errors::InvalidArgument("contents must be scalar, got shape ",
                                        variant_tensor.shape().DebugString()));

    // Unwrap the variant back into an ImageSource pointer.
    ImageSource* image = nullptr;
    OP_REQUIRES_OK(ctx, ImageSource::FromVariantTensor(variant_tensor, &image));

    // Emit the (virtual) debug string as a scalar string output.
    Tensor* output;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &output));
    output->scalar<string>()() = image->DebugString();
  }
};
and WebPImageSource:
class WebPImageSource : public ImageSource {
  ...
  // Virtual override used by ImageSourceToDebugOp above; the pointer value in
  // the string shows that both ops received the same underlying object.
  string DebugString() override {
    return string("WebPImageSource") + std::to_string((unsigned long long)(this));
  }
  ...
};
Suggestions for a few sample ImageSource operation signatures (APIs) are welcome.
I would be in favor of dropping FromFile (e.g., WebPImageSourceFromFile -> WebPImageSource); we could use either a protocol prefix (like file:// or s3://) or a hint passed as an arg if needed. Maybe just add an attribute to the op to determine how input(0) is used?
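To make that concrete, here is a rough sketch of what the surface could look like (the op name WebPImageSource, the source attr, and the s3 path are hypothetical, just to illustrate the protocol-prefix idea):

import tensorflow as tf
import tensorflow_io.image as image  # module layout as in the session above

# Hypothetical: one constructor op per format, with the backing store inferred
# from the URI scheme instead of separate *FromFile/*FromString ops.
src_local = image.WebPImageSource("file://tests/test_image/sample.webp")
src_s3    = image.WebPImageSource("s3://some-bucket/sample.webp")

# Hypothetical attr controlling how input(0) is interpreted (filename vs. raw
# bytes), instead of encoding that choice in the op name.
src_bytes = image.WebPImageSource(tf.read_file("tests/test_image/sample.webp"),
                                  source="contents")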
decode_image_op.c contains some detection logic for file formats that could be useful to implement a generic ImageSource.
@suphoff Yes, most file formats consist of a magic string at the beginning, and it is pretty easy to detect the format with a small read-ahead.
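For illustration, a minimal sketch of that kind of detection in plain Python (a real op would do the same check in C++ on the first few bytes of the stream; the signatures below are the standard magic bytes for each container):

def detect_image_format(header):
    """Guess the container format from the first bytes of a file."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return "unknown"

with open("tests/test_image/sample.webp", "rb") as f:
    print(detect_image_format(f.read(16)))  # -> "webp"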
... it really helps to keep the session open during variable testing :joy_cat:
@suphoff I have been playing with variant and resource for the past several weeks. In general, I agree with you that variant is more likely to be a fit. Variant is best suited for immutable data, since you can always reference it. Resource looks better suited for writers and for mutable (or non-replayable) data. Most read data is immutable; even a Kafka consumer is technically immutable, as you could optionally set the offset. There are some scenarios where resource might be needed. A writer is the obvious one, as a resource is essentially a file handle or file descriptor, so ordering can be maintained.
The challenge here is that we want to fit the variant into tf.data.Dataset, since tf.data.Dataset is needed to allow the data to be trained with tf.keras. I think I am almost there. I will create a PR very soon.
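As a hedged sketch of that direction (the decode op ImageSourceToTensor is a hypothetical name; only WebPImageSourceFromFile is from the prototype above):

import tensorflow as tf
import tensorflow_io.image as image

# Each element of the dataset becomes a scalar DT_VARIANT holding an
# immutable ImageSource handle; the file is parsed once per element.
files = tf.data.Dataset.list_files("tests/test_image/*.webp")
sources = files.map(image.WebPImageSourceFromFile)

# Hypothetical op: materialize pixels only where they are actually needed,
# so batching and shuffling move small handles instead of decoded frames.
images = sources.map(image.ImageSourceToTensor)

# images could then be batched and fed to tf.keras as usual.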
When looking through the current image operations and thinking about how to add functionality, I quickly came to imagine lots and lots of per-file-format special operations - and did not really like that picture.
Currently, all operations on image and video files (on the file system or as strings) parse the files, extract one (or more) pictures as 2-3 dimensional data tensors, and then throw away all parsing information.
This severely restricts usability - all possible operations would have to be defined per file format - and each operation starts from scratch from an unopened file.
If ImageSets (single pictures, multiple pictures (TIFF), videos (single or multi-channel)) were wrapped in a DT_VARIANT, they could become first-class objects in TF.
ImageSet Operations could then extract information from the ImageSet like cardinality, specific images, image sizes, ...
Example usage pattern in pseudo TF code:
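Something along these lines (op names such as ImageSetFromFile, ImageSetCardinality, ImageSetImageSizes, and ImageSetGetImage are placeholders to illustrate the idea, not existing ops, and the file path is made up):

import tensorflow as tf
import tensorflow_io.image as image

# Open the container once; the result is a scalar DT_VARIANT handle.
image_set = image.ImageSetFromFile("file://some_stack.tif")

# Query the container through ImageSet operations without re-parsing the file.
count = image.ImageSetCardinality(image_set)   # scalar int64
sizes = image.ImageSetImageSizes(image_set)    # [count, 2] (height, width)
frame = image.ImageSetGetImage(image_set, 0)   # [height, width, channels]

with tf.Session() as sess:
    n, hw, img = sess.run((count, sizes, frame))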
A similar argument could be made for representing single potential images as an ImageSource object. This way, metadata (focal length, resolution, ...) could be retrieved from the ImageSource object using special ImageSource operations, used in calculations, and then used as parameters to retrieve a cropped and scaled subset with another ImageSource operation.
Example usage:
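Again only a sketch; ImageSourceMetadata and ImageSourceDecodeAndCrop are hypothetical names used to illustrate the flow:

import tensorflow as tf
import tensorflow_io.image as image

src = image.WebPImageSourceFromFile("tests/test_image/sample.webp")

# Hypothetical: read metadata out of the handle without decoding any pixels.
height = image.ImageSourceMetadata(src, key="height")
width  = image.ImageSourceMetadata(src, key="width")

# Use the metadata in ordinary TF math, e.g. a centered square crop.
side = tf.minimum(height, width)
offset_y = (height - side) // 2
offset_x = (width - side) // 2

# Hypothetical: decode only the requested window, scaled to a target size.
patch = image.ImageSourceDecodeAndCrop(
    src, offset_y, offset_x, side, side, target_size=[224, 224])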
I believe ImageSource operations could also be made batch-friendly to avoid unnecessary copy operations when batching.
This may not be a realistic idea - but I wanted to bring it up, as this may be the ideal time to specify a new generic interface. (There are enough existing formats in the repository to verify the interface, but not so many as to make the task unmanageable.)
Thoughts?