tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

[TF 2.0] allow tf.function input_signature to be specified by annotations #31579

Closed jeffpollock9 closed 6 months ago

jeffpollock9 commented 5 years ago


Describe the feature and the current behavior/state.

tf.function has an input_signature argument which I have been using to make my code a bit safer and to ensure I don't keep re-tracing functions. The input_signature specifies the tensor type for each of the function's arguments. It would be much nicer (I think) to specify these types using Python (>=3.5) annotations, where a suitable version of Python is available. A very rough example looks like:

import tensorflow as tf

def function(fn):
    # Annotations are stored in declaration order, so they line up with
    # the positional arguments.
    input_signature = list(fn.__annotations__.values())
    return tf.function(fn, autograph=False, input_signature=input_signature)

@function
def foo(
    x: tf.TensorSpec(shape=[None], dtype=tf.float64),
    y: tf.TensorSpec(shape=[None], dtype=tf.float64),
):
    return x + 10.0 + y

vec32 = tf.random.normal([2], dtype=tf.float32)
vec64 = tf.random.normal([2], dtype=tf.float64)

# should pass
foo(vec64, vec64)
foo(y=vec64, x=vec64)

# should fail
foo(vec32, vec64)

Which I think is nicer than the current signature:

@tf.function(
    autograph=False,
    input_signature=[
        tf.TensorSpec(shape=[None], dtype=tf.float64),
        tf.TensorSpec(shape=[None], dtype=tf.float64),
    ],
)
def foo(x, y):
    return x + 10.0 + y

I think the main benefit of the annotation approach is that the argument name and type sit beside each other, and this syntax is already widely used in Python.

In order to enable using annotations as the input_signature I think there should be an extra boolean argument to tf.function called e.g. use_annotation_input_signature which defaults to False.
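A minimal sketch of how such a flag could work, using only the standard library (FakeTensorSpec and fake_function are hypothetical stand-ins for tf.TensorSpec and tf.function, which are not imported here):

```python
# Hypothetical sketch: a use_annotation_input_signature-style flag on a
# tf.function-like decorator. FakeTensorSpec / fake_function are made-up
# stand-ins; the real tf.function would also trace the function.
from typing import Callable

class FakeTensorSpec:
    def __init__(self, shape, dtype):
        self.shape, self.dtype = shape, dtype

def fake_function(fn: Callable, use_annotation_input_signature: bool = False):
    if use_annotation_input_signature:
        # Annotations are stored in declaration order, so they line up
        # with the positional arguments.
        fn.input_signature = list(fn.__annotations__.values())
    else:
        fn.input_signature = None
    return fn

def foo(x: FakeTensorSpec([None], "float64"), y: FakeTensorSpec([None], "float64")):
    return x

foo = fake_function(foo, use_annotation_input_signature=True)
print(len(foo.input_signature))  # 2
```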

Also note I have set autograph=False here to avoid a warning:

Cause: name 'foo_scope' is not defined

I am guessing a proper implementation inside of tf.function would not have this problem.

Will this change the current api? How?

It would add an additional argument to tf.function which, at its default value, would not change anything.

Who will benefit with this feature?

Anyone using Python >= 3.5 who would like to specify the tensor types of their functions.

Any Other info.

None

bionicles commented 5 years ago

the concept of "None" == "Anything" is dumb and reflects poorly on Tensorflow and Keras team + users

kkimdev commented 5 years ago

Hi, I like the general idea of leveraging more standard Python features. Though one property we want to keep for @tf.function is preserving the same semantics with or without @tf.function, and since we're not passing a tf.TensorSpec instance as an argument, annotating the arguments as tf.TensorSpec won't be strictly correct.

jeffpollock9 commented 5 years ago

Hi @kkimdev. Yes, I realised that TensorSpec wasn't the right type shortly after writing this, and I agree it is weird/confusing to annotate the arguments with the wrong type. I can't think of a solution which would work for this now, since the "spec" (which can e.g. allow any shape based on runtime arguments) is dynamic, whereas annotations are really meant to be static so that mypy etc. can do static analysis.

Will close this later unless anyone has any ideas on how it could work properly.

kkimdev commented 5 years ago

Closing as it has been 2 weeks.

danmou commented 4 years ago

Why not make it possible to use tf.Tensor[shape, dtype] as an annotation? This would also be useful in general to document the input requirements.

mdanatg commented 4 years ago

Yes, that would be the ideal format. I think you can use tf.Tensor (untemplated) right now in your regular type annotations, but implementing it as a generic type that captures both dtype and shape would take a bit more programming in the definition of the Tensor classes.

jeffpollock9 commented 4 years ago

@Danmou yes that's a good idea - not sure why I didn't do that in the first place. I'll re-open this and try to have a go at implementing it but if anyone has any ideas/would like to help that'd be great.

EDIT: I can't re-open this, @kkimdev if you think this would be useful at all can you re-open please?

jeffpollock9 commented 4 years ago

I've tried to write some code to figure out how this might work, but I'm finding it quite hard (mainly due to my lack of experience with typing). Any feedback on this would be great:

import tensorflow as tf

from typing import Generic, TypeVar
from typing_extensions import Literal

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")

Shape = Literal

class Float32:
    dtype = tf.float32

class Float64:
    dtype = tf.float64

# TODO(jeff): generate all dtypes

class Tensor(Generic[ShapeType, DataType]):
    @classmethod
    def shape(cls):
        # On Python 3.6, Tensor[...] returns a class carrying __args__,
        # so these classmethods can read the type parameters back out.
        return cls.__args__[0].__values__

    @classmethod
    def dtype(cls):
        return cls.__args__[1].dtype

    def __add__(self, other):
        # Placeholder to appease mypy (would recurse if actually called).
        return self + other

def function(fn):
    annotation_values = fn.__annotations__.values()
    tensor_specs = [tf.TensorSpec(x.shape(), x.dtype()) for x in annotation_values]
    return tf.function(fn, input_signature=tensor_specs)

@function
def foo(x: Tensor[Shape[None, 2, 3], Float64]):
    return x + 42.0

foo(tf.random.normal([1, 2, 3], dtype=tf.float64))  # OK
foo(tf.random.normal([2, 2, 3], dtype=tf.float64))  # OK
foo(tf.random.normal([1, 2, 3, 4], dtype=tf.float64))  # NOT OK
foo(tf.random.normal([1, 2, 3], dtype=tf.float32))  # NOT OK

which seems to pass the correct input_signature in this very simple example but has some mypy errors that I don't know how to deal with:

$ mypy types_test.py
types_test.py:1: error: No library stub file for module 'tensorflow'
types_test.py:1: note: (Stub files are from https://github.com/python/typeshed)
types_test.py:44: error: Variable "types_test.Shape" is not valid as a type
types_test.py:44: error: Invalid type: try using Literal[2] instead?
types_test.py:44: error: Invalid type: try using Literal[3] instead?
Found 4 errors in 1 file (checked 1 source file)

danmou commented 4 years ago

@jeffpollock9 you'll probably need to redefine __class_getitem__ for the Tensor class. (That's the method that's called when you subscript the class, i.e. write Tensor[...].)
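A tiny standalone illustration of __class_getitem__ (Python 3.7+), independent of TensorFlow (Box is a made-up class for demonstration):

```python
# __class_getitem__ intercepts subscription on the class itself, which
# is what lets syntax like Tensor[...] capture metadata.
class Box:
    def __class_getitem__(cls, item):
        # item is whatever appeared inside the brackets; a comma-
        # separated subscript arrives as a tuple.
        return (cls.__name__, item)

print(Box[int])      # ('Box', <class 'int'>)
print(Box[1, 2, 3])  # ('Box', (1, 2, 3))
```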

danmou commented 4 years ago

Also, I don't think defining Shape as a Literal will work (undefined dimensions wouldn't be supported). I don't think it's possible to make mypy handle shapes correctly anyway without extending mypy itself, so I think the easiest solution for now would be to simply make __class_getitem__ for Tensor return a Tensor with some attributes set for shape and dtype (these will be ignored by mypy but can be used by tf.function). Possibly mypy could handle dtypes correctly, but I'd say that's not so important until someone starts type-annotating the entire TF library.
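That suggestion can be sketched without TensorFlow: __class_getitem__ hands back an object carrying shape/dtype attributes for a decorator to read (the "float64" string below is a stand-in for a real tf dtype):

```python
class Tensor:
    def __class_getitem__(cls, item):
        shape, dtype = item
        # Return a Tensor instance used purely as runtime metadata;
        # mypy ignores these attribute values.
        spec = cls.__new__(cls)
        spec.shape = list(shape)
        spec.dtype = dtype
        return spec

t = Tensor[(None, 2, 3), "float64"]
print(t.shape, t.dtype)  # [None, 2, 3] float64
```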

jeffpollock9 commented 4 years ago

Many thanks for the comments, @Danmou! As far as I can tell __class_getitem__ was added in Python 3.7, so I have switched to that (I was on 3.6 before).

I'm trying to figure out a better way of making some sort of Shape type since you mentioned Literal is not a good idea - but I am not sure how. If you don't mind - do you have any ideas? This is where I have got to so far:

import tensorflow as tf

from typing import Generic, TypeVar, get_type_hints
from typing_extensions import Literal

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")

# TODO(jeff): this shouldn't be Literal
Shape = Literal

class Float32:
    dtype = tf.float32

class Float64:
    dtype = tf.float64

# TODO(jeff): generate all dtypes

class Tensor(Generic[ShapeType, DataType]):
    def __class_getitem__(cls, item):
        # item is (Shape[...], dtype class); return the runtime metadata
        # that the function decorator below unpacks.
        shape = item[0].__args__
        dtype = item[1].dtype
        return shape, dtype

def function(fn):
    type_hints = get_type_hints(fn)
    input_signature = [
        tf.TensorSpec(shape, dtype, name) for name, (shape, dtype) in type_hints.items()
    ]
    return tf.function(fn, input_signature=input_signature)

@function
def foo(x: Tensor[Shape[None, 2, 3], Float64], y: Tensor[Shape[1, 1, 1], Float64]):
    return x + y

>>> print(foo.input_signature)
(TensorSpec(shape=(None, 2, 3), dtype=tf.float64, name='x'), TensorSpec(shape=(1, 1, 1), dtype=tf.float64, name='y'))

with:

$ mypy types_test.py
types_test.py:1: error: No library stub file for module 'tensorflow'
types_test.py:1: note: (Stub files are from https://github.com/python/typeshed)
types_test.py:40: error: Variable "types_test.Shape" is not valid as a type
types_test.py:40: error: Invalid type: try using Literal[2] instead?
types_test.py:40: error: Invalid type: try using Literal[3] instead?
types_test.py:40: error: Invalid type: try using Literal[1] instead?
types_test.py:41: error: Unsupported left operand type for + ("Tensor[Shape?[None, Any, Any], Float64]")
Found 6 errors in 1 file (checked 1 source file)

So indeed there is a problem with using Shape = Literal but I am not sure how to make a new type which can hold the list of shape data.

Thanks

mdanatg commented 4 years ago

I think this is going in the right direction. Here's a version of @jeffpollock9 's code that I think mypy is happy with (barring the lack of annotations in TensorFlow).

Since in the space of types there are no values (None is just sugar for NoneType), it doesn't seem currently possible to specify something like MyCustomType[1] without the use of Literal. So the type annotations will look a bit awkward. Perhaps a future PEP could relax that.

It appears that Literal is the only special type that allows a variable number of things (and even in that case the values are packed into a sugared tuple). In other words we can't define a type Shape[*Dim] and we're forced to either use Literal[(None, 2, 3)] or to specialize things by rank, like Shape1D, Shape2D, etc. I'm also a bit afraid of Literal because it has the semantic "the value can be any one of these (their order is irrelevant)", but here we really need to say: "the value is this specific list (in this specific order)". That's why I used Literal[(a, b, c)] and not just Literal[a, b, c].
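For what it's worth, the packing behaviour can be checked with the stdlib typing module (Python 3.8+): obj[(a, b, c)] and obj[a, b, c] are the same subscription in Python, so the two Literal spellings produce the same runtime parameters, in written order.

```python
from typing import Literal, get_args

A = Literal[(None, 2, 3)]
B = Literal[None, 2, 3]

# get_args returns the Literal parameters in the order they were written.
print(get_args(A))  # (None, 2, 3)
print(A == B)       # True
```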

Lastly, I think we can get by entirely with inspect to examine these type arguments - see the code.

Here's the code:

import tensorflow as tf

import inspect
import typing
from typing import Any, Generic, TypeVar, get_type_hints
from typing import NewType
from typing_extensions import Literal

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")

class Shape(Generic[ShapeType]):
  pass

class Float32(object):
    value = tf.float32

class Float64(object):
    value = tf.float64

# TODO(jeff): generate all dtypes

class Tensor(Generic[ShapeType, DataType]):
  def __rmul__(self, other: Any):
    pass  # Just appeasing mypy here, the real Tensor has a proper implementation.

  pass

def function(fn):
    argspec = inspect.getfullargspec(fn)
    if (argspec.varargs is not None or argspec.varkw is not None or argspec.varkw is not None):
      raise NotImplementedError('only positional args for now')

    input_signature = []
    for name in argspec.args:
      if name not in argspec.annotations:
        input_signature.append(None)
        continue
      shape_as_type, dtype = argspec.annotations[name].__args__
      shape = []
      for s in shape_as_type.__args__[0].__values__:
        if s is None:
          shape.append(None)
        else:
          shape.append(int(s))

      ts = tf.TensorSpec(shape=shape, dtype=dtype.value)
      input_signature.append(ts)
    return tf.function(fn, input_signature=input_signature)

@function
def foo(x: Tensor[Shape[Literal[(None, 2, 3)]], Float64]):
    return 2 * x

foo(tf.random.normal([1, 2, 3], dtype=tf.float64))  # OK
foo(tf.random.normal([2, 2, 3], dtype=tf.float64))  # OK
try:
  foo(tf.random.normal([1, 2, 3, 4], dtype=tf.float64))  # NOT OK
  assert False
except ValueError:
  pass
try:
  foo(tf.random.normal([1, 2, 3], dtype=tf.float32))  # NOT OK
  assert False
except ValueError:
  pass

mdanatg commented 4 years ago

Going the Literal-free path might not be so bad. Here's a version that's very verbose, but the type annotation looks quite nice. I named the dimensions MNISTWidth and MNISTHeight to show that such boilerplate-y types can have an actual intuitive meaning.

## This is what the gigantic file of type defs would contain

Shape3DDim1 = TypeVar("Shape3DDim1")
Shape3DDim2 = TypeVar("Shape3DDim2")
Shape3DDim3 = TypeVar("Shape3DDim3")

class Shape3D(Generic[Shape3DDim1, Shape3DDim2, Shape3DDim3]):
  pass

class Dimension(object):
  value = NotImplemented

class Dynamic(Dimension):
  value = None

## This is what the user would have to define:

class MNISTWidth(Dimension):
  value = 2

class MNISTHeight(Dimension):
  value = 3

@function
def foo(x: Tensor[Shape3D[Dynamic, MNISTWidth, MNISTHeight], Float64]):
    return 2 * x

jeffpollock9 commented 4 years ago

@mdanatg thanks for this! I really like your Literal-free code - since it doesn't seem possible to define a Shape[*Dim] type I think this is the way to go. The only downside is the big file of typedefs as you mentioned - but I think we could automatically generate a file with up to (say) Shape10D and I can't imagine it ever being a limitation.
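Assuming the typedef file is generated rather than hand-written, a sketch like this could emit the ShapeND classes up to a chosen rank (types.new_class is used because a plain type() call does not resolve Generic's __mro_entries__):

```python
import types
from typing import Generic, TypeVar

def make_shape_class(rank: int):
    # One TypeVar per dimension, mirroring the hand-written Shape3D.
    tvars = tuple(TypeVar(f"Shape{rank}DDim{i + 1}") for i in range(rank))
    return types.new_class(f"Shape{rank}D", (Generic[tvars],))

# e.g. generate Shape1D .. Shape10D
SHAPES = {rank: make_shape_class(rank) for rank in range(1, 11)}
Shape3D = SHAPES[3]
print(Shape3D.__name__)  # Shape3D
```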

jeffpollock9 commented 4 years ago

I've made a few changes to the code above:

Firstly, I don't think we need to handle a None value in the input_signature list, as tf.function doesn't accept one anyway, i.e. this fails:

@tf.function(input_signature=[None, tf.TensorSpec([1, 2], tf.float32)])
def foo(x, y):
    return x + y

with:

TypeError: Invalid input_signature [None, TensorSpec(shape=(1, 2), dtype=tf.float32, name=None)]; input_signature must be a possibly nested sequence of TensorSpec objects.

so we can remove:

if name not in argspec.annotations:
    input_signature.append(None)
    continue

Secondly, I had to change:

for s in shape_as_type.__args__[0].__values__:

to

for s in shape_as_type.__args__:

Thirdly, for the inner loop over the shapes, should it not be s.value instead of s?

so the full code is:

import tensorflow as tf
import inspect

from typing import Generic, Any, TypeVar

# TODO: generate all dtypes
# TODO: generate all shapes

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")

Shape3DDim1 = TypeVar("Shape3DDim1")
Shape3DDim2 = TypeVar("Shape3DDim2")
Shape3DDim3 = TypeVar("Shape3DDim3")

class Shape3D(Generic[Shape3DDim1, Shape3DDim2, Shape3DDim3]):
    pass

class Dimension(object):
    value = NotImplemented

class Dynamic(Dimension):
    value = None

class Float32(object):
    value = tf.float32

class Float64(object):
    value = tf.float64

class Tensor(Generic[ShapeType, DataType]):
    def __rmul__(self, other: Any):
        pass  # Just appeasing mypy here, the real Tensor has a proper implementation.

def function(fn):
    argspec = inspect.getfullargspec(fn)
    if argspec.varargs is not None or argspec.varkw is not None:
        raise NotImplementedError("only positional args for now")

    input_signature = []
    for name in argspec.args:
        shape_as_type, dtype = argspec.annotations[name].__args__
        shape = []
        for s in shape_as_type.__args__:
            if s.value is None:
                shape.append(None)
            else:
                shape.append(int(s.value))

        ts = tf.TensorSpec(shape=shape, dtype=dtype.value, name=name)
        input_signature.append(ts)
    return tf.function(fn, input_signature=input_signature)

# User code starts here
class MNISTWidth(Dimension):
    value = 2

class MNISTHeight(Dimension):
    value = 3

@function
def foo(x: Tensor[Shape3D[Dynamic, MNISTWidth, MNISTHeight], Float64]):
    return 2.0 * x

# Some ad hoc testing
print(f"foo signature: {foo.input_signature}")
foo_x_ts = tf.TensorSpec(shape=[None, 2, 3], dtype=tf.float64, name="x")
assert len(foo.input_signature) == 1
assert foo.input_signature[0] == foo_x_ts

@function
def bar():
    return tf.random.normal([1, 2, 3])

print(f"bar signature: {bar.input_signature}")
assert bar.input_signature == ()

$ python types_test.py
foo signature: (TensorSpec(shape=(None, 2, 3), dtype=tf.float64, name='x'),)
bar signature: ()

$ mypy types_test.py
types_test.py:1: error: No library stub file for module 'tensorflow'
types_test.py:1: note: (Stub files are from https://github.com/python/typeshed)
Found 1 error in 1 file (checked 1 source file)

I also removed the duplicated check in argspec.varkw is not None or argspec.varkw is not None - I guess that was just a typo?

mdanatg commented 4 years ago

Yep, your edits all look good! The Literal-free version did require them, but I didn't want that to clutter the post. You're right about input_signature - it only supports None in a change that hasn't been submitted yet, sorry! Probably best to raise an error for now. In a future version we should be able to leave some args without annotations and have their shape/type inferred.
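The "raise an error for now" behaviour might look like this sketch (pure inspect, no TensorFlow; the helper name is made up):

```python
import inspect

def require_annotations(fn):
    # Hypothetical helper: reject functions whose positional args lack
    # annotations, rather than emitting a None entry in input_signature.
    argspec = inspect.getfullargspec(fn)
    missing = [a for a in argspec.args if a not in argspec.annotations]
    if missing:
        raise TypeError(f"arguments missing annotations: {missing}")
    return fn

def ok(x: int, y: int):
    return x + y

require_annotations(ok)  # passes

try:
    require_annotations(lambda x: x)
except TypeError as e:
    print("rejected:", e)
```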

mdanatg commented 4 years ago

FYI, https://github.com/tensorflow/community/pull/208 aims to establish a home for type definitions such as these. The RFC mentions this ongoing work, but we can include more specific details if ready.

jeffpollock9 commented 4 years ago

@mdanatg thanks for this - it looks really interesting! I had a couple of evenings to try to add some of this to TensorFlow, but I was struggling to even run the existing tests, as TF takes days to build on my laptop. I'm hoping to have some time to try again soon, but if there is anything in particular I could contribute please let me know.


mdanatg commented 3 years ago

Quick note: DeepMind has created an implementation similar to the ideas in this thread: https://github.com/deepmind/tensor_annotations

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 1 year.