Closed jeffpollock9 closed 6 months ago
the concept of "None" == "Anything" is dumb and reflects poorly on Tensorflow and Keras team + users
Hi, I like the general idea of leveraging more standard Python features. Though one property we want to keep for @tf.function is preserving the same semantics with or without @tf.function, and since we're not passing a tf.TensorSpec instance as an argument, annotating the arguments as tf.TensorSpec wouldn't be strictly correct.
Hi @kkimdev. Yes, I realised that TensorSpec wasn't the right type shortly after writing this, and agree it is weird/confusing to annotate the arguments with the wrong type. I can't think of a solution which would work for this now, since the "spec" (e.g. being able to choose any shape based on runtime arguments) is dynamic, while the annotations are really meant to be static so that mypy etc. can do static analysis.
Will close this later unless anyone has any ideas on how it could work properly.
Closing as it has been 2 weeks.
Why not make it possible to use tf.Tensor[shape, dtype] as an annotation? This would also be useful in general to document the input requirements.
Yes, that would be the ideal format. I think you can use tf.Tensor (untemplated) right now in your regular type annotations, but implementing it as a generic type that captures both dtype and shape would take a bit more programming in the definition of the Tensor classes.
@Danmou yes that's a good idea - not sure why I didn't do that in the first place. I'll re-open this and try to have a go at implementing it but if anyone has any ideas/would like to help that'd be great.
EDIT: I can't re-open this, @kkimdev if you think this would be useful at all can you re-open please?
I've tried to knock up some code to try and figure out how this might work but am finding it quite hard (mainly due to my lack of experience with typing). If there is any feedback on this it would be great:
import tensorflow as tf
from typing import Generic, TypeVar
from typing_extensions import Literal

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")

Shape = Literal


class Float32:
    dtype = tf.float32


class Float64:
    dtype = tf.float64


# TODO(jeff): generate all dtypes


class Tensor(Generic[ShapeType, DataType]):
    @classmethod
    def shape(cls):
        return cls.__args__[0].__values__

    @classmethod
    def dtype(cls):
        return cls.__args__[1].dtype

    def __add__(self, other):
        return self + other


def function(fn):
    annotation_values = fn.__annotations__.values()
    tensor_specs = [tf.TensorSpec(x.shape(), x.dtype()) for x in annotation_values]
    return tf.function(fn, input_signature=tensor_specs)


@function
def foo(x: Tensor[Shape[None, 2, 3], Float64]):
    return x + 42.0


foo(tf.random.normal([1, 2, 3], dtype=tf.float64))  # OK
foo(tf.random.normal([2, 2, 3], dtype=tf.float64))  # OK
foo(tf.random.normal([1, 2, 3, 4], dtype=tf.float64))  # NOT OK
foo(tf.random.normal([1, 2, 3], dtype=tf.float32))  # NOT OK
which seems to pass the correct input_signature in this very simple example, but has some mypy errors that I don't know how to deal with:
$ mypy types_test.py
types_test.py:1: error: No library stub file for module 'tensorflow'
types_test.py:1: note: (Stub files are from https://github.com/python/typeshed)
types_test.py:44: error: Variable "types_test.Shape" is not valid as a type
types_test.py:44: error: Invalid type: try using Literal[2] instead?
types_test.py:44: error: Invalid type: try using Literal[3] instead?
Found 4 errors in 1 file (checked 1 source file)
@jeffpollock9 you'll probably need to redefine __class_getitem__ for the Tensor class. (That's the method that's called when you write Tensor[...].)
Also, I don't think defining Shape as a Literal will work (undefined dimensions wouldn't work then). I don't think it's possible to make mypy handle shapes correctly anyway without extending mypy itself, so I think the easiest solution for now would be to simply make __class_getitem__ for Tensor return a Tensor with some attributes set for shape and dtype (which will be ignored by mypy but can be used by tf.function). Possibly mypy could handle dtypes correctly, but I'd say that's not so important anyway until someone starts type annotating the entire TF library.
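To illustrate that suggestion, here is a minimal, TensorFlow-free sketch of __class_getitem__ returning an instance that carries shape/dtype attributes a decorator could later inspect (the attribute names and the "float64" string stand-in are hypothetical, chosen just for the demonstration):

```python
class Float64:
    dtype = "float64"  # stand-in for tf.float64


class Tensor:
    def __class_getitem__(cls, item):
        # item is the tuple of subscript arguments, e.g. ((None, 2, 3), Float64).
        shape_spec, dtype_spec = item
        inst = cls()
        # mypy ignores these runtime attributes, but a tf.function-style
        # decorator can read them to build an input_signature.
        inst.static_shape = shape_spec
        inst.static_dtype = dtype_spec.dtype
        return inst


t = Tensor[(None, 2, 3), Float64]
print(t.static_shape)  # (None, 2, 3)
print(t.static_dtype)  # float64
```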
Many thanks for the comments, @Danmou! As far as I can tell __class_getitem__ was added in python3.7, so I have switched to that (I was on 3.6 before).
I'm trying to figure out a better way of making some sort of Shape type since you mentioned Literal is not a good idea, but I am not sure how. If you don't mind, do you have any ideas? This is where I have got to so far:
import tensorflow as tf
from typing import Generic, TypeVar, get_type_hints
from typing_extensions import Literal

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")

# TODO(jeff): this shouldn't be Literal
Shape = Literal


class Float32:
    dtype = tf.float32


class Float64:
    dtype = tf.float64


# TODO(jeff): generate all dtypes


class Tensor(Generic[ShapeType, DataType]):
    def __class_getitem__(cls, item):
        shape = item[0].__args__
        dtype = item[1].dtype
        return shape, dtype


def function(fn):
    type_hints = get_type_hints(fn)
    input_signature = [
        tf.TensorSpec(shape, dtype, name) for name, (shape, dtype) in type_hints.items()
    ]
    return tf.function(fn, input_signature=input_signature)


@function
def foo(x: Tensor[Shape[None, 2, 3], Float64], y: Tensor[Shape[1, 1, 1], Float64]):
    return x + y
>>> print(foo.input_signature)
(TensorSpec(shape=(None, 2, 3), dtype=tf.float64, name='x'), TensorSpec(shape=(1, 1, 1), dtype=tf.float64, name='y'))
with:
$ mypy types_test.py
types_test.py:1: error: No library stub file for module 'tensorflow'
types_test.py:1: note: (Stub files are from https://github.com/python/typeshed)
types_test.py:40: error: Variable "types_test.Shape" is not valid as a type
types_test.py:40: error: Invalid type: try using Literal[2] instead?
types_test.py:40: error: Invalid type: try using Literal[3] instead?
types_test.py:40: error: Invalid type: try using Literal[1] instead?
types_test.py:41: error: Unsupported left operand type for + ("Tensor[Shape?[None, Any, Any], Float64]")
Found 6 errors in 1 file (checked 1 source file)
So indeed there is a problem with using Shape = Literal, but I am not sure how to make a new type which can hold the list of shape data.
Thanks
I think this is going in the right direction. Here's a version of @jeffpollock9's code that I think mypy is happy with (barring the lack of annotations in TensorFlow).
Since in the space of types there are no values (None is just sugar for NoneType), it doesn't seem currently possible to specify something like MyCustomType[1] without the use of Literal. So the type annotations will look a bit awkward. Perhaps a future PEP could relax that.
It appears that Literal is the only special type that allows a variable number of things (and even in that case the values are packed into a sugared tuple). In other words, we can't define a type Shape[*Dim], and we're forced to either use Literal[(None, 2, 3)] or to specialize things by rank, like Shape1D, Shape2D, etc. I'm also a bit afraid of Literal because it has the semantics "the value can be any one of these (their order is irrelevant)", but here we really need to say "the value is this specific list (in this specific order)". That's why I used Literal[(a, b, c)] and not just Literal[a, b, c].
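As a small runtime sanity check (a hedged aside, not from the thread): the two Literal spellings turn out to be equivalent at runtime, with the values, including None, preserved in order under __args__:

```python
try:
    from typing import Literal  # Python 3.8+
except ImportError:
    from typing_extensions import Literal

# A tuple argument to Literal is flattened into the same parameters as the
# comma form; typing preserves the order of the values in __args__.
ShapeA = Literal[(None, 2, 3)]
ShapeB = Literal[None, 2, 3]

print(ShapeA.__args__)                     # (None, 2, 3)
print(ShapeA.__args__ == ShapeB.__args__)  # True
```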
Lastly, I think we can also get by entirely using inspect to examine these type arguments - see the code.
Here's the code:
import tensorflow as tf
import inspect
from typing import Any, Generic, TypeVar
from typing_extensions import Literal

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")


class Shape(Generic[ShapeType]):
    pass


class Float32(object):
    value = tf.float32


class Float64(object):
    value = tf.float64


# TODO(jeff): generate all dtypes


class Tensor(Generic[ShapeType, DataType]):
    def __rmul__(self, other: Any):
        pass  # Just appeasing mypy here, the real Tensor has a proper implementation.


def function(fn):
    argspec = inspect.getfullargspec(fn)
    if argspec.varargs is not None or argspec.varkw is not None or argspec.varkw is not None:
        raise NotImplementedError("only positional args for now")
    input_signature = []
    for name in argspec.args:
        if name not in argspec.annotations:
            input_signature.append(None)
            continue
        shape_as_type, dtype = argspec.annotations[name].__args__
        shape = []
        for s in shape_as_type.__args__[0].__values__:
            if s is None:
                shape.append(None)
            else:
                shape.append(int(s))
        ts = tf.TensorSpec(shape=shape, dtype=dtype.value)
        input_signature.append(ts)
    return tf.function(fn, input_signature=input_signature)


@function
def foo(x: Tensor[Shape[Literal[(None, 2, 3)]], Float64]):
    return 2 * x


foo(tf.random.normal([1, 2, 3], dtype=tf.float64))  # OK
foo(tf.random.normal([2, 2, 3], dtype=tf.float64))  # OK

try:
    foo(tf.random.normal([1, 2, 3, 4], dtype=tf.float64))  # NOT OK
    assert False
except ValueError:
    pass

try:
    foo(tf.random.normal([1, 2, 3], dtype=tf.float32))  # NOT OK
    assert False
except ValueError:
    pass
Going the Literal-free path might not be so bad. Here's a version that's very verbose, but the type annotation looks quite nice. I named the dimensions MNISTWidth and MNISTHeight to show that such boilerplate-y types can have an actual intuitive meaning.
## This is what the gigantic file of type defs would contain

Shape3DDim1 = TypeVar("Shape3DDim1")
Shape3DDim2 = TypeVar("Shape3DDim2")
Shape3DDim3 = TypeVar("Shape3DDim3")


class Shape3D(Generic[Shape3DDim1, Shape3DDim2, Shape3DDim3]):
    pass


class Dimension(object):
    value = NotImplemented


class Dynamic(Dimension):
    value = None


## This is what the user would have to define:

class MNISTWidth(Dimension):
    value = 2


class MNISTHeight(Dimension):
    value = 3


@function
def foo(x: Tensor[Shape3D[Dynamic, MNISTWidth, MNISTHeight], Float64]):
    return 2 * x
@mdanatg thanks for this! I really like your Literal-free code; since it doesn't seem possible to define a Shape[*Dim] type, I think this is the way to go. The only downside is the big file of typedefs, as you mentioned, but I think we could automatically generate a file with up to (say) Shape10D, and I can't imagine it ever being a limitation.
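Generating that typedef file could be sketched roughly like this (a hypothetical generator, not anything from the thread; it just emits the Shape1D..Shape10D boilerplate as source text):

```python
def generate_shape_defs(max_rank: int = 10) -> str:
    """Emit TypeVar declarations and a ShapeND class for each rank."""
    lines = ["from typing import Generic, TypeVar", ""]
    for n in range(1, max_rank + 1):
        tvars = [f"Shape{n}DDim{i}" for i in range(1, n + 1)]
        for tv in tvars:
            lines.append(f'{tv} = TypeVar("{tv}")')
        lines.append(f"class Shape{n}D(Generic[{', '.join(tvars)}]):")
        lines.append("    pass")
        lines.append("")
    return "\n".join(lines)


source = generate_shape_defs(10)
print("class Shape3D" in source)  # True
```

The generated text could be written to a module once and checked in, so mypy sees ordinary class definitions.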
I've made a few changes to the code above:
Firstly, I don't think we need to handle a None value in the input_signature list, as this doesn't work with tf.function anyway, i.e. this doesn't work:
@tf.function(input_signature=[None, tf.TensorSpec([1, 2], tf.float32)])
def foo(x, y):
    return x + y
with:
TypeError: Invalid input_signature [None, TensorSpec(shape=(1, 2), dtype=tf.float32, name=None)]; input_signature must be a possibly nested sequence of TensorSpec objects.
so we can remove:
if name not in argspec.annotations:
    input_signature.append(None)
    continue
Secondly, I had to change:

for s in shape_as_type.__args__[0].__values__:

to:

for s in shape_as_type.__args__:

Thirdly, for the inner loop over the shapes, should it not be s.value instead of s?
so the full code is:
import tensorflow as tf
import inspect
from typing import Generic, Any, TypeVar

# TODO: generate all dtypes
# TODO: generate all shapes

ShapeType = TypeVar("ShapeType")
DataType = TypeVar("DataType")

Shape3DDim1 = TypeVar("Shape3DDim1")
Shape3DDim2 = TypeVar("Shape3DDim2")
Shape3DDim3 = TypeVar("Shape3DDim3")


class Shape3D(Generic[Shape3DDim1, Shape3DDim2, Shape3DDim3]):
    pass


class Dimension(object):
    value = NotImplemented


class Dynamic(Dimension):
    value = None


class Float32(object):
    value = tf.float32


class Float64(object):
    value = tf.float64


class Tensor(Generic[ShapeType, DataType]):
    def __rmul__(self, other: Any):
        pass  # Just appeasing mypy here, the real Tensor has a proper implementation.


def function(fn):
    argspec = inspect.getfullargspec(fn)
    if argspec.varargs is not None or argspec.varkw is not None:
        raise NotImplementedError("only positional args for now")
    input_signature = []
    for name in argspec.args:
        shape_as_type, dtype = argspec.annotations[name].__args__
        shape = []
        for s in shape_as_type.__args__:
            if s.value is None:
                shape.append(None)
            else:
                shape.append(int(s.value))
        ts = tf.TensorSpec(shape=shape, dtype=dtype.value, name=name)
        input_signature.append(ts)
    return tf.function(fn, input_signature=input_signature)


# User code starts here


class MNISTWidth(Dimension):
    value = 2


class MNISTHeight(Dimension):
    value = 3


@function
def foo(x: Tensor[Shape3D[Dynamic, MNISTWidth, MNISTHeight], Float64]):
    return 2.0 * x


# Some ad hoc testing
print(f"foo signature: {foo.input_signature}")

foo_x_ts = tf.TensorSpec(shape=[None, 2, 3], dtype=tf.float64, name="x")
assert len(foo.input_signature) == 1
assert foo.input_signature[0] == foo_x_ts


@function
def bar():
    return tf.random.normal([1, 2, 3])


print(f"bar signature: {bar.input_signature}")
assert bar.input_signature == ()
$ python types_test.py
foo signature: (TensorSpec(shape=(None, 2, 3), dtype=tf.float64, name='x'),)
bar signature: ()
$ mypy types_test.py
types_test.py:1: error: No library stub file for module 'tensorflow'
types_test.py:1: note: (Stub files are from https://github.com/python/typeshed)
Found 1 error in 1 file (checked 1 source file)
I also removed the extra check in argspec.varkw is not None or argspec.varkw is not None - I guess that was just a typo?
Yep, your edits all look good! The Literal-free version did require them, but I didn't want that to clutter the post.
You're right about input_signature: it only supports None in a change that's not submitted yet, sorry! Probably best to raise an error for now. In a future version we should be able to leave some args without annotations and have their shape/type inferred.
FYI, https://github.com/tensorflow/community/pull/208 aims to establish a home for type definitions such as these. The RFC mentions this ongoing work, but we can include more specific details if ready.
@mdanatg thanks for this - looks really interesting! I had a couple of evenings to try and add some of this to tensorflow but was struggling to even run the existing tests as TF takes days to build on my laptop. I'm hoping to have some time to try again soon but if there is anything in particular I could contribute please let me know.
Quick note, DeepMind has created an implementation similar to the ideas in this thread: https://github.com/deepmind/tensor_annotations
This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 1 year.
System information
Describe the feature and the current behavior/state.
tf.function has an argument input_signature which I have been using to try and make my code a bit safer and to ensure I don't keep re-tracing functions. The input_signature specifies the tensor type for each of the function arguments. It would be much nicer (I think) to specify these types using python (>=3.5) annotations, where a suitable version of python is available. A very rough example looks like:
Which I think is nicer than the current signature:
I think the main benefit of the annotation approach is that the argument name and type are beside each other, and this syntax is already widely used in python.
In order to enable using annotations as the input_signature, I think there should be an extra boolean argument to tf.function called e.g. use_annotation_input_signature which defaults to False.
Also note I have set autograph=False here to avoid a warning:
I am guessing a proper implementation inside of tf.function would not have this problem.
Will this change the current api? How?
It would add an additional argument to tf.function which at the default value would not change anything.
Who will benefit with this feature?
Anyone using python >= 3.5 who would like to specify the tensor types of their functions.
Any Other info.
None
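The mechanism this issue proposes can be sketched without TensorFlow installed. The following is a hedged mock-up, not the real tf.function API: TensorSpec is replaced by a stand-in namedtuple, and TensorAnnotation and the decorator name are hypothetical, chosen only to illustrate deriving an input signature from annotations:

```python
from collections import namedtuple

# Stand-in for tf.TensorSpec so the sketch runs without TensorFlow installed.
TensorSpec = namedtuple("TensorSpec", ["shape", "dtype", "name"])


class TensorAnnotation:
    """Hypothetical annotation carrier, e.g. TensorAnnotation[(None, 2, 3), "float64"]."""

    def __class_getitem__(cls, item):
        shape, dtype = item
        inst = cls()
        inst.shape, inst.dtype = shape, dtype
        return inst


def function(fn):
    """Sketch of the proposal: build an input_signature from the annotations."""
    fn.input_signature = tuple(
        TensorSpec(a.shape, a.dtype, name)
        for name, a in fn.__annotations__.items()
        if isinstance(a, TensorAnnotation)
    )
    return fn


@function
def foo(x: TensorAnnotation[(None, 2, 3), "float64"]):
    return x


print(foo.input_signature)
# (TensorSpec(shape=(None, 2, 3), dtype='float64', name='x'),)
```

A real implementation inside tf.function would of course build genuine tf.TensorSpec objects and pass them through the existing input_signature machinery.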