persephone-tools / tensorflow-stubs

Type stubs for the tensorflow library
MIT License
17 stars 9 forks source link

Helping out #3

Open acvander opened 6 years ago

acvander commented 6 years ago

Hi, Is this project still being developed? If so, I would be happy to help out. I've started looking at solving the testing, but is there anything I need to do for submitting a pull request? @shuttle1987

shuttle1987 commented 6 years ago

@acvander, thanks for expressing interest in being involved!

This project is still being developed but not as vigorously as before as we now have coverage of the areas we needed in our other project. So usually it's a good idea for people to run the tests before submitting a PR but there's no testing framework set up here yet. As such getting some testing would be a very high value next step as it would help all the development effort get more velocity. I particularly like the approach from numpy-stubs https://github.com/numpy/numpy-stubs/tree/master/tests where they have created files that should pass and files that should fail. This would help us gain insights into what stubs need to be written and reveal bugs faster.

I have been thinking it might be good to extract stubs from the Tensorflow repository programatically. The only reason I hadn't done that already is because it's not all pure python, which means you have to do a bit of extra work.

One thing worth knowing about is this @tf_export decorator defined in tensorflow/python/util/tf_export.py which modifies how names are exported.

For example see this situation where we have a tf.Variable: (from https://github.com/tensorflow/tensorflow/blob/d8f9538ab48e3c677aaf532769d29bc29a05b76e/tensorflow/python/ops/variables.py#L39)

@tf_export("Variable")
class Variable(checkpointable.CheckpointableBase):

This is fairly straightforward, there's a class called Variable created and it is exported to the name Variable.

However there's a more complex case such as Saver (https://github.com/tensorflow/tensorflow/blob/28340a4b12e286fe14bb7ac08aebe325c3e150b4/tensorflow/python/training/saver.py#L1074)

@tf_export("train.Saver")
class Saver(object):

This creates an object called Saver that's found at train.Saver.

So what I had planned was scanning through the Tensorflow source for these tf_exports then generating stubs for those objects and placing them into the relevant directories and .pyi files. Unfortunately I haven't had time to do this. I would be more than happy to help out with any PRs you happen to put together.

shuttle1987 commented 6 years ago

I started putting some testing structure together in #4

acvander commented 6 years ago

Thanks for the advice. I've started working on a few tests which you can find them here.

I've got a good idea of the passing tests, but unfortunately I'm having some issues with some of the fail tests here. Also, I'm not really sure what the reveal tests are supposed to do or how they should work.

I will start taking a look at generating the stubs programatically. There is a mypy stubgen tool. I tried running it but it did not generate the results I was expecting.

shuttle1987 commented 6 years ago

The "reveal" tests are to test types, consider the following:

foo = 1.3
bar = "baz"

reveal_type(foo)  # E: float
reveal_type(bar) # E: str

This will only pass if foo is of type float and bar is of type str. This provides a way for us to make sure that when we are working on our type stubs that the types are what we expect them to be, this is useful for more complex operations.

I put this example into the reveal tests directory.

shuttle1987 commented 6 years ago

@acvander

I've got a good idea of the passing tests, but unfortunately I'm having some issues with some of the fail tests here. Also, I'm not really sure what the reveal tests are supposed to do or how they should work.

Can you give a bit more info? It looks like your example of:

tf.nn.ctc_beam_search_decoder(
    tf.placeholder(tf.float32, shape=(100, 256, 10)),
    'string',
    beam_width=10,
    top_paths=10,
merge_repeated=True) # E: incompatible type

Should indeed fail. If it doesn't fail that means the typestubs here are incomplete and that would be a great first issue for you to open once you merge that test case in.

Please open a PR with that test case.

I will start taking a look at generating the stubs programatically. There is a mypy stubgen tool. I tried running it but it did not generate the results I was expecting.

The issue with the mypy stubgen tool is that it expects the modules where it's generating the stubs to be the same as the modules in which they come up in the code. I was trying to explain before that with tensorflow there's a decorator that changes where the export is located. This will make such a tool not work.

One approach we might consider is to do some sort of name mangling on the source and create a mapping between the the function names and the mangled typestubs then rearrange the generated type stubs into the correct locations after we run that tool.

So for example :

@tf_export("train.Saver")
class Saver(object):

First we do a source transform:

@tf_export("train.Saver")
class UniqueName123_Saver(object):

Then store a mapping between the names and where they should be output to:

# tuples of original name and module path
module paths = {
   "UniqueName123_Saver": ("Saver", "train.Saver")
}

Strip the decorator:

class UniqueName123_Saver(object):

Run the stubgen tool and then transform the source of the stub files to the appropriate module locations as defined my module_path mapping we created before and map the name back to the original name