napari / napari-core

BSD 3-Clause "New" or "Revised" License
5 stars 3 forks source link

Annotate all scikit-image functions with types #1

Open jni opened 6 years ago

jni commented 6 years ago

In order to automatically generate a GUI and to know what to do with the return types of each function, we need the functions to be properly annotated. Is an output array an image or a set of coordinates?

We can then use these annotations to display each type appropriately.

Initially, we can try to do some processing of our current annotations from NumPy style docstrings.

kne42 commented 6 years ago

Yeah, I was thinking about this in addition to general parameter typing with GUI elements – for example labeling discrete data types for use with dropdowns, small-range bounded continuous data types for use with sliders, large-range and unbounded continuous data types with input fields, angles with a directional vector, etc.

One thing that may present a problem is API changes. Should we assume specific/minimum versions for the interfaces we will be working with?

kne42 commented 6 years ago

@jni Ok, now that I've actually read up on typing annotations (just found out about them lol!) and I'm going to have to disagree with you here. Python 3.5-style typing is useful, yes, but I don't think that it necessarily has any advantages over using decorators for our use case, so it doesn't really warrant a Python 3.5+ dependency.

Furthermore, the "typing" that we want is less of actual typing and more of instructions for how we want to build our GUI structures. Additionally, from my brief skimming of the documentation, it doesn't look like we can add arguments to these types (and I hope not for the sake of bloat), which means that we won't be able to set bounds for continuous data types and label the meaning of "None", among other useful utilities.

TLDR; I propose that we use decorators in conjunction with classes to represent our GUI for access to additional structure parameters.

jni commented 6 years ago

Hi @kne42!

Several points:

The first one is that Napari will be either Py3.5+ or Py3.6+ anyway, so we get type annotations for free. We are not taking any additional dependencies for this. (This is true of skimage master also, btw.) As an aside, no new Python projects should use an earlier version of Python imho.

Now, if we assume we can use typing, what can we do?

The goal is to crawl the scikit-image API, populate the Napari menus with functions that make sense, and use the appropriate display functions for each conceptual "type". A constraint is that whatever solution we pick should not create a tight coupling between Napari and any libraries it uses (be it skimage, NumPy, TensorFlow...).

I agree with you that we probably don't want to create a "type" for "float in [0.01, 100]" or whatever, (though I think this is totally possible to do) (and actually maybe it is the right thing to do), but we can create type definitions for different conceptual "arrays", and I think that these would actually be useful in scikit-image to generate documentation anyway.

As to parameter ranges, I think that we shouldn't worry too much about sliders etc for now and use a text box. If the user inputs an "incorrect" value, hopefully we raise a useful error message that will point them to the right place in the skimage documentation.

What do you think? This is all a bit vague without a clear implementation, so it might be best to attempt something and see how it looks. (I also believe there was a discussion in the NumPy mailing list about typing for NumPy, that might be informative.) I don't have time to mock this up right now but will later this week.

royerloic commented 6 years ago

On 4. Jun 2018, at 22:43, Juan Nunez-Iglesias notifications@github.com wrote:

Hi @kne42 https://github.com/kne42!

Several points:

The first one is that Napari will be either Py3.5+ or Py3.6+ anyway, so we get type annotations for free. We are not taking any additional dependencies for this. (This is true of skimage master also, btw.) As an aside, no new Python projects should use an earlier version of Python imho.

Definitely. Now, if we assume we can use typing, what can we do?

The goal is to crawl the scikit-image API, populate the Napari menus with functions that make sense, and use the appropriate display functions for each conceptual "type". A constraint is that whatever solution we pick should not create a tight coupling between Napari and any libraries it uses (be it skimage, NumPy, TensorFlow...).

I agree. I agree with you that we probably don't want to create a "type" for "float in [0.01, 100]" or whatever, (though I think this is totally possible to do) (and actually maybe it is the right thing to do), but we can create type definitions for different conceptual "arrays", and I think that these would actually be useful in scikit-image to generate documentation anyway.

I also think that defining a functional ‘contractt with value ranges is a very good thing going forward. this will be essential to be able to have sliders, good for documentation. I come from the java world, and i am of course biased towards strong typing.

As to parameter ranges, I think that we shouldn't worry too much about sliders etc for now and use a text box. If the user inputs an "incorrect" value, hopefully we raise a useful error message that will point them to the right place in the skimage documentation.

We can start simple, but even with a text box, it would be great if we can tell the user what kind of values to expect… That would have a great impact on ergonomics. I have seen too many tools with complicated GUIs and tons of parameters, and you never know which kind of values are ‘correct’ which also makes it difficult to understand the meaning of these paramneters,,,,

Any chance we can have also some sort of annotation of the function that describes what it does? so we can also populate that in the menus and gui popups… ?

What do you think? This is all a bit vague without a clear implementation, so it might be best to attempt something and see how it looks. (I also believe there was a discussion in the NumPy mailing list about typing for NumPy, that might be informative.) I don't have time to mock this up right now but will later this week.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Napari/napari/issues/1#issuecomment-394588579, or mute the thread https://github.com/notifications/unsubscribe-auth/AByMkljIGXQS6g04VNCnzysEVeKCzd_mks5t5hp9gaJpZM4UYKRs.

kne42 commented 6 years ago

@royerloic I haven't looked at it yet, but we should be able to use the numpy doc parsing tools to get a lot of this information out. We can have a short description displayed that parallels the first line of text in the docstring and have an expanded explanation for the rest of it, as well as conversion of the Notes section to an info button/tooltip, etc.

jni commented 6 years ago

Any chance we can have also some sort of annotation of the function that describes what it does? so we can also populate that in the menus and gui popups… ?

Yes. All skimage and SciPy ecosystem functions follow the NumPy docstring convention that annotates all arguments and return values. We can automatically parse these and include them as (?) buttons or hover tooltips. Actually this is another awesome advantage of this ecosystem, we are going to get great docs for almost free.

By the way, @royerloic, you can turn these on in the preferences for PyCharm (search for "docstring"), and you will be given a nice template every time you start a new docstring.

I also think that defining a functional ‘contractt with value ranges is a very good thing going forward. this will be essential to be able to have sliders, good for documentation.

Sliders are harder to implement. A very large proportion of parameters are unbounded but only really make sense in some typical range. This range, though, is often data dependent, and anyway should not be restricted in case someone comes up with a pipeline where atypical argument values make sense. One solution might be to have sliders for some parameters, that can be reverted to text boxes by clicking on an "Advanced" interface button (with "Simplified" being the default).

I come from the java world, and i am of course biased towards strong typing.

Then you are entering Python at exactly the right moment. =)

kne42 commented 6 years ago

Let's move discussion of docstring parsing to #6 :)

kne42 commented 6 years ago

Sandbox notebook, playing around with typing and eventually searching and replacing in docstrings using regex: https://gist.github.com/kne42/ce76ad9dad0416b487479fbf80766e9e

jni commented 6 years ago

@kne42 an exciting notebook! How close do you think it is to working? Can't wait to see the output on skimage.filters.gaussian.__doc__, for example. =)

Come to think of it, a NumPy docstring -> Py3.5 type annotations conversion library would be tremendously impactful. A ton of projects could make use of it.

kne42 commented 6 years ago

@jni I expect to finish a rudimentary version of this by the end of tomorrow, or Friday at latest.

Custom type definitions have been covered for the most part, with the exception of arrays (I think I’ll implement a similar syntax to Cython memoryviews). I’ll open a PR to track these implementations soon and you can let me know what you think about the conventions used.

On Wed, Jun 6, 2018 at 6:19 PM Juan Nunez-Iglesias notifications@github.com wrote:

@kne42 https://github.com/kne42 an exciting notebook! How close do you think it is to working? Can't wait to see the output on skimage.filters.gaussian.doc, for example. =)

Come to think of it, a NumPy docstring -> Py3.5 type annotations conversion library would be tremendously impactful. A ton of projects could make use of it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Napari/napari/issues/1#issuecomment-395232143, or mute the thread https://github.com/notifications/unsubscribe-auth/Ab0F02JBN_NtwEqnqMPlN542NB9yFIUtks5t6FVjgaJpZM4UYKRs .

kne42 commented 6 years ago

@jni Actually upon second thought, perhaps some hacky implementation might be better considering that bracket notation is already in use by types.

We would make a Shape class that overrides the __le__ operator to modify whatever type is on the right. The type on the right must implement Iterable, naturally.

Here's some example syntax:

Shape[:, :] <= Array[Integer]      # 2D int Array
Shape[...] <= Array[Float]         # nD float Array
Shape[:, :, ...] <= Array[Object]  # 2+D Object Array
Shape[3] <= Array[Float]           # flat float Array of length 3
Shape[..., 5] <= Array[Integer]    # nD int Array with last axis of size 5
Shape[0:5] <= List[Scalar]         # List of length 0-5
Shape[:5] <= List[Scalar]          # List of length 1-5 (we assume that most users
                                   # don't want an empty list, but this may be too bloated)
Shape[42:] <= List[Scalar]         # List of length 42+
Shape[::2]                         # only even lengths
Shape[(1, 5, 7), :]                # 2D; first axis must be of size 1, 5, or 7

and the option to have SizeVars, similar to TypeVars:

M = SizeVar('M')
N = SizeVar('N')

def foo(coords: Shape[M, ..., N] <= Array[Number],
        weights: Shape[M, ..., N] <= Array[Float],
        center: Union[Scalar, Shape[M] <= Iterable[Scalar]]
        ) -> Shape[M, N] <= Array[Float]
kne42 commented 6 years ago

Actually, to implement the above I think I'll need a lot longer than two days 😅

jni commented 6 years ago

@kne42 yes, yes you do. =P But a proof-of-concept, only-handle-the-easy-cases tool might be the best option to start with: only specify array (not shape or dtype), only specify ints/tuple-of-ints/etc. One way to do this would be to define some mappings (regex -> type), crawl the whole API, and make a table of the stuff that was captured by the mappings and the stuff that wasn't. Hopefully we converge on a few cases, and the ones that we don't have will become apparent. It is probably useful even just to normalise our docstrings on a common format!

We should also aim to stay as simple as possible until NumPy settles on a typing format. We definitely don't want to depend on a super-complex typing system encompassing ndims, shapes, dtypes, and transformations thereof, only to have NumPy supercede it with their own.

For our own purposes, I think that a set of "empty" types, meaning types that have no instances of that type, would be sufficient to annotate our functions with. For example, we could define a type Image, which would never be created, but rather functions where the input array or output array was, conceptually, an image, would have type "Union[Array, Image]". This would be sufficient for us to introspect the signature and do appropriate things with the UI/UX. Similarly for LabelImage, BooleanImage, Coordinates, Index (these are NumPy style tuple-of-array coordinates), etc. Do you think this could work?

kne42 commented 6 years ago

@jni These are all fair points. However, given the limitations of typing with regards to arrays at the current moment, I would much rather forsake navigating the complexity of real-type type annotations and use our own definitions that do not follow the proper specifications which is much easier for the type of prototyping that we want.

This is still syntactically correct because Python does not implement actual static typing (everything is done via third-party programs). We can then replace these dummy types with the real deal once a convention has been established. The reason that I desire this route over generic non-shaped arrays is because shape is immensely important to communicating interactions between function input and output, probably more so than the actual type of the array. Especially with how input shapes rely on each other, again, I think this information is too important to just discard.

jni commented 6 years ago

@kne42 I'm on board with this but I still think it's important to do a minimal proof of concept first, and expand that proof of concept to something more thorough (including shape info) later.

Also note that some of this info could be part of Image: Image2D, ImageMultichannel, etc.

jni commented 6 years ago

@kne42 in case you haven't seen these already, here's some links that could be useful:

https://github.com/numpy/numpy/issues/7370#issuecomment-345844120 https://github.com/numpy/numpy/issues/7370#issuecomment-349494653 https://github.com/numpy/numpy_stubs

kne42 commented 6 years ago

@jni How do you think we should go about typing {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}?

I was thinking about autogenerating a type with subtypes representative of the options in the set but idk if there's a better way of encoding this type of data.

jni commented 6 years ago

@kne42 in languages like Haskell this would be an Enum type. Python actually has some facility for this:

https://docs.python.org/3/library/enum.html https://www.python.org/dev/peps/pep-0484/ (search for Enum)

it would be nice to get the relevant libraries to use Enum here but I don't think that's realistic, so perhaps using a Union again would be useful.

Thinking about it in terms of Napari, the goal would be to have a drop-down for this type of annotation, so whatever you do, you should be able to get the possible values from the type annotation.

kne42 commented 6 years ago

@jni Thank you for this! The typing library is actually compatible with class types in general so hopefully this will work. The reason we've been having so many issues with NumPy arrays is due to a lot of their functions (and subsequent derived SciPy functions) accepting array_like instead of explicit ndarrays. Also the conflict between array indexing and subtyping using the same syntax is an issue preventing us from necessarily using the ndarray class directly.

jni commented 6 years ago

Some interesting type discussions on the NumPy mailing list. Specifically, the new XND project has created an "ndtypes" library that includes pattern matching for dtypes and shapes: https://mail.python.org/pipermail/numpy-discussion/2018-June/078261.html

Not necessary promoting any action here, just making a note for future reference. =)