napari / napari-core

BSD 3-Clause "New" or "Revised" License

automagic documentation parsing #6

Open kne42 opened 6 years ago

kne42 commented 6 years ago

numpydoc.docscrape includes the classes FunctionDoc and ClassDoc to parse NumPy documentation into a mapping object:

In [1]: from numpydoc.docscrape import FunctionDoc, ClassDoc

In [2]: def foo(arg1, arg2):         
   ...:     """Short Description     
   ...:     More Short Description...
   ...:                             
   ...:     Expanded Description    
   ...:                             
   ...:     Parameters              
   ...:     ----------              
   ...:     arg1 : int              
   ...:         description of arg1
   ...:     arg2 : int
   ...:         description of arg2d
   ...:         
   ...:     Returns
   ...:     -------
   ...:     ret : int
   ...:         description of ret
   ...:         
   ...:     Notes
   ...:     -----
   ...:     Additional Notes
   ...:     
   ...:     Raises
   ...:     ------
   ...:     Exception
   ...:         When it will be raised.
   ...:         
   ...:     References
   ...:     ----------
   ...:     .. [1] ref1
   ...:     .. [2] ref2
   ...:     """
   ...:     pass
   ...: 

In [3]: doc = FunctionDoc(foo)

In [4]: [key for key in doc.keys()]
Out[4]: 
['Signature',
 'Summary',
 'Extended Summary',
 'Parameters',
 'Returns',
 'Yields',
 'Raises',
 'Warns',
 'Other Parameters',
 'Attributes',
 'Methods',
 'See Also',
 'Notes',
 'Warnings',
 'References',
 'Examples',
 'index']

In [5]: doc['Signature']
Out[5]: 'foo(arg1, arg2)'

In [6]: doc['Summary']
Out[6]: ['Short Description', 'More Short Description...']

In [7]: doc['Extended Summary']
Out[7]: ['Expanded Description']

In [8]: doc['Parameters']
Out[8]: 
[('arg1', 'int', ['description of arg1']),
 ('arg2', 'int', ['description of arg2d'])]

In [9]: doc['References']
Out[9]: ['.. [1] ref1', '.. [2] ref2']

In [10]: doc['Raises']
Out[10]: [('Exception', '', ['When it will be raised.'])]

jni commented 6 years ago

This is super crazy. =D The main challenge will be figuring out the types in the Parameters section. When more than one type is accepted, the format becomes more free-form, at least in skimage. But as a first pass, we could annotate the known types and simply omit the unknown ones, emitting warnings so that a human can curate them afterwards.
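A minimal sketch of that first pass, assuming the parameters arrive as `(name, type string, description)` entries like those in `doc['Parameters']` above. The `KNOWN_TYPES` table and the `annotate` helper are hypothetical names used for illustration only; they are not part of numpydoc or napari.

```python
import warnings

# Hypothetical lookup table: docstring type string -> Python type.
KNOWN_TYPES = {
    "int": int,
    "float": float,
    "bool": bool,
    "str": str,
}


def annotate(params):
    """Map (name, type_string, description) entries to {name: type}.

    Entries whose type string is not in KNOWN_TYPES are omitted, and a
    warning is emitted so that a human can curate them later.
    """
    annotations = {}
    for name, type_string, _description in params:
        key = type_string.strip().lower()
        if key in KNOWN_TYPES:
            annotations[name] = KNOWN_TYPES[key]
        else:
            warnings.warn(
                f"unknown type {type_string!r} for parameter {name!r}"
            )
    return annotations
```

With this sketch, a free-form string such as "scalar or sequence of scalars" is left out of the result and reported through the `warnings` module instead of being guessed at.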

kne42 commented 6 years ago

@jni As promised, here is a (very) rough prototype of autotyping: https://gist.github.com/kne42/93f8a9d09aa8c5ea1160f1b18a51f2d9. Sorry for the delay; I forgot I had some elements of a project due lol.

Since nbviewer appears to time out because of the notebook's length, here are the results (edited for clarity):

Function:  gaussian

Name:  image
Original:  array-like
Parsed:  Array

Name:  sigma
Original:  scalar or sequence of scalars, optional
Parsed:  Optional[Union[Scalar, Sequence]]

Name:  output
Original:  array, optional
Parsed:  Optional[Array]

Name:  mode
Original:  {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional
Parsed:  Optional[Set]

Name:  cval
Original:  scalar, optional
Parsed:  Optional[Scalar]

Name:  multichannel
Original:  bool, optional (default: None)
Parsed:  Optional[Boolean]

Name:  preserve_range
Original:  bool, optional
Parsed:  Optional[Boolean]

Name:  truncate
Original:  float, optional
Parsed:  Optional[Float]

kne42 commented 6 years ago

Plan of attack for parsing types out of strings:

  1. normalize string
    1. remove optional keyword
  2. parse type string
    1. attempt to match against generic types (union, array, sequence, etc.)
      • matches a regex; passes groups to a callable that returns a type
    2. attempt to match against non-generic types (int, float, scalar, etc.)
      • searches for a regex; replaces with a type
    3. raise an error if no match found

I expect to finish the infrastructure in 1-2 days, and to cover most use cases in 1-2 weeks.
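The plan above could be sketched roughly as follows. This is not the prototype from the gist: every name here (`normalize`, `parse`, `parse_type`, `GENERIC_TYPES`, `SIMPLE_TYPES`) is hypothetical, the patterns cover only a handful of type strings, plain Python and `typing` types stand in for the custom `Scalar`/`Array` types, and the non-generic step matches the whole string rather than searching and replacing.

```python
import re
from typing import Optional, Sequence, Union


def normalize(type_string):
    """Step 1: strip the 'optional' keyword and anything after it.

    Returns the cleaned, lower-cased string and whether it was optional.
    """
    cleaned, count = re.subn(r",\s*optional\b.*$", "", type_string.strip())
    return cleaned.strip().lower(), count > 0


# Step 2.1: generic types. Each regex passes its groups to a callable
# that builds the type, recursing into parse() for the inner strings.
GENERIC_TYPES = [
    (re.compile(r"^(.+?) or (.+)$"),
     lambda a, b: Union[parse(a), parse(b)]),
    (re.compile(r"^(?:sequence|list|tuple) of (.+)$"),
     lambda item: Sequence[parse(item)]),
]

# Step 2.2: non-generic types. Each regex maps directly to a type.
SIMPLE_TYPES = [
    (re.compile(r"^bool(?:ean)?s?$"), bool),
    (re.compile(r"^int(?:eger)?s?$"), int),
    (re.compile(r"^(?:float|scalar)s?$"), float),
    (re.compile(r"^str(?:ing)?s?$"), str),
]


def parse(type_string):
    """Step 2: try generic patterns first, then non-generic ones."""
    for pattern, build in GENERIC_TYPES:
        match = pattern.match(type_string)
        if match:
            return build(*match.groups())
    for pattern, simple_type in SIMPLE_TYPES:
        if pattern.match(type_string):
            return simple_type
    # Step 2.3: nothing matched.
    raise ValueError(f"no match for type string {type_string!r}")


def parse_type(type_string):
    """Normalize, parse, and re-apply Optional when it was requested."""
    cleaned, is_optional = normalize(type_string)
    parsed = parse(cleaned)
    return Optional[parsed] if is_optional else parsed
```

For example, `parse_type("float, optional")` returns `Optional[float]`, while an unsupported string such as "array-like" raises `ValueError`, which is where new patterns would be added.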

kne42 commented 6 years ago

Checklist w/ working notebook (infrastructure is complete):

https://gist.github.com/kne42/93f8a9d09aa8c5ea1160f1b18a51f2d9

kne42 commented 6 years ago

@jni Okay, most of the main types are done. How do you think we should encode specific types of data? Are there any more types that I haven't covered?

Here is the output (again cleaned up a bit):

Function:  gaussian

Name:  image
Original:  array-like
Parsed:  Array[~Number]

Name:  sigma
Original:  scalar or sequence of scalars, optional
Parsed:  Optional[Union[~Scalar, Sequence[~Scalar]]]

Name:  output
Original:  array, optional
Parsed:  Optional[Array[~Number]]

Name:  mode
Original:  {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional
Parsed:  Optional[Mode]

Name:  cval
Original:  scalar, optional
Parsed:  Optional[~Scalar]

Name:  multichannel
Original:  bool, optional (default: None)
Parsed:  Optional[~Boolean]

Name:  preserve_range
Original:  bool, optional
Parsed:  Optional[~Boolean]

Name:  truncate
Original:  float, optional
Parsed:  Optional[~Real]