usnistgov / PyHyperScattering

Tools for hyperspectral x-ray and neutron scattering data loading, reduction, slicing, and visualization.
Other
6 stars 8 forks source link

Proposed change/break in API for masks passing into integrators. #47

Closed pbeaucage closed 1 year ago

pbeaucage commented 1 year ago

In polygon_masks, @EliotGann proposes a change in the API for mask import to integrators. I think this is worth some discussion or at least documentation.

Current API:

(str) mask_method: keyword for how to load a mask, 'none' (default) or 'nika' at present. (str or Pathlib.Path): mask_path: path to a mask file or None

The constructor looks at mask_method, decides if it can handle this method, and routes the mask_path argument to a worker function, prototypically loadNikaMask().

Polygon masks would be implemented by adding 'polygon' as an allowed mask_method and adding a new kwarg, default none, mask_polygon which would be (datatype tbd, probably a list of tuples as x,y pixel coords or list of lists?)

New proposed API:

Single argument

(dict) mask: single dictionary with keys as follow: ''' loads a mask either from a path, from a NIKA file, or from a list of polygon points the mask dictionary should have keys indicating which method to use and the necessary information for a path, mask itself can be a string, or a dictionary with the key path whose value is the string for NIKA, there should be a key "nika" with the path to that file for a polygon, there should be a key "points" with a list of lists of points i.e. [[[1050,480],[500,480],[500,520],[1050,520]],[[1050,80],[500,80],[500,120],[1050,120]]]

    '''

There are good reasons to do this:

Reasons to NOT do this:

I am somewhat torn and seeking input from project developers/superusers like @pdudenas @dsunday @cbishop4 @martintb @ktoth17 @bijalbpatel

martintb commented 1 year ago

My 2c:

Your pros/cons are mostly good. Starting with the CONS, I would most strongly amplify bullets three and four. If you don't have API documentation in place, it makes it impossible for users to know what to pass in or what the defaults are. Even with some prior knowledge, it puts a burden on the user to know if the correct dictionary key is mask_method, maskMethod, MaskMethod, or MaskM3th0d. You also preclude yourself from using tools like MyPy with this approach. Readability is a key tenet of Python and this approach is not really in that spirit.

On the other hand, maintainability and consistency are very important. The more you can make an entire codebase or set of codebases have the same flow/feel, the easier it will be to use them. The codebase will grow more fluidly if you design it for expansion.

Another point I would make is that you're somewhat close to making use of Python's args/kwargs pattern. In your case for only keyword arguments:

def my_function(**kwargs):
    pass

You can also combine **kwargs with explicitly named arguments

def my_function(mask_method=None,mask_path=None,**kwargs):
    pass

These have similar issues to just passing the dictionary, but they also allow users to pass normal keyword arguments or use dictionary unpacking e.g.,


#both of the following are equivalent

my_function(mask_method='nika', mask_path=pathlib.Path('/path/to/mask'))

input_dict = {'mask_method':'nika','mask_path':pathlib.Path('/path/to/mask')}
my_function(**input_dict)

I do not feel strongly that this is better or worse than just passing a dictionary, but it is a feature in Python to consider.

Overall, I'm not sure if I'm fully for or against the code change at this point, but I probably lean a little against the exact proposal you describe above. I would consider other approaches such as setting up container objects (liked DataClasses (>3.7), NamedTuples, or custom classes) rather than passing an arbitrary dictionary. This would solve some of the problems I highlighted in the first paragraph, but you're really just moving the arguments to another location (that can have methods and validation checks built in).

See here for a discussion on this topic from people other than me.

EliotGann commented 1 year ago

Definitely not the only way to do it, it was just a way to add in the flexibility needed. Probably changing the api too much isn’t required if there can be options added via kwargs as Tyler suggests. The only required option would be the type, then if a path is required then an optional path parameter can be extracted from kwargs, if a list of points is required, that parameter can be extracted from kwargs… if a rotation parameter is added, it can be applied, etc.

BijalBPatel commented 1 year ago

I also like the kwargs approach to directly address PROS 1 and 3 in your list above and to avoid the problems of CONS 1, 3, and 4. This method gives you the same flexibility of a single megaparameter dictionary while still supporting tab complete and docstrings.

From experience, its super easy to implement.

EliotGann commented 1 year ago

Yeah, someone could still construct and pass around a dictionary but when calling pyhyper they just put it in with the **, then pyhyper functions can just use the kwargs dictionary, so the functionality on both sides could basically be the same.

pbeaucage commented 1 year ago

This is pretty much implemented as we discussed. A new kwarg "maskpoints" is used for list of polygons input.