pycroscopy / sidpy

Python utilities for storing, processing, and visualizing spectroscopic and imaging data
https://pycroscopy.github.io/sidpy/
MIT License

Process or Operation class #30

Open ssomnath opened 4 years ago

ssomnath commented 4 years ago

We will need a very lightweight abstract class that standardizes the API for algorithms. Unlike pyUSID.Process, this class will not include code for embarrassingly parallel computation. However, it should have a write_to_disk(writer) method, where writer is a Writer object that pyNSID and pyUSID each need to implement. The idea is that algorithms that are plain functions right now would be converted to, or wrapped in, lightweight classes. Whenever a user runs a Process, the computation will be performed and the results returned in memory only. If the user wants, they can then write the results to disk and capture all the provenance. write_results() will be the only thing that developers need to implement to enable provenance tracking in the HDF5 file.
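To make the proposal concrete, here is a minimal sketch of one possible reading of it. Every name below (`Process`, `Writer`, `compute`, `_compute`, `Scale`, `DictWriter`) is an assumption made for illustration and is not existing sidpy, pyNSID, or pyUSID API. The writer is an in-memory stand-in, so nothing here touches HDF5.

```python
from abc import ABC, abstractmethod


class Writer(ABC):
    """Hypothetical interface that pyNSID and pyUSID would each implement."""

    @abstractmethod
    def write(self, name, results, provenance):
        ...


class Process(ABC):
    """Hypothetical lightweight base class; no parallel-computing support."""

    def __init__(self, **parameters):
        self.parameters = parameters
        self.results = None

    @abstractmethod
    def _compute(self, data):
        """The algorithm itself, supplied by the developer."""

    def compute(self, data):
        # Results are kept in memory only; nothing is written here.
        self.results = self._compute(data)
        return self.results

    def write_to_disk(self, writer):
        # Opt-in step: hand results and parameters to the format-specific writer.
        if self.results is None:
            raise RuntimeError("call compute() before write_to_disk()")
        return writer.write(type(self).__name__, self.results, dict(self.parameters))


class Scale(Process):
    """Toy algorithm used only to exercise the sketch."""

    def _compute(self, data):
        return [self.parameters["factor"] * x for x in data]


class DictWriter(Writer):
    """In-memory stand-in for an HDF5-backed writer."""

    def __init__(self):
        self.store = {}

    def write(self, name, results, provenance):
        self.store[name] = {"results": results, "provenance": provenance}
        return name
```

With this shape, `Scale(factor=2).compute([1, 2, 3])` returns the result in memory, and calling `write_to_disk(DictWriter())` afterwards is the only step that records anything.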

ssomnath commented 4 years ago

Perhaps we do not need a Writer class after all. We could make do with a keyword argument such as format=nsid.
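A rough sketch of what the keyword-argument variant could look like, assuming a simple name-to-function lookup. The function names and the `_WRITERS` registry are hypothetical, and the two writer functions are placeholders that only return a dictionary instead of writing a file.

```python
def _write_nsid(results, provenance):
    # Placeholder: a real version would live in pyNSID.
    return {"format": "nsid", "results": results, "provenance": provenance}


def _write_usid(results, provenance):
    # Placeholder: a real version would live in pyUSID.
    return {"format": "usid", "results": results, "provenance": provenance}


# Hypothetical registry mapping the format name to its writer function.
_WRITERS = {"nsid": _write_nsid, "usid": _write_usid}


def write_results(results, provenance, format="nsid"):
    """Pick a writer by name instead of asking the caller for a Writer object."""
    try:
        writer = _WRITERS[format.lower()]
    except KeyError:
        raise ValueError(
            f"unknown format {format!r}; expected one of {sorted(_WRITERS)}"
        ) from None
    return writer(results, provenance)
```

The trade-off is that the set of formats has to be known wherever the lookup lives, whereas a Writer object lets each package bring its own implementation.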

ssomnath commented 4 years ago

After further consideration, provenance will be facilitated on an opt-in basis. The priority level for this issue has also been lowered. Possible solutions:

  1. Develop a decorator that, when added on top of an existing function, looks at the inputs and outputs and writes all relevant information into an HDF5 file. The only modifications necessary in the domain science package would be:
    1. Add the decorator on top of the function.
    2. Accept keyword arguments for the HDF5 group path and a Writer object (extended by NSID or USID). This information need not be used in the function itself at all. When users provide both the group and the Writer object, the decorator will go about writing the information to HDF5.
  2. A class, implemented by pyUSID and pyNSID, that performs the same operations but can be called even when the desired function does not have the aforementioned decorator.
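The first option can be sketched roughly as follows. This is only an illustration under stated assumptions: `track_provenance`, the `h5_group` and `writer` keyword names, and `RecordingWriter` are all hypothetical, and the writer records to a list in memory instead of an HDF5 file. In this variant the wrapper intercepts the two extra keyword arguments, so the decorated function never sees them.

```python
import functools
import inspect


def track_provenance(func):
    """Hypothetical decorator: records inputs and outputs when asked to."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, h5_group=None, writer=None, **kwargs):
        result = func(*args, **kwargs)
        # Opt-in: write only when the caller supplies both extra kwargs.
        if h5_group is not None and writer is not None:
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()  # also record parameters left at defaults
            writer.write(h5_group, func.__name__, dict(bound.arguments), result)
        return result

    return wrapper


class RecordingWriter:
    """In-memory stand-in for a Writer extended by NSID or USID."""

    def __init__(self):
        self.records = []

    def write(self, group, name, inputs, outputs):
        self.records.append((group, name, inputs, outputs))


@track_provenance
def normalize(values, scale=1.0):
    """Toy domain function: scale values so the largest equals `scale`."""
    peak = max(values)
    return [scale * v / peak for v in values]
```

Calling `normalize([1, 2, 4])` behaves exactly like the undecorated function. Calling `normalize([1, 2, 4], scale=2.0, h5_group="/results", writer=RecordingWriter())` returns the same kind of result and additionally hands the inputs and outputs to the writer.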

The problem with the first approach is that developers may not want to edit every function of theirs to add the kwargs and decorator. The problem with the second approach is that the end user would need to pass the arguments, keyword arguments, and outputs again to this writer class. Either way, someone will need to pay the price and there does not appear to be a simple enough solution.
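For comparison, a minimal sketch of the second approach shows where the cost lands on the end user. `ProvenanceRecorder`, its `record` method, and the `smooth` function are hypothetical names, and the recorder keeps its records in memory where a real one would write to HDF5.

```python
class ProvenanceRecorder:
    """Hypothetical stand-alone recorder; a real one would write to HDF5."""

    def __init__(self):
        self.records = []

    def record(self, func, args, kwargs, outputs, group="/"):
        self.records.append({
            "group": group,
            "function": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "outputs": outputs,
        })


def smooth(values, window=2):
    """Undecorated domain function: moving average over `window` points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# The user runs the function, then repeats every input to the recorder.
args, kwargs = ([1, 3, 5, 7],), {"window": 2}
outputs = smooth(*args, **kwargs)
recorder = ProvenanceRecorder()
recorder.record(smooth, args, kwargs, outputs, group="/results")
```

The domain function stays untouched, but the last four lines are the bookkeeping every user would have to repeat for each call they want tracked.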