nansencenter / DAPPER

Data Assimilation with Python: a Package for Experimental Research
https://nansencenter.github.io/DAPPER
MIT License
348 stars 122 forks source link

Discussion for a dapper.da_methods.__init__.py #19

Closed yumengch closed 3 years ago

yumengch commented 3 years ago

Hi Patrick

I really like the way that all new DA methods are using OOP style. And I understand that dataclasses provides some nice features. However, I feel the treatment is a bit complex to me. I understand this is an issue of personal preference, so I just want to share a bit a possible alternative for the implementation. I think the ups is that it introduces abstract class to ensure the existence of assimilate function and avoids the use of decorators. The down is that we might need to add some features by ourselves. How do you think about it?

import time
import abc
import numpy as np
from struct_tools import NicePrint

import dapper.stats

class da_method(abc.ABC, NicePrint):
    printopts = dict(
        excluded=NicePrint.printopts["excluded"]+["assimilator", "assimilate"]
    )

    def __init__(self, **kwargs):
        # Use -5 to avoid object,
        # abc.ABC, NicePrint and da_method class
       # A similar loop also appears in dataclasses code.
        for b in self.__class__.__mro__[-5::-1]:
            for key in b.__annotations__.keys():
                value = getattr(b, key, None)
                if key in kwargs.keys():
                    setattr(self, key, kwargs[key])
                elif value is not None:
                    setattr(self, key, value)
                else:
                    raise TypeError(f"{key} must have a value")

        self.assimilator = self.assimilate
        self.assimilate = self._assimilate

    def _assimilate(self, HMM, xx, yy, desc=None, **stat_kwargs):
        # Progressbar name
        pb_name_hook = self.__class__.__name__ if desc is None else desc # noqa
        # Init stats
        # self.stats = dapper.stats.Stats(self, HMM, xx, yy, **stat_kwargs)
        # Assimilate
        time_start = time.time()
        self.assimilator(HMM, xx, yy)
        # dapper.stats.register_stat(self.stats, "duration", time.time()-time_start)

    @abc.abstractmethod
    def assimilate(HMM, xx, yy):
        pass

    def __repr__(self):
        with np.printoptions(threshold=10, precision=3):
            return self.__class__.__name__+":"+super().__repr__()

    def __str__(self):
        with np.printoptions(threshold=10, precision=3):
            return self.__class__.__name__+":"+super().__str__()

class ens_method(da_method):
    """Declare default ensemble arguments."""
    infl: float        = 1.0
    rot: bool          = False
    fnoise_treatm: str = 'Stoch'

class EnKF(ens_method):
    """Declare default ensemble arguments."""
    upd_a: str
    N: int

    def assimilate(self, HMM, xx, yy):
        print("Running assimilation")
patnr commented 3 years ago
  • ensure the existence of assimilate function
  • avoid the use of decorators.

These are good points. I am open to this, but a little hesitant because it is so fundamental. Overall, it seems like a good idea though, so give it a go (including making the requisite changes everywhere, and running the tests) if you're confident.

Some notes:

Btw, what happened to:

yumengch commented 3 years ago

These are good points. I am open to this, but a little hesitant because it is so fundamental. Overall, it seems like a good idea though, so give it a go (including making the requisite changes everywhere, and running the tests) if you're confident.

Thanks for your supportive reply. I pushed some changes in my repo. I think it passes all pytest tests. See here. The version is a bit different from dataclasses. Dataclasses decorator generates __init__ , __repr__, __eq__ functions via a function generation routine. My implementation write all functions directly.

The implementation is a bit different from what I expected initially but it is relatively complete. To make it run (at least pass tests) in DAPPER, I also changes several lines in dapper.xp_launch.py. (See line 575, line 283 - 285, line 297 ).

I'm quite ignorant/scared about mor and the "magic" number 5. And can we be sure all the __annotations__ we come across are "_Fields"? Please document with comments (and maybe docstrings), including the workings of the implementation, pros/cons, and references

__mro__ stores the information of class inheritance and I agree using the magic number is not desirable. I replaced it here. The __annotations__ is exactly how dataclasses determine its fields. See here

I think the __eq__ method is needed.

See here. This should generate the same output as dataclasses.

__repr__ and __str__ should also be implemented, as NicePrint is too verbose for da_methods.

See here. This should generate the same output as dataclasses

patnr commented 3 years ago

Hi,

FYI I will be very busy this week. Will try to take a look at this on Wednesday or Thursday

patnr commented 3 years ago

I personally feel OOP may make some design cleaner.

Agreed.

But as of now, I am not sure if it's worth it, seeing as the this implementation is also quite lengthy, and necessarily re-implements things from dataclasses.

In the commit just pushed I raise a custom error message in case a class does not define assimilate(), which would provide basic abstract-class functionality.

The code has also been slightly simplified, and better documented.

Abstract DA method. If you see this, a dosctring for your method is recommended.

That's delightful :1st_place_medal:

I'll close the issue for now, but feel free to re-open.

patnr commented 2 years ago

Reasons to prefer abstract classes: