openeduhub / metalookup

Provide metadata about domains w.r.t. accessibility, licencing, ads, etc.
GNU General Public License v3.0

Separate Extractor Implementations into individual classes / instances. #83

Closed: MRuecklCC closed this issue 2 years ago.

MRuecklCC commented 2 years ago

As discussed with @RMeissnerCC, there are a couple of ways to separate the Extractor implementations and thereby make the code more modular and extensible.

One promising approach would be to use a Python `Protocol` that defines the extractor interface:

```python
from typing import Protocol

# WebsiteData, StarCase and Explanation are assumed to be the project's existing types.


class Extractor(Protocol):
    def __call__(self, site: WebsiteData) -> tuple[StarCase, Explanation]:
        """Extract the information from the site."""
```
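To illustrate what the Protocol approach buys us, here is a minimal, self-contained sketch. `WebsiteData`, `StarCase` and `Explanation` are simplified stand-ins invented for this example (the real metalookup types are richer), and `AdvertisementExtractor` and `run_extractors` are hypothetical names. The point is structural typing: any class with a matching `__call__` satisfies the protocol without inheriting from it.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Protocol, runtime_checkable


# Hypothetical stand-ins for the real metalookup types (simplified).
@dataclass
class WebsiteData:
    url: str
    scripts: list[str] = field(default_factory=list)


class StarCase(IntEnum):
    # Only two rating levels are shown to keep the sketch short.
    ZERO = 0
    FIVE = 5


Explanation = str


@runtime_checkable
class Extractor(Protocol):
    def __call__(self, site: WebsiteData) -> tuple[StarCase, Explanation]:
        """Extract the information from the site."""
        ...


class AdvertisementExtractor:
    """Hypothetical extractor; it satisfies Extractor without inheriting from it."""

    def __call__(self, site: WebsiteData) -> tuple[StarCase, Explanation]:
        if any("ads" in script for script in site.scripts):
            return StarCase.ZERO, "Found an advertisement script."
        return StarCase.FIVE, "No advertisement scripts found."


def run_extractors(
    site: WebsiteData, extractors: dict[str, Extractor]
) -> dict[str, tuple[StarCase, Explanation]]:
    # Each extractor runs independently, so adding one means adding an entry
    # to the mapping rather than editing a monolithic function.
    return {name: extract(site) for name, extract in extractors.items()}


site = WebsiteData(url="https://example.org", scripts=["https://ads.example.com/x.js"])
results = run_extractors(site, {"advertisement": AdvertisementExtractor()})
print(results["advertisement"])
```

Because `Extractor` is decorated with `typing.runtime_checkable`, `isinstance(AdvertisementExtractor(), Extractor)` is `True`. That runtime check only verifies that a `__call__` attribute exists, not its signature, so the real guarantee comes from a static type checker such as mypy.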

However, Protocols do not allow generic return types, i.e., the following would not be possible:

```python
from typing import Protocol, TypeVar

T = TypeVar("T")


class Extractor(Protocol[T]):
    def __call__(self, site: WebsiteData) -> tuple[T, StarCase, Explanation]:
        """Extract the information from the site."""
```

This means that if we want the extractors to be able to return arbitrary, extractor-specific extra result data, we would have to use abstract base classes:

```python
import abc
from typing import Generic, TypeVar

T = TypeVar("T")


# abc.ABC is needed: @abc.abstractmethod is only enforced under ABCMeta.
class Extractor(abc.ABC, Generic[T]):
    @abc.abstractmethod
    def __call__(self, site: WebsiteData) -> tuple[T, StarCase, Explanation]:
        """
        Extract the information from the site.

        Returns:
            - extractor-specific extra information of type T
            - the star rating
            - an explanation (or a list thereof)
        """
```
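A minimal sketch of how a concrete extractor could bind `T` to its own result type under the ABC approach. As before, the stand-in types and the names `AdsInfo` and `AdvertisementExtractor` are hypothetical. Note that `@abc.abstractmethod` is only enforced when the class uses `ABCMeta` (e.g. by inheriting from `abc.ABC`); with `Generic[T]` alone, a subclass that forgets to implement `__call__` could still be instantiated.

```python
import abc
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Generic, TypeVar

T = TypeVar("T")


# Hypothetical stand-ins for the real metalookup types (simplified).
@dataclass
class WebsiteData:
    url: str
    scripts: list[str] = field(default_factory=list)


class StarCase(IntEnum):
    ZERO = 0
    FIVE = 5


Explanation = str


class Extractor(abc.ABC, Generic[T]):
    @abc.abstractmethod
    def __call__(self, site: WebsiteData) -> tuple[T, StarCase, Explanation]:
        """Return extra data of type T, the star rating, and an explanation."""


@dataclass
class AdsInfo:
    """Hypothetical extractor-specific extra data."""

    ad_scripts: list[str]


class AdvertisementExtractor(Extractor[AdsInfo]):
    def __call__(self, site: WebsiteData) -> tuple[AdsInfo, StarCase, Explanation]:
        found = [s for s in site.scripts if "ads" in s]
        if found:
            return AdsInfo(found), StarCase.ZERO, f"Found {len(found)} advertisement script(s)."
        return AdsInfo([]), StarCase.FIVE, "No advertisement scripts found."


site = WebsiteData(url="https://example.org", scripts=["https://ads.example.com/x.js", "/app.js"])
info, stars, explanation = AdvertisementExtractor()(site)
# A type checker infers `info` as AdsInfo here, so `info.ad_scripts` is type-safe.
print(info.ad_scripts, stars, explanation)
```

The design benefit is that callers keep precise static types for each extractor's extra data; the cost is that every extractor has to inherit from the base class.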

Alternatively, we could pass the extra data as `Any`, `object`, or a `pydantic.BaseModel`.
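For comparison, a sketch of the non-generic alternative using `Any`: the interface stays simple, but callers lose static type information about the extra data and have to narrow it at runtime, e.g. with `isinstance`. The stand-in types and the names `AdsInfo`, `AdvertisementExtractor` and `count_ad_scripts` are hypothetical. A `pydantic.BaseModel` return type would be used in a similar way; it is left out here to keep the sketch dependency-free.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Protocol


# Hypothetical stand-ins for the real metalookup types (simplified).
@dataclass
class WebsiteData:
    url: str
    scripts: list[str] = field(default_factory=list)


class StarCase(IntEnum):
    ZERO = 0
    FIVE = 5


Explanation = str


class Extractor(Protocol):
    def __call__(self, site: WebsiteData) -> tuple[Any, StarCase, Explanation]:
        """Return untyped extra data, the star rating, and an explanation."""
        ...


@dataclass
class AdsInfo:
    ad_scripts: list[str]


class AdvertisementExtractor:
    def __call__(self, site: WebsiteData) -> tuple[Any, StarCase, Explanation]:
        found = [s for s in site.scripts if "ads" in s]
        stars = StarCase.ZERO if found else StarCase.FIVE
        return AdsInfo(found), stars, f"{len(found)} advertisement script(s) found."


def count_ad_scripts(extractor: Extractor, site: WebsiteData) -> int:
    extra, _stars, _explanation = extractor(site)
    # `extra` is Any, so the type checker cannot help; narrow it explicitly.
    if isinstance(extra, AdsInfo):
        return len(extra.ad_scripts)
    return 0


site = WebsiteData(url="https://example.org", scripts=["https://ads.example.com/x.js"])
print(count_ad_scripts(AdvertisementExtractor(), site))
```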