neuro-galaxy / torch_brain

A base-class for tokenizers could be useful #29

Open vinamarora8 opened 4 hours ago

vinamarora8 commented 4 hours ago

Having a base class for tokenizers could be useful. It would look like this:

class TokenizerBase:
    def __call__(self, data: Data) -> Dict:
        raise NotImplementedError

All tokenizers would inherit from this. It only defines the interface (a callable that takes a Data object and returns a Dict), and it would help the LSP understand the code better.
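As a rough sketch of how inheriting from such a base class might look: the `Data` dataclass below is only a hypothetical stand-in for the library's real `Data` object, and `SpikeCountTokenizer` and `run` are made-up names for illustration, not part of torch_brain.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Data:
    """Hypothetical stand-in for the library's Data object (illustration only)."""
    spikes: List[float]


class TokenizerBase:
    """Interface only: subclasses turn a Data object into a dict."""

    def __call__(self, data: Data) -> Dict:
        raise NotImplementedError


class SpikeCountTokenizer(TokenizerBase):
    """Hypothetical tokenizer used only for this example."""

    def __call__(self, data: Data) -> Dict:
        return {"num_spikes": len(data.spikes)}


def run(tokenizer: TokenizerBase, data: Data) -> Dict:
    # Annotating with the base class lets an editor or type checker
    # resolve the call signature for any tokenizer passed in.
    return tokenizer(data)


sample = Data(spikes=[0.1, 0.4, 0.9])
tokens = run(SpikeCountTokenizer(), sample)
```

Calling `TokenizerBase()` directly on a `Data` object raises `NotImplementedError`, so a subclass that forgets to override `__call__` fails at call time rather than silently returning nothing.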

vinamarora8 commented 4 hours ago

While we're at it, we could also define a base class for transforms:

class TransformBase:
    def __call__(self, data: Data) -> Data:
        raise NotImplementedError

vinamarora8 commented 4 hours ago

Alternatively, we could just define type aliases instead of base classes:

TokenizerType = Callable[[Data], Dict]
TransformType = Callable[[Data], Data]
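A minimal sketch of how these aliases could be used, again assuming a hypothetical stand-in `Data` dataclass; `drop_negative`, `Scale`, `count_tokens`, and `pipeline` are illustrative names, not existing torch_brain functions. It shows that both plain functions and callable objects satisfy the aliases without inheriting from anything.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass
class Data:
    """Hypothetical stand-in for the library's Data object (illustration only)."""
    spikes: List[float]


TokenizerType = Callable[[Data], Dict]
TransformType = Callable[[Data], Data]


def drop_negative(data: Data) -> Data:
    # A plain function satisfies TransformType; no inheritance needed.
    return replace(data, spikes=[s for s in data.spikes if s >= 0])


class Scale:
    """A callable object also satisfies TransformType."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __call__(self, data: Data) -> Data:
        return replace(data, spikes=[s * self.factor for s in data.spikes])


def count_tokens(data: Data) -> Dict:
    # A plain function satisfies TokenizerType as well.
    return {"num_spikes": len(data.spikes)}


def pipeline(
    data: Data, transforms: List[TransformType], tokenizer: TokenizerType
) -> Dict:
    # Apply each transform in order, then tokenize the result.
    for transform in transforms:
        data = transform(data)
    return tokenizer(data)


result = pipeline(
    Data(spikes=[-1.0, 0.5, 2.0]), [drop_negative, Scale(2.0)], count_tokens
)
```

The trade-off compared with the base classes is that the aliases are purely structural: they help a type checker, but there is nothing to check against with `isinstance` at runtime.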