huggingface / doc-builder

The package used to build the documentation of our Hugging Face repos
Apache License 2.0
92 stars 35 forks source link

Allow overriding docstrings of superclass methods #458

Closed tomaarsen closed 1 year ago

tomaarsen commented 1 year ago

Hello!

Pull Request overview

Details

This is a draft PR (hence no tests, etc.) to discuss the idea of some new functionality: overriding docstrings without overriding methods.

Why is this useful?

It is quite common for functionality to be directly taken from a superclass, e.g. ModelHubMixin.from_pretrained or TrainingArguments. In some cases, you'll want to adopt the functionality without overriding the method, but you do want to update the docstring.

To my knowledge, this isn't possible in Python. All documentation builders that I know don't actually initialize a class before reading the docstrings from its methods, and the __doc__ attribute can't be modified outside of the class, so the docstrings can't be updated on the fly. E.g.:

class Foo:
    @classmethod
    def from_pretrained(cls):
        """This is the original docstring"""
        return cls()

class Bar(Foo):
    pass

Bar.from_pretrained.__doc__ = "This is the overridden docstring"
print(Bar.from_pretrained.__doc__)
Traceback (most recent call last):
  File "[sic]\setfit\test_doc.py", line 11, in <module>
    Bar.from_pretrained.__doc__ = "This is the overridden docstring"
AttributeError: attribute '__doc__' of 'method' objects is not writable

The proposal

It is possible to add a new argument to a method, e.g. __overridden_doc__, and set it to whatever you wish. Consider the following scenario:

...

class SetFitModel(ModelHubMixin):
    ...

Which becomes: image

And I want to update SetFitModel.from_pretrained by replacing the model_kwargs with the actual model keyword arguments, then I can add:

docstring = SetFitModel.from_pretrained.__doc__
try:
    docstring = docstring[:docstring.index("model_kwargs")]
except:
    pass
else:
    docstring += """multi_target_strategy (`str`, *optional*):
                The strategy to use with multi-label classification. One of "one-vs-rest", "multi-output",
                    or "classifier-chain".
            use_differentiable_head (`bool`, *optional*):
                Whether to load SetFit using a differentiable (i.e., Torch) head instead of Logistic Regression.
            normalize_embeddings (`bool`, *optional*):
                Whether to apply normalization on the embeddings produced by the Sentence Transformer body.
            device (`Union[torch.device, str]`, *optional*):
                The device on which to load the SetFit model, e.g. `"cuda:0"`, `"mps"` or `torch.device("cuda")`."""
    SetFitModel.from_pretrained.__dict__["__overridden_doc__"] = docstring

at the bottom of that same file.

When compiled, this results in: image

This is exactly what I'm after.


This PR is just a draft to showcase an idea. Note that this would be "advanced behaviour" that only few people would need to use, and everyone else can just ignore it. I'm open to different names than __overridden_doc__ as well (e.g. __preferred_doc__, or perhaps leave the __ out as that's normally for core Python functionality only)

cc: @lewtun would also benefit from this change

mishig25 commented 1 year ago

Is this not desirebale? Overwrite superclass method to redefine the docstring

class Animal:
    def speak(self):
        """This method allows the animal to speak."""
        print("Some generic animal sound")

class Dog(Animal):
    def speak(self):
        """Override: This method allows the dog to bark, using the Animal's speak implementation."""
        super().speak()  # Calls the speak method from Animal class
mishig25 commented 1 year ago

Also, there are a lot of docstring decorators in transformers. Have you checked if any of them satisfies your need:

https://github.com/huggingface/transformers/blob/cad1b1192b3ad98808d24f898e28fb56f78720d3/src/transformers/models/beit/modeling_flax_beit.py#L825-L830

https://github.com/huggingface/transformers/blob/cad1b1192b3ad98808d24f898e28fb56f78720d3/src/transformers/models/beit/modeling_flax_beit.py#L858-L861

tomaarsen commented 1 year ago

Is this not desirebale? Overwrite superclass method to redefine the docstring

class Animal:
    def speak(self):
        """This method allows the animal to speak."""
        print("Some generic animal sound")

class Dog(Animal):
    def speak(self):
        """Override: This method allows the dog to bark, using the Animal's speak implementation."""
        super().speak()  # Calls the speak method from Animal class

It indeed sometimes is not desirable. Given this original method from the superclass:

# in e.g. huggingface_hub
class ModelMixin:
    @classmethod
    def from_pretrained(
        cls: Type[T],
        pretrained_model_name_or_path: Union[str, Path],
        *,
        force_download: bool = False,
        resume_download: bool = False,
        proxies: Optional[Dict] = None,
        token: Optional[Union[str, bool]] = None,
        cache_dir: Optional[Union[str, Path]] = None,
        local_files_only: bool = False,
        revision: Optional[str] = None,
        **model_kwargs,
    ) -> T:
        """Original Docstring"""
        ...

Then one approach involving overriding is this:

class MyModel(ModelMixin):
    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        return super(MyModel, cls).from_pretrained(*args, **kwargs)

Another approach is more clear about all of the arguments used in the subclass directly, to make the IDE useful again:

class MyModel(ModelMixin):
    @classmethod
    def from_pretrained(
        cls,
        pretrained_model_name_or_path: Union[str, Path],
        *,
        force_download: bool = False,
        resume_download: bool = False,
        proxies: Optional[Dict] = None,
        token: Optional[Union[str, bool]] = None,
        cache_dir: Optional[Union[str, Path]] = None,
        local_files_only: bool = False,
        revision: Optional[str] = None,
        **model_kwargs,
    ):
        """Overridden Docstring"""
        return super(MyModel, cls).from_pretrained(
            pretrained_model_name_or_path=pretrained_model_name_or_path,
            force_download=force_download,
            resume_download=resume_download,
            proxies=proxies,
            token=token,
            cache_dir=cache_dir,
            local_files_only=local_files_only,
            revision=revision,
            model_kwargs=model_kwargs,
        )

So, both approaches for overriding have sizable downsides that I want to avoid.

Those functions that you linked are interesting however, I had not seen those before. They seem to do some interesting tricks to try and override the __doc__. I'll look into it some more. Thank you!

tomaarsen commented 1 year ago

Wow, it works! I can override doc that way!

from transformers.utils import copy_func

# To update the docstring, we need to copy the method
SetFitModel.from_pretrained = copy_func(SetFitModel.from_pretrained)
SetFitModel.from_pretrained.__doc__ = "New docstring!"

cc @lewtun

Thanks a ton @mishig25! I'll close this :)

mishig25 commented 1 year ago

Please let me know if it makes sense to move this util methods into doc-builder rather than transformers. Or you can just coy the method inside a util file in your lib