huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
132.89k stars 26.51k forks source link

Typing wrong and contradicting. #33643

Open PhilipMay opened 1 week ago

PhilipMay commented 1 week ago

System Info

not relevant here

Who can help?

@stevhliu

Information

Tasks

Reproduction

Typing on some places is wrong and contradicting.

Examples:

The float type here is wrong:

https://github.com/huggingface/transformers/blob/78b2929c0554b79e0489b451ce4ece14d265ead2/src/transformers/models/mixtral/modeling_mixtral.py#L127

It should be Union[torch.Tensor, int] instead.

As far as I know is that HF does not want to support MyPy - see https://github.com/huggingface/transformers/issues/17528#issuecomment-1145010337 - that is ok...

But nevertheless. Providing wrong type info is highly confusing and can cause problems in the code using transformers.

I also see contradicting type info. Example:

This line says gate_logits: torch.Tensor:

https://github.com/huggingface/transformers/blob/78b2929c0554b79e0489b451ce4ece14d265ead2/src/transformers/models/mixtral/modeling_mixtral.py#L126

but the docstring says Union[torch.Tensor, Tuple[torch.Tensor] - see:

https://github.com/huggingface/transformers/blob/78b2929c0554b79e0489b451ce4ece14d265ead2/src/transformers/models/mixtral/modeling_mixtral.py#L136

This is contradicting and confusing.

... and both is wrong. The correct type is: Union[torch.Tensor, Tuple[torch.Tensor, None].

I think there are many similar issues in the transformers lib. My suggestion is:

Remove the type info from all docstrings because it is redundant and needs additional afford to be maintained.

I would also suggest to use MyPy in some way to find (and fix) at least some type issues that can cause confusion.

Expected behavior

When type info is provided it should be correct and not contradicting the docstring.

PhilipMay commented 1 week ago

Also tagging @sgugger because we had this discussion on the linked issue.

stevhliu commented 1 week ago

Thanks for reporting!

I think it would be better to fix the type info in the docstring instead of removing it everywhere. wdyt @LysandreJik?

LysandreJik commented 1 week ago

Hey @PhilipMay! We're very open to having PR fixes for these kind of issues indeed; it can't be exact everywhere but that's something that we can live with imo

PhilipMay commented 1 week ago

We're very open to having PR fixes for these kind of issues indeed;

@LysandreJik and how exactly would you suggest to fix this? My suggestion would be to add proper typing when possible and then remove it from the docstring to avoid redundancies and contradictions. What do you think?

I am asking because @stevhliu suggests to "fix the type info in the docstring instead" (see above).

LysandreJik commented 2 hours ago

The two serve different purposes: the docstring is used in the documentation displayed on the website, whereas the exact type can be used by type checkers commonly found in IDEs

I don't think we want to remove one or the other, but identifying the types which are mismatching like you just did is worthwhile (and could be automated, if we identify it as a source of error worth automating)