huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
135.51k stars 27.12k forks source link

Typing wrong and contradicting. #33643

Open PhilipMay opened 2 months ago

PhilipMay commented 2 months ago

System Info

not relevant here

Who can help?

@stevhliu

Information

Tasks

Reproduction

Typing on some places is wrong and contradicting.

Examples:

The float type here is wrong:

https://github.com/huggingface/transformers/blob/78b2929c0554b79e0489b451ce4ece14d265ead2/src/transformers/models/mixtral/modeling_mixtral.py#L127

It should be Union[torch.Tensor, int] instead.

As far as I know is that HF does not want to support MyPy - see https://github.com/huggingface/transformers/issues/17528#issuecomment-1145010337 - that is ok...

But nevertheless. Providing wrong type info is highly confusing and can cause problems in the code using transformers.

I also see contradicting type info. Example:

This line says gate_logits: torch.Tensor:

https://github.com/huggingface/transformers/blob/78b2929c0554b79e0489b451ce4ece14d265ead2/src/transformers/models/mixtral/modeling_mixtral.py#L126

but the docstring says Union[torch.Tensor, Tuple[torch.Tensor] - see:

https://github.com/huggingface/transformers/blob/78b2929c0554b79e0489b451ce4ece14d265ead2/src/transformers/models/mixtral/modeling_mixtral.py#L136

This is contradicting and confusing.

... and both is wrong. The correct type is: Union[torch.Tensor, Tuple[torch.Tensor, None].

I think there are many similar issues in the transformers lib. My suggestion is:

Remove the type info from all docstrings because it is redundant and needs additional afford to be maintained.

I would also suggest to use MyPy in some way to find (and fix) at least some type issues that can cause confusion.

Expected behavior

When type info is provided it should be correct and not contradicting the docstring.

PhilipMay commented 2 months ago

Also tagging @sgugger because we had this discussion on the linked issue.

stevhliu commented 2 months ago

Thanks for reporting!

I think it would be better to fix the type info in the docstring instead of removing it everywhere. wdyt @LysandreJik?

LysandreJik commented 2 months ago

Hey @PhilipMay! We're very open to having PR fixes for these kind of issues indeed; it can't be exact everywhere but that's something that we can live with imo

PhilipMay commented 2 months ago

We're very open to having PR fixes for these kind of issues indeed;

@LysandreJik and how exactly would you suggest to fix this? My suggestion would be to add proper typing when possible and then remove it from the docstring to avoid redundancies and contradictions. What do you think?

I am asking because @stevhliu suggests to "fix the type info in the docstring instead" (see above).

LysandreJik commented 1 month ago

The two serve different purposes: the docstring is used in the documentation displayed on the website, whereas the exact type can be used by type checkers commonly found in IDEs

I don't think we want to remove one or the other, but identifying the types which are mismatching like you just did is worthwhile (and could be automated, if we identify it as a source of error worth automating)

github-actions[bot] commented 4 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

PhilipMay commented 4 weeks ago

the docstring is used in the documentation displayed on the website, whereas the exact type can be used by type checkers commonly found in IDEs

@LysandreJik Well, the exact type is also displayed on the documentation pages. Even if you do not specify and type in the docstring. This exactly creates the redundancy I would suggest to remove.

I don't think we want to remove one or the other,

Well, I would suggest to remove the type from the docstrings because the proper Python typing is also displayed on the doc pages.

LysandreJik commented 4 weeks ago

cc @ArthurZucker as you're reworking the docstrings right now, WDYT?

ArthurZucker commented 4 weeks ago

Yeah, sounds good, I am relying on the typing anyways! See #33771

github-actions[bot] commented 3 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.