phpDocumentor / fig-standards

Standards either proposed or approved by the Framework Interop Group
http://www.php-fig.org/
Other
360 stars 85 forks source link

Optional type-hints for PHP 7 code #119

Closed mindplay-dk closed 8 years ago

mindplay-dk commented 8 years ago

With PHP 7 in the water, I'd like to discuss the need for optional type-hints for @param and @return, the latter of which leaves me somewhat dumbfounded.

To demonstrate my issue, here's a PHP 5 example:

/**
 * @param string $stuff the string to measure
 *
 * @return int the length of the string
 */
function string_length($stuff) 
{
    return mb_strlen($stuff);
}

And now here's the PHP 7 version of that, with static type-hints on the parameter and return-type:

/**
 * @param string $stuff the string to measure
 *
 * @return int the length of the string
 */
function string_length(string $stuff) : int
{
    return mb_strlen($stuff);
}

Now, with PHP 7 supporting real type-hints, I'd like to stop duplicating parameter and return-types everywhere - in other words, I'd like to write:

/**
 * @param $stuff the string to measure
 *
 * @return the length of the string
 */
function string_length(string $stuff) : int
{
    return mb_strlen($stuff);
}

I know that @mvriel is going to disagree with me on this, but there is no reason I should need to duplicate type-hints everywhere - it doesn't provide any benefits, only an extra source of error, and a need to handle weird cases, where the doc-block type-hint doesn't match the static type-hint, etc.

Doc-blocks are not contracts - if I wanted a contract, I would use an interface, not a doc-block. Doc-blocks are documentation, and I shouldn't need to re-document self-documenting code.

Now, for @param there is no issue - we can simply make the type-hint optional, since the $ symbol will indicate clearly where the variable-name starts; there is no parser issue.

There is an issue with Php Storm though, which already accepts @param with variable-name and type-hint in backwards order - that is, in the above example, it thinks the type-hint is the. (They would need to fix that - and developers, in turn, would need to fix their backwards tags...)

The @return example I posted doesn't make sense at all - it wouldn't work, since it would be correctly interpreted with the as the type-hint. This is where I'd like your ideas. Here's two ideas for discussion:

The latter has the advantage of being backwards-compatible with PHP 5. One could even argue in favor of using it in @param as well, e.g. @param * $stuff, to make the assertion explicit, not sure...

Anyways, please post ideas/thoughts.

Please refrain from simply shooting down this idea from the hip - with the addition of scalar type-hints and return type-hints, PHP 7 is finally taking another step towards being a real gradually-typed language, and we need to accommodate that. If you'd like to personally keep duplicating all your type-hints, that's your provocative, but I'd like to use doc-blocks to document my code without also making double assertions - PHP 7 gives us new options, and we, as developers, should have options too.

mvriel commented 8 years ago

@mindplay-dk Please don't assume I will shoot down an idea or you will force me to shoot it down as a matter of principle ;)

I can definitely see the benefit of consuming the typehints provided by PHP and actually; I like your idea about the placeholder referring to the typehint in the signature.

Omitting the Type in both '@param' and '@return' is not an option since you correctly pointed out that there is no way to know whether the first word is a Type or the first word of the description (even with @param since IIRC the variable was made optional in PSR-5).

I am however unsure of the asterisk as operator since that generally means a wildcard and I can imagine it might be confused for a shorthand for 'mixed'.

My direct thoughts would go to a hypen (-) since that generally means 'omitted' but more input is more than welcome.

jaapio commented 8 years ago

I like the idea of making the type optional in the docblock. However It should still be possible to specify a type since in some cases a type hint isn't possible since php7 doesn't allow a return type hinted method to return NULL :-(

If we could give the docblock parser the value object of the method it is handling we should be able to determin whether the first word is a type or part of the description. There is no need for a placeholder. However using a placeholder will make things clearer of the developers. And being backwards compatible is a great thing. I have read somewhere that ZF3 will have generated type hints for example to be compatible with php 5.5 and to allow php 7 optimizations using type hints for example.

PRS-5 already contains a "standard" for replaced parts in the docblocs. {@inhereddoc) I think we could apply this to on other parts of the docblock.

/**
 * @param {@type} $stuff the string to measure
 *
 * @return {@type} the length of the string
 */
function string_length(string $stuff) : int
{
    return mb_strlen($stuff);
}

The {@type} might be a bit to long. and can be replaced by some other tag. But I like the idea of having the {@someTag} as a placeholder for variables that are available somewhere in the code.

jaapio commented 8 years ago

Another idea that came up when reading through the issues on this project. Earlier added by @mindplay-dk, why not using the notation for this. See #81 add support for generics

mvriel commented 8 years ago

If we could give the docblock parser the value object of the method it is handling we should be able to determine whether the first word is a type or part of the description.

The above is a bit technical, do you mean to say the following?

If a method's argument has a typehint then the Type of an @param MUST be omitted or MUST be equal to the typehint of the method's argument

Because it is not possible to determine whether the first word is a Type based on whether the first word matches the name of a Type in the same project because sometimes you use Types that are not native to this project (such as SimpleXMLElement).

PSR-5 already contains a "standard" for replaced parts in the docblocs. {@inhereddoc) I think we could apply this to on other parts of the docblock.

The use of inline tags, tags surrounded by braces, is syntactically part of the Description. So the notation @param {@type} $stuff the string to measure actually means I have a parameter immediately followed by a description, then a variable name and then another description; unless we specify in PSR-5 that using {} on that location is some sort of template but I think that is confusing ..

jaapio commented 8 years ago

Sorry about the technical part. I meant to say that a parser could determine whether the first word is a type. Since it could know if a type hint is provided in the method header. In phpdocumentor we are actually parsing the method header. So we know if there are type hints provided. If so, the first word of the @param or @return is part of the description.

I think the braces {} as templates are a good thing to add to PSR-5. For me it is less confusing that some magic character in the docblock that is replaced by the type. It will even allow us to add more templateing to help developers completing there docblocks.

mvriel commented 8 years ago

Omitting the Type would only work if PSR-5 dictates that the author of the documentation omits the Type if a typehint is present (which breaks backwards compatibility with existing implementations) or that the Type matches the typehint (including defaulted null).

Any other situation where a typehint is provided would result in an unpredictable state because you cannot accurately determine this.

This would result in

/**
 * @param $stuff the string to measure
 *
 * @return the length of the string
 */
function string_length(string $stuff) : int
{
    return mb_strlen($stuff);
}

or even

/**
 * @param the string to measure
 *
 * @return the length of the string
 */
function string_length(string $stuff) : int
{
    return mb_strlen($stuff);
}

This does also mean that you cannot provide a different Type with the @param or @return as the reader would then consider it to be part of the description.

/**
 * @param String the string to measure
 *
 * @return the length of the string
 */
function string_length(StringInterface $stuff) : int
{
    return mb_strlen($stuff);
}

In the example above the tag would be interpreted as having the description String the string to measure because the reader cannot infer that the String part actually reflects a concrete instance of the StringInterface (a human reader might be able to guess that that might be the case here but for a non-human reader this would be difficult).

mindplay-dk commented 8 years ago

Please don't assume I will shoot down an idea or you will force me to shoot it down as a matter of principle

@mvriel I'm sorry, I didn't mean to put words in your mouth, but you have been a strong proponent of making double assertions in our past discussions, so I was kind of expecting your position would be the repetition is a good thing - I'm glad you don't see it that way ;-)

As for @param, I think the type can be optional without problems, but the variable reference should be required, as it is now, since otherwise there is no direct link between the argument and it's description - only an inferred link due to the order of @param tags and arguments, which is risky, since many common refactorings (such as adding/removing or changing the order of arguments) could easily lead to mis-association of arguments and comments, which would be very bad.

The @return tag is different, since there's only one of those in the same context. But yeah, the issue remains that there's no way to tell if @return string length means it returns a string or it returns the "string length" - neither a machine or person can safely make that assertion.

I think, if we support something like @return - string length to infer the type from the static type-hint, we probably should allow the same for @param, even if it can be optional here, just for the sake of consistency; I bet some people will get used to doing it on @return and will intuitively add them to @param as well, and I think it's good to explicitly indicate you want to infer the @param type from the static type-hint, since there's also a chance (and already you see this in various codebases) that somebody might simply forget to add the type - putting a - or some other character there asserts to the reader (and parser) that that's what you want.

Omitting the Type would only work if PSR-5 dictates that the author of the documentation omits the Type if a typehint is present (which breaks backwards compatibility with existing implementations) or that the Type matches the typehint (including defaulted null).

I don't think that's safe. Varying syntax, dependent on whether or not a static type-hint is present, is quite brittle, since refactoring could easily mean that this type-hint gets removed or (more likely) added.

In general, syntax dependent on specific context or circumstances is risky, IMO.

mvriel commented 8 years ago

but the variable reference should be required, as it is now, since otherwise there is no direct link between the argument and it's description

I agree that it should but it became optional as a concession in other discussions; I will personally never omit it because I prefer consistency.

I don't think that's safe. Varying syntax, dependent on whether or not a static type-hint is present, is quite brittle, since refactoring could easily mean that this type-hint gets removed or (more likely) added.

In general, syntax dependent on specific context or circumstances is risky, IMO.

I cannot agree more; same reason why I am not a proponent of omitting the property name or Type in an @param tag; it makes it dependent on the context (being the actual method).

As far as I am concerned the biggest discussion now is how to indicate that a Type is to be taken from the method signature.

The following variations have been covered here:

  1. @param * $string Contains the string length to determine the length of
  2. @param - $string Contains the string length to determine the length of
  3. @param {inherited} $string Contains the string to determine the length of
  4. @param <T> $string Contains the string to determine the length of

When I put these together it feels to me that the * and - are lightweight but as such hard to miss when reading tags; which is both a benefit and a downside as it also means it is unobtrusive.

Using braces would be confusing for me as braces are only used for inline tags in a Description and it would give me the impression that a Description is started directly after the tag name. Also: if the description contains an inline tag then that will double confuse me :)

Using chevrons (<>) is something I would prefer to avoid since that would or could conflict with a future generics implementation.

mvriel commented 8 years ago

I will be drafting an e-mail to the PHP-FIG mailing list with this proposal; let's discuss there further. We are in the process of moving all discussions to that location so I will close this issue for further comment