phpDocumentor / fig-standards

Standards either proposed or approved by the Framework Interop Group
http://www.php-fig.org/
Other
360 stars 85 forks source link

Declaring Collections #47

Closed mvriel closed 5 years ago

mvriel commented 10 years ago

This is a followup on #6. I have chosen to open a new issue to prevent confusion

A regularly requested feature for the PHPDoc standard is the ability to describe the type of elements contained in an object representing a Collection (thus implementing the ArrayAccess or Traversable interface).

At current this is only possible by using a workaround and defining both an array specifier and object declaration in one @var tag:

@var \DateTime[]|DateTimeCollection My collection of DateTime objects.

With PSR-5 we want to introduce a construct that is inspired on usages in other languages and within some PHP projects:

@var DateTimeCollection<\DateTime> My collection of DateTime objects.

or, if you want to specify the keytype:

@var DateTimeCollection<integer,\DateTime> My collection of DateTime objects.

A type defined inside the angular braces can be of any type, including another Collection type:

@var DateTimeCollection<DateTimeCollection<\DateTime>> A collection of DateTime objects inside another Collection.

With this issue I would like to invite anyone to provide feedback, several other notations have been considered but these were unable to provide a concise notation where it is possible to use any type and provide a keyname (more information on request).

ashnazg commented 10 years ago

:+1: this approach looks very flexible to me, able to document actual Generics, and Traversables of all kinds.

stof commented 10 years ago

This approach is also compatible with the syntax of generics in Hack, so :+1: from me

stof commented 10 years ago

when I'm saying it is compatible, I mean that if you think about your collection as being a Hack generic Collection<T>, declaring a collection of datetime will look like Collection<\DateTime>

BenMorel commented 10 years ago

:+1: for me.

I would just add that we should allow arrays as well, not just traversable objects:

array<\DateTime>
schmittjoh commented 10 years ago

One problem with this approach is that you would implicitly assume that the type parameter, in your example DateTime, refers to the traversable type while in generics this can really appear anywhere on the API including simple return values from methods like $collection->get().

So, I think to make this complete, we would need another annotation on the Collection class which defines which parameters exist and where they go. This would then provide a solution not only for iteration, but all other methods too.

Here an example of a common case:


/** @var Collection<\DateTime> */
$collection = /** ... */

$x = $collection->get($k); // What type has $x here?
ashnazg commented 10 years ago

That's up to the @return tag on the get() method, isn't it? I haven't seen any discussion on this topic that touched on the Generics behavior regarding methods that return one of those <T> thingies.

coderbyheart commented 10 years ago

Will nesting be supported?

@var DateTimeCollection<integer, DateTimeCollection<integer,\DateTime>> My collection of DateTimeCollections of DateTime objects.
ashnazg commented 10 years ago

@coderbyheart I believe @mvriel's last example is effectively your example, where your key type of integer is explicit and in his the key type is implied as mixed.

BenMorel commented 10 years ago

@schmittjoh This issue has been created to avoid any confusion with real generics, confusion that has polluted #6. This proposal is just intended to provide documentation on what this specific instance of the Traversable contains when foreach()ing it, not to do any other kind of magic on the collection methods.

get() will only have the return type documented by @return I'm afraid, this proposal does not target it.

damonjones commented 10 years ago

:+1:

alias-mac commented 10 years ago

:+1:

mvriel commented 10 years ago

Thank you all for your feedback thus far, it is a sign that we are definitely on to something here. Keep it coming :)

dragoonis commented 10 years ago

As per our discussion at PHPNW13, less is more here, it needs to be simple, clear and not overly complicated.

@var DateTimeCollection<\DateTime> My collection of DateTime objects. this is clean and easy to understand.

@var DateTimeCollection<integer,\DateTime> My collection of DateTime objects. this is clean, the key is clear.

As for nesting, a collection of collections?, that's far too over-specific and we shouldn't be going down that route.

BenMorel commented 10 years ago

I agree with your view @dragoonis, apart from the nesting part. The nesting uses the exact same syntax, should not be more complicated to implement by static inspection tools, so why explicitly recommending to limit the syntax to a single level? It's a nonsense to me.

I have found myself in situations where I need an array containing arrays of objects, and I would be really happy to be able to declare:

array<integer, array<integer, object>>

It would be a big mistake IMO to prevent IDE vendors straight away from implementing deeper inspections based on this syntax, when they could.

dragoonis commented 10 years ago

@BenMorel your example is clear to me. Nested Collections was specifically what I was referring to looking overly complex.

I believe we're saying the same thing, and I'm happy to go ahead with nestng but simple nesting is what I'm protesting for! ;-)

BenMorel commented 10 years ago

@dragoonis Nice then, but just to understand, what do you call simple or complex nesting?

mvriel commented 10 years ago

I am also not sure what you are referring to @dragoonis? Do you think the suggested syntax or nesting is too complex or just right and you want to warn against making it more complex?

dragoonis commented 10 years ago

The syntax is fine, but nesting things isn't simple documentation. For years we've been happy with scalar @param stuff. We're adding basic multi-dimensional documenting here with key => val docmentation, but I'm not wanting multi-multi dimensional, it's just over the top.

mvriel commented 10 years ago

@dragoonis whether you use the nested stuff is imo up to the project. I personally would like to avoid it whenever I can but there is a need out there so I think it is good to offer the support :)

BenMorel commented 10 years ago

I agree with @mvriel:

Plus, I think it does not make sense to explicitly prevent nesting: if X is a type, Y is a type, and X<Y> is a type, why would you not be able to use X<X<Y>>?

sun commented 10 years ago

:+1: The proposal makes sense + makes PHP more approachable to developers of other languages.

I also agree with @BenMorel - if you don't like nested collections, simply don't use them.

The one and only job of PSR-5 is to state how you document things, IF you use them. It does not force you to use them.

theseer commented 10 years ago

Sorry for being late to the party, but I'm not sure if I fully understand the reasoning behind this addition.

While it makes perfect sense to me to have @var foo[] to document the variable as an array of the given type, I currently fail to see why such a documentation would be required for an actual object - regardless whether that object would implement a collection or not?

If the class referenced would implement ArrayAccess and the code using it would treat it as an array, it can be replaced by an array and any documentation should not refer to the object class but merely the fact it's an array. We do have documentation for that by the @var foo[] syntax.

If the class referenced would implement ArrayAccess and the code using it would use an actual API provided, the fact it implemented ArrayAccess is of no interest and thus shouldn't be any concern for the documentation.

Since Traversable cannot be implemented directly in PHP userland, this is of no interest. The default implementor for traversable would be an iterator, which again be technically be replaced by an array or specific API - thus the previous logic applies.

The only thing we're missing is a documentation about the key in an associative array/collection. While I'm actually not convinced we do need that, I guess that really might be considered a missing feature.

Regarding nesting: While of course technically and syntactically there wouldn't be any limits in the proposed format, I really wonder where that would lead to?

It's already an established anti pattern to use large nested array structures - let alone pass those around as parameters. Why would we honor this approach by providing a special annotation syntax for it? And, given we'd do so anyway, what would the code look like for complex structures? To me @var SomeCollection<string,SomeOtherCollection<string,SomeMoreNestedCollection<integer,ValueClass<integer,YetAnotherCollection<string,string>>>>> is everything but readable, let alone maintainable or of any actual help as the possible values for the keys aren't defined.

I do understand that there is, given people still use largely nested array structures, a need to document those structures. But this is not an issue a docblock annotation should care about.

If the parameter is a largely nested object structure though, the API return type hints should contain everything needed to make IDEs happily support autocomplete.

stof commented 10 years ago

@theseer If you return a collection of objects, the surrounding code does not need to only use it as an iterator. It could also rely on the fact that it is a collection to call other methods on it. But the fact that some consumers of your API rely on the fact that the return value is a collection does not mean that knowing which kind of value is returned when iterating is useless. for properties, you can indeed change it depending of the way you use it inside your code. For return value, you cannot as the code using it is in a different place (possibly out of your hands)

theseer commented 10 years ago

@stof Maybe it's the hot weather, but despite your explanation I still don't see the point. Let me try to elaborate by going through your explanation step by step.

If you return a collection of objects, the surrounding code does not need to only use it as an iterator.

The calling code can of course make use of the returned object as it sees fit. If the result object implements an Iterator, it can be used as an iterator. If it implements ArrayAccess, it can be used as if it would be an array. As soon as it (also) provides a specific API and the calling code requires this API to be present when using the result, the return value cannot be replaced anymore by a "generic" collection or a simple array. Either way, having a @return CollectionClass would suffice.

It could also rely on the fact that it is a collection to call other methods on it.

If you can call methods on it, it's not merely a generic Collection but an Object with a specific API. The fact it may also be a collection is of no use as only an object implementing the required API can be used. No matter whether this is a parameter or a return type.

But the fact that some consumers of your API rely on the fact that the return value is a collection does not mean that knowing which kind of value is returned when iterating is useless.

I didn't say that. And of course it's helpful if not required to know the return value when iterating. But that is a simply @return Value for the next() method of the iterator or getOffset() method of the ArrayAccess respectively. If an IDE does not resolve this properly, this should be considered a bug - or a feature request - as this is merely an implicit API call. We do not (imho) require a new documentation type for this.

for properties, you can indeed change it depending of the way you use it inside your code. For return value, you cannot as the code using it is in a different place (possibly out of your hands)

I'm not sure I understand what you try to tell me here. If you opt for multiple return types, I'd strictly vote against that.

The more I think about it, the more the whole idea of annotating collections as described seems flawed:

If the type of the parameter or return type changed from a basic array into a collection class but should it - for whatever reason - still be only used as an array, the existing documentation @var foo[] suffices and has to remain. No additional API can and will be provided. (At least when it comes to the API documentation. PHP of course doesn't really care here..)

If the type of the parameter or return type is a collection providing an API, well, we do have working documentation syntax for that.

if the type of the parameter or return type is a collection implementing ArrayAccess and/or Iterator, we do not need a new documentation either as the methods required by the interfaces can use standard @param and @return annotations.

So, sorry to say that, but I still fail to see the need for a new documentation syntax here. What am I missing?

stof commented 10 years ago

The calling code can of course make use of the returned object as it sees fit. If the result object implements an Iterator, it can be used as an iterator. If it implements ArrayAccess, it can be used as if it would be an array. As soon as it (also) provides a specific API and the calling code requires this API to be present when using the result, the return value cannot be replaced anymore by a "generic" collection or a simple array. Either way, having a @return CollectionClass would suffice.

Some of the surrounding code might rely on the specific collection API while other calling code just need the iterator. for place iterating over the collection, it is great to know what is inside the collection.

I didn't say that. And of course it's helpful if not required to know the return value when iterating. But that is a simply @return Value for the next() method of the iterator or getOffset() method of the ArrayAccess respectively. If an IDE does not resolve this properly, this should be considered a bug - or a feature request - as this is merely an implicit API call. We do not (imho) require a new documentation type for this.

This expects that you implement a new collection for each possible type inside them just to change the doc of the return value of current() (not next btw, which returns void in an iterator). this does not play well at all with generic collection implementations (for instance Doctrine collections), where the only thing we can document in the collection itself is @return mixed.

stof commented 10 years ago

another case where it does not work is if you collection implements the iteration through IteratorAggregate and returns a builtin ArrayIterator from it for instance

theseer commented 10 years ago

Some of the surrounding code might rely on the specific collection API while other calling code just need the iterator. for place iterating over the collection, it is great to know what is inside the collection.

Maybe I'm blind, but this doesn't make any sense to me: If i require a certain API to be present, the returned object has to provide it hence the API Documentation needs to reflect that. I cannot have a mixed construct here. To be able to use the returned object as iterator / array, the respective interfaces need to be implemented by said type. That is already (implicitly) documented by stating the method will return an object of that type.

I completely agree that when iterating over the collection one wants to know of what type the element is. But as said before, this is an implicit API call. So it's not a special use case, you just don't see the API call in the code.

This expects that you implement a new collection for each possible type inside them just to change the doc of the return value of current() (not next btw, which returns void in an iterator).

Oups, current() of course, not next(). Why did I write next()...?

And yes, of course that would assume said thing. Everything else doesn't make any logical sense as otherwise it's as bad as a classic array which may or may not be of the structure documented. Of course having @return foo[] doesn't ensure this either - so yet another reason not to use arrays as parameters or return types.

this does not play well at all with generic collection implementations (for instance Doctrine collections), where the only thing we can document in the collection itself is @return mixed.

Not very surprising that a generic implementation cannot provide a specific API ;-) And maybe that shows that generic implementations are a bad thing if they are concrete classes and not merely abstracts.

Regarding iteratorAggregate: That is also a generic implementation, which imho should have never been a concrete but rather an abstract class.

I do understand those cases technically do exist. What I don't agree upon (yet?) is that we should support them with a dedicated syntax. As mentioned before, PHP as of now does not have a code level construct that enforces an array of objects of a specific type. One reason why people moved away from unspecific arrays and established concrete classes with a dedicated API. Why do we now try to take a step back into the past and have generic collections that are basically only syntactic oop sugar on an array? Didn't we just learn that arrays suck are bad?

Of course this is a question beyond this ticket and can probably be a lengthy discussion elsewhere. Question remains, do we want to support this with a dedicated syntax? Maybe, but it feels broken to me.

sun commented 10 years ago

@theseer The job of a PSR-5 phpDoc standard is to give clear instructions for how you should document your code IF you are writing certain code.

PSR-5 does not and should not contain any kind of evaluation/rating of "good code" vs. "bad code". That's none of its business. It's one and only business is to tell you how you can document your code, if you choose to document it.

If you and your consumers are happy with a simple @return array, @return Foo[], or @param mixed, just simply use that.

But if you want to document a particular case more precisely, then you should have options.

In short, whether something looks wrong or ugly is irrelevant. If you don't like it, don't write such code; your choice. The only relevant question is whether the proposed syntax for documenting collections makes sense and can be added to PSR-5, so that developers have an option to document their code if they are confronted with that use-case.

theseer commented 10 years ago

@sun I do realize this is not about good or bad - or the right and the wrong way of doing things. I merely - even if writing long texts about it ;) - try to point out that we are discussing a solution to a problem that doesn't exist. (Okay, that statement is obviously heavily biased by my very own opinion.)

Funny thing though is that the proposed addition does not cover classic nested arrays structures which may be used just as good (or badly ;) ) in favor of iterators and generics.

If we are to come up with a syntax for documenting generic data structures (despite there is no code level construct ensuring that it's actually valid and despite the fact i consider it conceptually broken), (nested) arrays should be covered by it as well.

BenMorel commented 10 years ago

Not very surprising that a generic implementation cannot provide a specific API ;-) And maybe that shows that generic implementations are a bad thing if they are concrete classes and not merely abstracts.

The thing is, there is no viable alternative to generic collection implementations in PHP. As generics do not exist, your only alternative would be to create a UserCollection class for User objects, a CarCollection for Car objects, and so forth. Really cumbersome just to redeclare the current() return value.

The whole purpose of this PSR is to be able to declare Collection<User>, which can be then properly understood as a Collection as well as an iterator of User objects:

if (! $collection->isEmpty()) { // $collection instanceof Collection
    foreach ($collection as $user) { // $user instanceof User
        // ...
    }
}

Funny thing though is that the proposed addition does not cover classic nested arrays structures which may be used just as good (or badly ;) ) in favor of iterators and generics.

I suggested above that this PSR covers arrays as well:

array<string>
array<int, string>
array<int, array<int, string>>

Regarding nesting: While of course technically and syntactically there wouldn't be any limits in the proposed format, I really wonder where that would lead to?

I'll repeat what I said above:

If X is a type, Y is a type, and X<Y> is a type, why would you not be able to use X<X<Y>>?

Again, if you don't like it, don't use it. But I can't see why we would explicitly prevent the standard from supporting nesting.

theseer commented 10 years ago

The thing is, there is no viable alternative to generic collection implementations in PHP. As generics do not exist, your only alternative would be to create a UserCollection class for User objects, a CarCollection for Car objects, and so forth.

Yes, I'm afraid that'll be correct.

Really cumbersome just to redeclare the current() return value.

Cumbersome? Yes. But the only way to technically enforce that only objects of the given type can be added to the collection. That is more than just redeclaring current(). Merely documenting that a certain type is expected is no the same and makes a Collection as useless as an array.

Point taken though, that this would be about coding style rather than documenting code and thus may be out of scope for this PSR.

I suggested above that this PSR covers arrays as well

Sorry, I've missed that. Point taken as well.

If X is a type, Y is a type, and X is a type, why would you not be able to use X<X>? Again, if you don't like it, don't use it. But I can't see why we would explicitly prevent the standard from supporting nesting.

All I was try to say here is that nesting creates an unreadable mess when reaching multiple levels. And it's getting worse if funny things like namespaces come into the mix. That of course is not a "technical" limitation in the syntax.

If the PSR is to promote what is technically possible as well as parseable, then of course this is perfectly fine and we're done discussing. In case it should instead promote a best practice on how to document things apart from defining a syntax for it, readability and maintainability of said documentation should play its part. And imho, this syntax gets unreadable when nested construct are written as one sequence and on a single line. Maybe the solution already is to require it to span over multiple lines for nested structures? (Even though that would break with the general assumption that @-annotations are single lined)

Anyway: I've been asked via twitter to provide my thoughts. I've done so. As we're starting to run in circles, I don't think there's very much to be added from my part. I admit I have a bit of a hard time implementing solutions to my software to fix problems that only exist because of (to what i consider) impractical coding styles. So maybe I'm the wrong person to ask for advice to begin with in this context. May this be as it is: If this PSR is finalized and a syntax is agreed upon, phpDox is of course going to support it regardless of me liking it or not.

mvriel commented 10 years ago

Hi @theseer, welcome to the party!

And don't worry about entering the discussion now, I value your opinion and would like to invite you to please read through the PSR and provide your feedback where you can, even if it is criticism. The goal for this PSR is for everyone to be able to have something they are mostly happy about, then can agree with and at some point disagree with (can't please everybody, right? ;))

I think that most people here have made the case so I am not going to iterate on that; I would however like to elaborate on the following paragraph in the hope of providing clarity;

If the PSR is to promote what is technically possible as well as parseable, then of course this is perfectly fine and we're done discussing. In case it should instead promote a best practice on how to document things apart from defining a syntax for it, readability and maintainability of said documentation should play its part. And imho, this syntax gets unreadable when nested construct are written as one sequence and on a single line.

My vision has always been that this document is a technical specification of the Syntax and intentions of the PHPDoc Standard and intentionally not to dwell on Coding Standards and Annotations. This is because we can all argue on what a good coding standard can be, and that view can shift over time, but a technical description should be a solid foundation on top of which you can form your standards.

I agree with you that nesting the Collection syntax is to be discouraged and we can even add a admonition to the chapter on the Collection syntax stating that over-using it reduces readability. However limiting it would prevent the 1% use-cases where it would be useful and would even introduce another business rule while not limiting is from a technical perspective easier.

(The current Syntax is that a collection may be of any type and contain any type, so array<array<string>> would be legal; trying to limit what is inside the collection does not reflect the way the language works and would be an artificially introduced limit)

Maybe the solution already is to require it to span over multiple lines for nested structures? (Even though that would break with the general assumption that @-annotations are single lined)

phpDocumentor by default considers all tags (or @-annotations) to be multiline as each may have a multi-line description. This practice is also what the ABNF in this document reflects.

theseer commented 10 years ago

Hi @mvriel

Hi @theseer, welcome to the party!

Thanks :)

The goal for this PSR is for everyone to be able to have something they are mostly happy about, then can agree with and at some point disagree with (can't please everybody, right? ;))

Mission accomplished ;) - at least for my part.

However limiting it would prevent the 1% use-cases where it would be useful and would even introduce another business rule while not limiting is from a technical perspective easier.

I guess despite still not liking it for the already given reasons, I'm beyond trying to limit it but rather looking for a viable solution in terms of readability. As other PSR define formatting rules, I guess this PSR could do so as well for this purpose.

So If there can be a readable syntax / formatting rule for nested definitions there wouldn't even be a requirement to discourage them - at least not for that reason.

Maybe the solution already is to require it to span over multiple lines for nested structures? (Even though that would break with the general assumption that @-annotations are single lined)

phpDocumentor by default considers all tags (or @-annotations) to be multiline as each may have a multi-line description. This practice is also what the ABNF in this document reflects.

Sure, phpDox does so as well. I just recall some user of phpDox being confused about the rendered output as he assumed @-annotations to be single lined. And despite me implementing it differently, I actually had to check my code and the specs as well to verify i'm doing it correctly.

Anyhow, to summarize my current state of mind: I'm still not happy with the syntax in nested cases, I won't promote the use of this type of documention in general as I won't promote code constructs that would require said documentation - but I also don't see any technical issues implementing support for it in phpDox. Hence I'll be adding it to my docblock parsing code when this PSR is finalized.

boenrobot commented 10 years ago

So If there can be a readable syntax / formatting rule for nested definitions there wouldn't even be a requirement to discourage them - at least not for that reason.

OK, I have an idea about that...

If each of the components in the syntax are defined as separate tokens, perhaps we could allow for some whitespacing, which would surely increase readability, even in very complicated cases.

Consider

@return SomeCollection<string,SomeOtherCollection<string,SomeMoreNestedCollection<integer,ValueClass<integer,YetAnotherCollection<string,string>>>>> The results.

vs.

@return SomeCollection<
    string,
    SomeOtherCollection<
        string,
        SomeMoreNestedCollection<
            integer,
            ValueClass<
                integer,
                YetAnotherCollection<string,string>
            >
        >
    >
> The results.

Now... this will make it... a little... OK, a lot... more difficult for parsers, including phpDocumentor... which would need to detect the "<" in the first token, and switch to "muti-line type parsing" on that (which would in turn decrease performance of parsers), but it's certainly doable.

mvriel commented 10 years ago

I see no reason to prohibit a multiline statement; it is basically a necessity given that some projects limit their line lengths in accordance with PSR2

jacobsantos commented 10 years ago

For what it is worth:

/**
 * @return [\Type] Array of Types.
 */

This should tell you that it is an array and the items in the array are of Type class. I would rather like to be able to define names when doing below.

/**
 * @return {name => \Type, otherName => \AnotherType}
 * StdClass with name and otherName.
 */

I do understand that it is not proper to do so. However, most of the time when there is an arbitrary array or object type that I don't have time to create a proper class for, I document the object and array in the description. This has always seemed annoying and requires hacking the IDE to properly define the types.

One idea would be a custom phpDoc tag.

/**
 * @var MyCustomName {name => \Type, otherName => \AnotherType}
 * StdClass with name and otherName.
 * @return MyCustomName
 */

Or something less dumb.

Really, my two cents, for all it is worth for the discussion is I'd rather not have to do array<\Type> and would rather do [\Type] and {\Type} when the case might come up. I'll do it whatever way the IDE and phpDocumentor (or phpDoc application used) suggests.

Would be really nice to have array support finally.

mindplay-dk commented 9 years ago

@mvriel sorry to chime in on this subject yet again, but... you are making a huge mistake here.

The syntax you have chosen for collection types, using angle brackets, is identical to that of generics in other languages, and you even refer to it as "generics" in the spec - but, I have to point out again, what you're doing here is not generics, but merely collection-type hinting. The concept of "generics" involves type arguments, which are not part of this specification.

Not only is it misleading to use familiar generic syntax for this, it's also risky, as PHP may actually support generics in the future - have you considered what will happen, if they choose angle brackets for native PHP type arguments?

The concept of generics isn't a superset of the collection type arguments concept in your spec, and this will become a real, practical problem in the future.

I strongly suggest you use a distinct syntax from generics, and remove the word "generics" from the spec - the correct term for what you're doing is "collection type hint".

I would argue again that collection type hinting is a crippled form of generics, in which the type arguments are assumed to have something to do with key/value types in collections only, but offering no means of declaring that those type arguments exist - e.g. not providing any means of declaring the type arguments themselves, simply assuming that they exist and are global to every type.

I'm pretty sure you will have to support generics eventually, and I don't understand the resistance to go all the way with this and get it right the first time around :-(

jacobsantos commented 9 years ago

@mindplay-dk PHP is unlikely to ever support Generics for the same reason Python is unlikely to ever support Generics. Hack does support Generics and if phpDocumentor were to support it, then yeah, it does use '<' and '>'.

The reason it should use '[Name]' instead is because that is what PHP uses for array shorthand and phpDocumentor should be as close to PHP as possible to reduce the need for constantly looking up the documentation. Even 'array(CollectionName)' is better than '' or '{CollectionName}' given that no where does PHP use curly brackets or angle brackets for arrays (I'm defining '=>' as a single operator, where '>' is not separate from '=').

The angle brackets is foolish not only because of the common generic usage, but because PHP already defines a shorthand for arrays and going off of Python, you should define your language to be as simple to the programmer as possible.

Of course it would make perfect sense for phpDocumentor or phpdoc to use '' for arrays given that it isn't a well designed language and "fuck it" is part of PHP's history.

mvriel commented 9 years ago

@mindplay-dk

Communication tip: you are making a huge mistake here. would be better received if it were I believe you are making a mistake here.. I believe there to be no factuality in your statement, just an expression of your opinion. Also: I don't think the use of superlatives adds to your statement. I know, or at the very least hope that, you intend such nuance.

It does not matter which syntax I choose, there will always be someone who will consider that a mistake. When I look around at the current usage in other libraries, such as JMS Serializer, I see that when people try to invent a notation describe arrays in more detail that they tend to use the Generic-like notation.

I also believe that the people who are used to Generics will find the syntax familiar and intuitive. Having to learn a new notation can be counterproductive.

In addition to the above at least CoffeeScript (or CoffeeDoc) uses the same syntax and I have yet to find other examples in other generators.

Using array[integer] feels unintuitive to me as, in PHP, anything you put in between the square brackets refers to the key; as such I would interpret the line above as an array whose keys are integers.

Given the amount of support for this notation versus the opposition I currently see no reason to investigate an alternative.

I'm pretty sure you will have to support generics eventually, and I don't understand the resistance to go all the way with this and get it right the first time around

phpDocumentor nor PHPDoc does not need to go all the way with Generics; Generics is a runtime-affair that should be dealt with by the language itself. With PHPDoc we can make hint that a Collection has a specific signature (array<Integer> vs. array<T>) but that's it as far as I am concerned.

I also think that if we try to codify Generics in more detail that we explicitly run the risk of colliding with a future effort by PHP. I have looked at Hack's implementation when designing these rules (and added support for multiple types using the pipe operator as is the case anywhere else in PHPDoc).

BenMorel commented 9 years ago

But what if PHP eventually decides to follow Hack's path here?

Don't get me wrong, I also think that the <> notation is potentially better than the [] one for the reason you mention @mvriel , but considering that Hack is a reality today and that it may influence the future of PHP if its adoption continues to raise, I wouldn't bet that generics won't find their way into PHP, and on that day we would have a problem.

mvriel commented 9 years ago

@BenMorel Can you describe which problem you foresee? As far as I can see I have taken every precaution that this is a forwards-compatible change and once Generics are implemented following Hack's example that nothing should conflict.

It would seem you have a scenario in mind and I am quite interested to know which

mindplay-dk commented 9 years ago

Generics is a runtime-affair that should be dealt with by the language itself

Generics are not a run-time affair at all.

Generics is a means of describing relationships between types - it's not necessarily anything that has a run-time footprint at all; for example, TypeScript has full support for generics, and it's all statically checked at compile-time, with no footprint in the generated code at all.

This is possible (and makes sense) because JavaScript, to which TypeScript compiles, does not check or constrain type relationships at run-time. This is similar to PHP, where you can (for example) put whatever you want in an array or in a property - nothing gets checked, hence anything is possible, but only very rarely desired.

In other words, generic type relationships are already possible in PHP as it si in JS and many other dynamic languages that do not enforce type constraints at run-time. Moreover, type relationships exist in lots of PHP code, but with no way to document (or analyse, or check) them.

The reason I think this is necessary, is solely for the same reason that e.g. @var is necessary to document property types. Property types don't exist and aren't a run-time thing in PHP either, so this is no different - it makes sense only because it's interesting to document the types and perform design-time static analysis in an IDE or with external tools.

Note that, my suggesting an alternative syntax to describe collection types is a last resort on my part, since I can't get anywhere with convincing you of the need for real generics - I would strongly prefer to have real support for generics, with type arguments, as this would enable us to actually describe all of the type relationships in our existing codebases - not just those that happen to relate to collections.

In PHP, collections are factually not the only types that have relationships with other types - so what you propose is incomplete, as far as being able to document or describe existing type relationships in PHP code.

If you're happy with something half-baked, then so be it, nothing I can do to change that - I just hope you're completely aware of the trade-off you're making here.

At the very least, you should remove the word "generic" from the spec - what you have currently is merely "collection types".

I hope you're right about this being forwards compatible with real generics - I'm not sure.

stof commented 9 years ago

To give more content about this, Hack generics are also about static code analysis only AFAIK. The HHVM runtime does not know about them

mvriel commented 9 years ago

I am not closing the case on this one yet.

After thinking this through some more I can see one issue here, and that is that Generics need not be Iterable. This proposal does make that assumption as it will consider MyType<integer> an iterable construct with integer values; but when using true Generics this need not be true and could cause conflicts as it seems.

I will think this through some more and give a more complete response later

mindplay-dk commented 9 years ago

I will think this through some more and give a more complete response later

Thank You.

TypeScript is a great source of reference for this, as it is much the same problem with JS, and a very elegant solution - I highly recommend playing around with generics in the TypeScript Playground, where you can see the JS output.

Generic features and type-hinting in TS are in fact referred to as "type annotations" in their spec - although this is part of literal TS syntax, it is in fact just metadata annotating members and variables with type information that is used for design-time (and compile-time) checking only - as you can see in the JS output, none of it leaves any footprint, at all; it's just metadata, just like doc-blocks in PHP.

jacobsantos commented 9 years ago

I guess that technically Hack wouldn't need type metadata, if generics are used, given that the only reason to give this information is because typed information is missing and you are supplying it to the IDE and to other programmers.

@mvriel

Using array[integer] feels unintuitive to me as, in PHP, anything you put in between the square brackets refers to the key; as such I would interpret the line above as an array whose keys are integers.

Yeah, which is why I prefer to have @var [integer] or @var array(integer) or @var array(integer...).

phpDocumentor nor PHPDoc does not need to go all the way with Generics; Generics is a runtime-affair that should be dealt with by the language itself. With PHPDoc we can make hint that a Collection has a specific signature (array vs. array) but that's it as far as I am concerned.

  1. Technically, when you have Class<T>, that merely means that you have T type somewhere within the class somewhere. This could be in the constructor, it could be used with the methods. I suppose you are going off of languages that require you to specify the actual type at compile time verification.
  2. When you do ArrayIterator<String, Integer>, you are doing this for a verbose language that forces you to define something the language will already know.
  3. A language like Swift that does have generics does not require you to define the type as it can simply infer from the types you are using when initialized. If you violate the typed system the compiler will understand that at compile time. The only reason Java and other languages have this is arbitrary requirement placed by the language specification.
  4. That collections have Collection<Key, Value> is that it is better to let people know what each type annotation is going to be used for. The order is arbitrary as well. This is to say that it is entirely coincidence that arrays in PHP match the collection type order from some other language. It is entirely possible to have Collection<Value, Key>, then your house of cards crumbles.
  5. The documentation itself needs to specify exactly what each type is being used. That they are named <Key, Value> is unknown to the programmer unless they look at the class source. So the documentation itself needs to say, <First, Second>, that First type is for Key type and Second is Value type.
  6. You are setting generic order based on coding standards from another language, something new programmers to programming or from another language that doesn't have generics. You are saying, Generic type order matters, when the type order is arbitrary.

Therefore, saying array<String, Integer> is technically arbitrary, given that PHP doesn't have generics, there is no documentation for what the first type means and what the second type means. You have to have that on the phpDocumentor web site, which is entirely useless as someone might skip that section or not understand the purpose when looking at it for the first time.

  1. When Generics are introduced, it is more than likely that such type metadata will be unnecessary, given that the typed information is already provided. You will still need to say what the purposes of each type is for.
  2. What does array<T> mean verses array<T, V>? Am I to assume since I have previous experience with PHP and other languages that array<T> would more than likely mean the Value type? What if I'm wrong?

This is the reason I am advocating this code sample.

/**
 * @typedef Priority Integer
 * @typedef KeyEnum String
 * Possible values are: Bad, Good, Decent, Okay
 * @typedef ChildrenType [KeyEnum: SomeClassName]
 * @var [Priority: ChildrenType]
 */
$exampleVariable;

The only thing you have to understand is what @typedef means and how it is used. However, given even that, it should still be possible to figure out the usage.

This example should be as close to how the programmer defines the array and be as close as possible to remove as much confusion as possible.

/**
 * @var [
 *              [
 *                  Integer => [
 *                      "Something" => Integer,
 *                      "Else"           => String,
 *                  ]
 *              ]
 *           ]
 */

Which should mean that each item is an array containing an integer index with an array with two indexes "Something" and "Else". With "Something" having a value type of Integer and "Else" having a value of String type.

Of course, the examples are contrived, but there are types when the array needed to have specific index names and value types and I eventually just went with an object to better document the requirements.


Of course, the furious part would be needing to follow the standard or having one or two exceptions to the standard and having to write a plugin to match the changes.

mindplay-dk commented 9 years ago

Therefore, saying array<String, Integer> is technically arbitrary, given that PHP doesn't have generics, there is no documentation for what the first type means and what the second type means. You have to have that on the phpDocumentor web site, which is entirely useless as someone might skip that section or not understand the purpose when looking at it for the first time.

This is a very valid point, which underlines what I was stating earlier:

collection type hinting is a crippled form of generics, in which the type arguments are assumed to have something to do with key/value types in collections only, but offering no means of declaring that those type arguments exist - e.g. not providing any means of declaring the type arguments themselves, simply assuming that they exist and are global to every type

The same problem affects IDE developers, at least in some small way - PhpStorm for one uses PHP stubs to declare documentation for functions/classes exported by C modules, and therefore I suspect their approach to solving this problem might actually be to add a proprietary annotation to define known type arguments to various functions and classes in their documentation stubs. It's either that or hard-code the type arguments for known types somehow, which, judging from the work they've done on other IDEs, doesn't seem like their style. I'm speculating of course, but it would be a natural way for them to address this problem, given how they've solved similar problems with missing metadata in the past.

Again, I think the type system and syntax used by TypeScript is an excellent point of reference, as they have solved similar problems with JS in a very elegant and de-facto "industry standard" way, e.g. closely resembling the type system features and syntax in e.g. Java, C#, Dart and others...

jacobsantos commented 9 years ago

@mindplay-dk Then it seems we are in agreement for that specific issue. It does seem that if you wanted to better annotate the types and properties of an array, you need to tie it directly to an object verses attempting to document an array structure using metadata.

I would rather it was inferred from array usage, but from my understanding, this would probably be more than most programmers are capable of implementing. The entire point of docblocks is to remove some of the need to infer based on code and document the code.

At least for WordPress, there were quite a few times, the array structure had to be documented and it would have been nice if there was an tag for doing that. I would have preferred another tag like @property or at least have @property work for array definitions.

The biggest problem is that I don't believe there are many tags that are dependent on other tags. There are inline and block tags and both using {@... } syntax. There aren't currently any tags that if defined after another tag adds a relationship to the previous tag. Or well, I guess @code and @endcode is something.

I think WordPress team might be attempting something like the above, but I believe to work in the phpDocumentor system, they would have had to keep to the same inline and block usage.

It does seem like the problem is a bit more difficult than simply using Generic syntax alone. It might be that the current is supported and plugins are created that introduce other ways of improving the array documentation. Depending on the adoption of the plugin, it could make it into the official standard.

The problem would be getting IDEs to support the improved docblock tag or implementing a docblock engine that allows for relationships between tags.

I guess you could also have:

/**
 * @typedef Name array
 * @array-index Type
 * @array-value Type
 * @endtypedef
 */

or:

/**
 * @typedef Fuzzy array
 * @array-index String
 *    * Weak
 *    * Moderate
 *    * Strong
 * @array-value Double
 * @endtypedef
 */

and for objects:

/**
 * @typedef Fuzzy object
 * @property String label
 *    * Weak
 *    * Moderate
 *    * Strong
 * @property Double fitness
 * Must be between -1 and 1.
 * @endtypedef
 */

I think I might make a new ticket for this idea to extend @typedef to be a bit more useful, if you think I should.

mindplay-dk commented 9 years ago

It does seem that if you wanted to better annotate the types and properties of an array, you need to tie it directly to an object verses attempting to document an array structure using metadata

Not necessarily - just think of array as a built-in type, just like any other class. In terms of generics, you can think of ArrayObject as a synonym for array.

Anyhow, before we address @typedef, let's think about the most basic feature: type arguments.

Referencing TypeScript for semantics and Closure compiler for the annotation-names, here's a simple example of what a generic entity data service and an implementation for a User entity might look like:

/**
 * @template TEntity
 */
interface IDataService
{
    /**
     * @return array<TEntity>
     */
    public function getAll();

    /**
     * @param TEntity $item
     */
    public function save($item);
}

/**
 * @template TEntity
 */
abstract class DataService
{
    // ...
}

class User
{
    /** @var int */
    public $id;

    /** @var string */
    public $email;
}

/**
 * @extends DataService<User>
 * @implements IDataService<User>
 */
class UserDataService extends DataService implements IDataService
{
    /**
     * @return array<User>
     */
    public function getAll()
    {
        // ...
    }

    /**
     * @param User $item
     */
    public function save(User $item) {
        // ...
    }
}

$service = new UserDataService();

foreach ($service->getAll() as $user) {
    // ...
    $service->save($user);
}

@template TEntity indicates that the IDataService interface and DataService class are generic types with a template type TEntity which can be used within the scope of those types only.

@extends and @implements in the UserDataService specify the type arguments for the generic class and interface types - note here that we need to repeat the full type name in these declarations, since otherwise the type arguments for the class and interface, respectively, would be ambiguous.

Here's an example of a generic function:

/**
 * @template T
 * @param T $a
 * @param T $b
 * @return T
 */
function add($a, $b)
{
    if (is_int($a) && is_int($b)) {
        return $a + $b;
    }
    if ($a instanceof Vector && $b instanceof Vector) {
        return new Vector($a->x + $b->x, $a->y + $b->y);
    }
    // ...
}

$number = add(123, 456);

$vector = add(new Vector(3,4), new Vector(5,6));

Here we declare a generic function with a template type T which is used within the scope of that function to specify that the return type is the same as the type of both arguments - in an IDE, the type of $number can be inferred as int, and the type of $vector can be inferred as Vector.

This is basic, but should get you started.

mvriel commented 9 years ago

Can we please keep this issue on-topic and avoid discussing Generics here? This issue discusses how to document which types of keys and values an Iterable collection class has.