php / php-tasks

Tasks that need doing. This is for php-src maintainers. The end-user bug tracker is at https://bugs.php.net/.
27 stars 6 forks source link

Check existing parameter naming for consistency, clarity and possible brevity #16

Closed bwoebi closed 3 years ago

bwoebi commented 4 years ago

General issue to track naming suggestions:

Let me start with string functions...

General name conventions: context current name suggestion reason
* $string $str $str is a very common abbreviation, and effectively just repeating the typehint. Writing $string over $str as parameter name has no intrinsic clarity gain. Also e.g. strtr() uses $str, while str_split uses $string.
string functions $data $str $data suggests, without reading the source signature, that something may be not just a string.
* $is_ prefix $ Drop is_ prefixes for booleans, they are anyway given by the type; is_something: true in a named function call does not help over something: true
comparing functions $frist, $second $str1, $str2 E.g. similar_text, keep it consistent with strcmp(). (for non-string types, obviously not a "str" prefix)
function current parameters suggestion reason
str_i?replace $search, $replace, $subject, $count $from, $to, $subject is $search what shall be serached for or in? Also mirroing strtr() then
strc?spn $subject, $mask, $start, $length $str, $mask, $start, $length $subject does not provide any clarity over generic short $str
addcslashes $str, $charlist $str, $mask consistency with $mask from strspn
chunk_split $body, $chunklen, $end $str, $chunklen, $end The notion of $body is derived from the usage in mails, but the function is far more generic than that
[lr]?trim $str, $character_mask $str, $mask consistency with $mask from strspn
(sha1|md5)(_file) $filename/$str, $raw_output $filename/$str, $binary what is "raw" even supposed to mean? We want binary, i.e. not a hex representation
str_word_count $string, $format, $charlist $str, $format, $mask consistency with $mask from strspn
strpbrk $haystack, $char_list $haystack, $mask consistency with $mask from strspn
substr_compare $main_str, $str, $offset, $length, $case_insensitivity ??, ??, $offset, $length, $case_insensitivity Not sure, but $main_str and $str is weird - maybe needle and haystack?
nikic commented 4 years ago

See also https://gist.github.com/nikic/b1223525b8bb0a1bef0f6b8a267318e9 for a list of currently used parameter names, as of a week ago.

beberlei commented 4 years ago

Question: DOMDocument uses $filname a lot, the whole API is mostly camel case, so question would be $fileName or $filename?

Edit: Mark made a good suggestion to rename these in dom explicitly to $path

derickr commented 4 years ago

On 25 July 2020 20:55:50 BST, Benjamin Eberlei notifications@github.com wrote:

Question: DOMDocument uses $filname a lot, the whole API is mostly camel case, so question would be $fileName or $filename?

$filename

Making this $fileName, although arguably more "correct" is also obnoxious. Just like $tineZone would be.

derickr commented 4 years ago

I've given ext/date another pass. Should be done now.

In general, I'm not a fan of short names like $str over $string, and I am a fan of clarify over conciseness. Try also pronouncing $str.

bwoebi commented 4 years ago

@derickr For the specific instance of str vs string I was also taking inspiration of other languages, e.g. strcmp in C has str1 and str2. In mysql the string to be replaced is called str in replace function. In D the parameters are even called just S (e.g. split()). etc. I see an overwhelmingly lot of abbreviations for string in the wild, most commonly str. I'd argue we really are in line with other languages if we always use $str. I agree with you that unnecessary abbreviating should be avoided, but when a specific abbreviation has a lot of precedent, I see no gain in writing it out.

kocsismate commented 4 years ago

FWIW I like most of the suggestions in the OP, except for $str. :) I think parameter names are not something that needs to be in sync with other languages, so that we can set our own policy.

However, Nikita also expressed that he'd prefer abbreviations where it's possible, so I think we should come up with some heuristics when to consider to use them:

While $str definitely falls in the first category, it does not in the second (only 3 characters shorter). In the same time, e.g. $char, $arg, and probably $param (5, 5, 4 characters shorter) fall in both in my opinion.

Even if we agreed to use the above rule of thumbs, we'd be using abbreviations pretty much inconsistently, because parameter names would be mostly non-abbreviated, with some "random" exceptions. In order to fix this inconsistency, we could have another heuristics:

For example, it's not likely that someone writes This function returns a str, while it's more likely to see sentences like The first param is optional, The first arg passed to the function is an integer. This way, I think we could alleviate the inconsistency of mixing abbreviations with full names.

TysonAndre commented 4 years ago

https://gist.github.com/TysonAndre/8d71559febf58f6af41cce009d4fc704#file-gistfile1-txt-L15 has some notes on the differences between php-src and Phan, which uses function signatures from a variety of places

There's a collection of scripts I wrote in https://github.com/phan/phan/tree/master/internal (internalsignatures.php and lib/) which were originally designed for extracting union types, but can be adapted for extracting parameter names. They're not exactly designed for reuse, but it's still doable (extract all from stubs file, extract all from svn, and compare where they differ in naming).

This also found some bugs.

Name mismatch for imagesetclip: #3 is $y1 in Phan, $x2 in source
Name mismatch for imagesetclip: #4 is $x2 in Phan, $y1 in source
    zend_long x1, y1, x2, y2;

    if (zend_parse_parameters(ZEND_NUM_ARGS(), "Ollll", &im_zval, gd_image_ce, &x1, &y1, &x2, &y2) == FAILURE) {
        RETURN_THROWS();
    }

Crell commented 4 years ago

Oh, this is a fun thread. Related: https://github.com/php/php-src/pull/6027

In chat, Levi suggested standardizing array functions on "input" and "callback" (where it's currently a combination of array, array1, or various less sensible things). That seems sensible to me if it does to everyone else.

Crell commented 4 years ago

Another related PR: https://github.com/php/php-src/pull/6032

TysonAndre commented 4 years ago

$input seems reasonable - it'd hopefully make it easier to remember the names if it was used everywhere where there was one input array.

It also works better when there's also collection support, or for array_* functions that also accept non-traversable objects.



Maybe we should focus on adding positional-only parameters to php 8.0 to avoid this if we can't decide (I'm kidding) -- I doubt the release managers would give that an extension, and that addition may break extensions such as debuggers until they implemented support

Crell commented 4 years ago

For most of the array functions, they take only an array and don't modify it, so $input seems reasonable to me. (I also considered $source, but I like $input better.) For the moment I'm not dealing with iterables; I don't expect array_map() to ever start taking an iterable, honestly.

For mutating functions like sort() et al, $array is probably the best option. Anything else gets kind of abstract, like $subject. Absent any better ideas I'm good leaving those as is.

nikic commented 4 years ago

I prefer $array over $input. "Input" is very generic and could refer to literally any parameter.

TysonAndre commented 4 years ago

I prefer $array over $input. "Input" is very generic and could refer to literally any parameter.

After some consideration, I feel the same way

There's also the inconsistency of having generic object functions accept $object, some generic string functions accept $str/$string, feof accepting handle/resource, but array functions expecting $input.

It'd be easier to see wrongness visually with array_filter(array: $x->getTraversable(), callback: $cb) than array_filter(input: $x->getTraversable(), callback: $cb), or more obscure functions in other extensions

TysonAndre commented 4 years ago

I've put together a script to compare the stubs used by php-src 8.0 with the parameter names in php.net

https://gist.github.com/TysonAndre/1994cc681ad04109b7ab3f2fd7407b6f The generated output from https://github.com/phan/phan/commits/1c7bb0b5ea3fa46460806c33895a5a509af187b3 (excludes some files with repeated basenames such as calendar.stub.php, easy enough to rename)

There's around 800 mismatches. Some notable differences:


# Run with php8.0 internal/internalsignatures.php compare-named-parameters ../php-src-stubs ../phpdoc-en
# from https://github.com/phan/phan/commits/1c7bb0b5ea3fa46460806c33895a5a509af187b3

Parsed 5173 signatures from stubs and 9704 from documentation
Comparing signatures
Saw parameter name mismatch for ArrayIterator::asort
Reflection parameters: {"sort_flags=":"int"}
php.net documentation: []

Saw parameter name mismatch for ArrayIterator::ksort
Reflection parameters: {"sort_flags=":"int"}
php.net documentation: []

Saw parameter name mismatch for ArrayIterator::offsetSet
Reflection parameters: {"index":"mixed","value":"mixed"}
php.net documentation: {"index":"mixed","newval":"mixed"}

Saw parameter name mismatch for ArrayObject::asort
Reflection parameters: {"sort_flags=":"int"}
php.net documentation: []

....

Saw parameter name mismatch for curl_strerror
Reflection parameters: {"error_number":"int"}
php.net documentation: {"errornum":"int"}

Saw parameter name mismatch for curl_unescape
Reflection parameters: {"handle":"\\CurlHandle","string":"string"}
php.net documentation: {"ch":"resource","str":"string"}

Saw parameter name mismatch for current
Reflection parameters: {"arg":"array|object"}
php.net documentation: {"array":"array"}

....
Saw parameter name mismatch for usort
Reflection parameters: {"&array":"array","cmp_function":"callable"}
php.net documentation: {"&array":"array","value_compare_func":"callable"}

Saw parameter name mismatch for var_dump
Reflection parameters: {"value":"mixed","...values=":"list<mixed>"}
php.net documentation: {"expression":"mixed","...=":"mixed"}

Saw parameter name mismatch for var_export
Reflection parameters: {"value":"mixed","return=":"bool"}
php.net documentation: {"expression":"mixed","return=":"bool"}
...
Crell commented 4 years ago

New PR that covers all of array functions: https://github.com/php/php-src/pull/6046

Docs to be updated accordingly after that gets merged.

dtakken commented 4 years ago

I was just wondering what to do with extensions that expose a dual object oriented / procedural API. For these extensions the class methods have equivalent global functions.

We use different parameter naming styles for functions and methods. However, it seems like the dual APIs share the signatures between a method and its equivalent function. That means we presently cannot use different parameter naming styles for the two APIs without changing something about the method/function mapping mechanism.

Example: https://github.com/php/php-src/blob/b4c2670f857bd23c8b0922bd7f0f4a25d91f28d2/ext/xmlwriter/php_xmlwriter.stub.php

Solving this requires changes to the stub parser, right?

kocsismate commented 4 years ago

@dtakken I think there's no technical issue with using different parameter names in aliases. When I was making changes to this file today (I assume you saw it? :) ) I was close to rename the parameters in question to use the correct case, but I wanted to have some discussion first, so changed my mind.

dtakken commented 4 years ago

@kocsismate Ok then I do not understand how it works. :) I only see it generate one function signature in the *_arginfo.h file. I do see that an @alias comment can change the name of the resulting function, but not its parameters?

Yes, I saw your changes. Great job.

kocsismate commented 3 years ago

I only see it generate one function signature in the *_arginfo.h file.

The alias in the context of stubs only means that the same C code (PHP_FUNCTION(xmlwriter_write_dtd_entity)) is to be invoked for both the aliased function and the alias method. On the other hand, both of them have a separate arginfo structure, since the return type of these functions and the number of parameters are already different (https://github.com/php/php-src/blob/d9628b9ca961d634e81e5c4d4fcb218873ae9159/ext/xmlwriter/php_xmlwriter_arginfo.h#L158 and https://github.com/php/php-src/blob/d9628b9ca961d634e81e5c4d4fcb218873ae9159/ext/xmlwriter/php_xmlwriter_arginfo.h#L313).

dtakken commented 3 years ago

@kocsismate Ah yes, I see it now. I was looking into it to see how I might help a bit with this effort. Now I know. Thanks!

nikic commented 3 years ago

With PHP 8 released, I think it's safe to close this...