python / cpython

The Python programming language
https://www.python.org

C API: Support "nullable" parameter types in PyArg_Parse* #112068

Open serhiy-storchaka opened 11 months ago

serhiy-storchaka commented 11 months ago

Feature or enhancement

Some builtin functions accept None as a special value in addition to values of the specified type. Examples include the start and stop indices in index(), size in read(), dir_fd in many os functions, and numerous string parameters such as encoding and errors.

There are special format units in PyArg_Parse* functions for "string or None": z, z*, z#. There is a half-public converter, _PyEval_SliceIndex, for indices, and several more private converters. But while NULL can unambiguously represent None in C, there is no single integer value for None. In many cases it is -1, but it is 0 and Py_SIZE(self) for indices, AT_FDCWD for the dir_fd parameter, etc. This is why _PyEval_SliceIndex keeps the value of the C variable unchanged when the argument is None.
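
The "keep the variable unchanged on None" convention can be sketched with a toy model. This is not CPython code: toy_arg, toy_index_converter and toy_resolve_range are hypothetical names used only for illustration, and the argument is modeled as a plain struct rather than a PyObject. It only shows why the caller, not the converter, has to own the default: start and stop share one converter but need different fallbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a Python argument: either None or an integer.
   Hypothetical type, not part of the CPython API. */
typedef struct {
    int is_none;
    long value;
} toy_arg;

/* Modeled on the behavior described for _PyEval_SliceIndex:
   returns 1 on success and leaves *out untouched when the
   argument is None, so the caller's default survives. */
static int
toy_index_converter(const toy_arg *arg, long *out)
{
    assert(arg != NULL && out != NULL);
    if (arg->is_none) {
        return 1;               /* keep the caller's initial value */
    }
    *out = arg->value;
    return 1;
}

/* Hypothetical helper: resolve (start, stop) for a sequence of
   length len. The defaults differ per parameter (0 and len), which
   is why no single integer sentinel can stand in for None. */
static int
toy_resolve_range(const toy_arg *start_arg, const toy_arg *stop_arg,
                  long len, long *start, long *stop)
{
    *start = 0;                 /* default when start is None */
    *stop = len;                /* default when stop is None */
    if (!toy_index_converter(start_arg, start)) {
        return 0;
    }
    if (!toy_index_converter(stop_arg, stop)) {
        return 0;
    }
    return 1;
}
```

Because the converter never writes on None, the same function serves both parameters without knowing either default.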

I propose to generalize this to all types by adding the ? modifier. If it occurs after the format unit and the argument is None, the C variable keeps its initial value, as if the argument were optional and omitted. For example:

    int dir_fd = AT_FDCWD;
    Py_ssize_t size = PY_SSIZE_T_MAX;
    double speed = 0.0;
    int sep = EOF;
    int flag = -1;
    if (!PyArg_ParseTuple(args, "i?n?d?C?p?", &dir_fd, &size, &speed, &sep, &flag)) {
        return NULL;
    }

The code accepts two integers, a float, a character, and an arbitrary object (as a boolean), but any of the five arguments can also be None, in which case the corresponding variable keeps its initial value.

There was a similar proposition for Argument Clinic: #64540. But this proposition is not limited to Argument Clinic, so it can also be used in third-party code. It works not only with integers but with all types, including custom converters. It is more convenient because it does not require a special structure. I believe it covers most use cases, and in the few remaining cases you can still handle None manually, as before.

After implementing this for PyArg_Parse* I will add support in Argument Clinic.

serhiy-storchaka commented 3 months ago

It turned out that it is much simpler to implement this as a prefix operator than as a suffix. It could even be more efficient in the future. So the above example becomes:

    int dir_fd = AT_FDCWD;
    Py_ssize_t size = PY_SSIZE_T_MAX;
    double speed = 0.0;
    int sep = EOF;
    int flag = -1;
    if (!PyArg_ParseTuple(args, "?i?n?d?C?p", &dir_fd, &size, &speed, &sep, &flag)) {
        return NULL;
    }

picnixz commented 3 months ago

I feel suffixes are easier to understand, and many languages use ? as a suffix, e.g., for the optional chaining operator: a?.b?.c. The way I read it, it is a? followed by . applied to a?, so ? is more of a suffix. Personally, I prefer suffixes, though prefixes are fine. Is there any plan for implementing it (or do you perhaps have a PoC)?

serhiy-storchaka commented 3 months ago

I created a PoC a long time ago, but had doubts about prefix vs suffix. Prefix only requires adding several lines of code in one place: https://github.com/python/cpython/pull/121187/files#diff-3a8078954be00d3c876e02dd0e5057aad07f2c5f82f6c3c76154dcf2866929cfR544-R557

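    /* Proposed "nullable" prefix: consume the '?' before the format unit. */
    if (*format == '?') {
        format++;
        if (arg == Py_None) {
            /* None: skip the unit and its C arguments, leaving the
               variables at their initial values. */
            msg = skipitem(&format, p_va, flags);
            if (msg == NULL) {
                *p_format = format;
            }
            else {
                levels[0] = 0;
            }
            return msg;
        }
        /* Not None: parse the unit as usual, with the nullable flag set. */
        flags |= FLAG_NULLABLE;
    }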

and simple changes in 30+ other places are only needed to make the error messages better.

The suffix form requires additional complex changes in 20-40 places.
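
To illustrate why the prefix form fits in one place, here is a toy parser, independent of CPython, that handles a ? prefix before the unit is dispatched. Everything in it is hypothetical (toy_value, toy_parse, and the reduced unit set of i and d); it is a sketch of the idea under those assumptions, not the code from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a Python object (hypothetical, not the CPython API). */
typedef enum { TOY_NONE, TOY_INT, TOY_FLOAT } toy_kind;

typedef struct {
    toy_kind kind;
    long i;
    double d;
} toy_value;

/* Parse args against a format made of the units 'i' (long) and
   'd' (double), each optionally preceded by '?'. outputs[k] points at
   the C variable for argument k. Returns 1 on success, 0 on a type or
   count mismatch. */
static int
toy_parse(const toy_value *args, size_t nargs,
          const char *format, void **outputs)
{
    size_t k = 0;
    while (*format != '\0') {
        int nullable = 0;
        if (*format == '?') {       /* the prefix is seen before the unit */
            nullable = 1;
            format++;
        }
        char unit = *format;
        if (unit == '\0' || k >= nargs) {
            return 0;
        }
        format++;
        if (nullable && args[k].kind == TOY_NONE) {
            k++;                    /* skip: the output keeps its value */
            continue;
        }
        switch (unit) {
        case 'i':
            if (args[k].kind != TOY_INT) {
                return 0;
            }
            *(long *)outputs[k] = args[k].i;
            break;
        case 'd':
            if (args[k].kind == TOY_FLOAT) {
                *(double *)outputs[k] = args[k].d;
            }
            else if (args[k].kind == TOY_INT) {
                *(double *)outputs[k] = (double)args[k].i;
            }
            else {
                return 0;
            }
            break;
        default:
            return 0;
        }
        k++;
    }
    return k == nargs;
}
```

In this model the None check sits in a single spot ahead of the per-unit switch, so none of the unit handlers need to know about nullability.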

serhiy-storchaka commented 3 months ago

#121303 is an alternative implementation of ? as a suffix. It is more complex, and (...)? is not implemented yet because of the complexity and runtime cost of implementing it.

erlend-aasland commented 2 months ago

There was a similar proposition for Argument Clinic: https://github.com/python/cpython/issues/64540.

The discussion in the linked issue showed that this feature was highly controversial. What changed that made this non-controversial?

serhiy-storchaka commented 2 months ago

To me, it is two things:

The rest of the discussion looks to me like pure bikeshedding -- whether it should be called "nullable" or "Noneable" or whatever.