qqwqqw689 commented 1 week ago

Bug report

Bug description:

In file lexer.c we have

    Py_ssize_t invalid = _PyUnicode_ScanIdentifier(s);
    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

link

But for function _PyUnicode_ScanIdentifier in unicodeobject.c file

Py_ssize_t
_PyUnicode_ScanIdentifier(PyObject *self)
{
    Py_ssize_t i;
    Py_ssize_t len = PyUnicode_GET_LENGTH(self);
    if (len == 0) {
        /* an empty string is not a valid identifier */
        return 0;
    }

    int kind = PyUnicode_KIND(self);
    const void *data = PyUnicode_DATA(self);
    Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
    /* PEP 3131 says that the first character must be in
       XID_Start and subsequent characters in XID_Continue,
       and for the ASCII range, the 2.x rules apply (i.e
       start with letters and underscore, continue with
       letters, digits, underscore). However, given the current
       definition of XID_Start and XID_Continue, it is sufficient
       to check just for these, except that _ must be allowed
       as starting an identifier.  */
    if (!_PyUnicode_IsXidStart(ch) && ch != 0x5F /* LOW LINE */) {
        return 0;
    }

    for (i = 1; i < len; i++) {
        ch = PyUnicode_READ(kind, data, i);
        if (!_PyUnicode_IsXidContinue(ch)) {
            return i;
        }
    }
    return i;
}

link This function will never return a invalid valiable <0, so

    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

will never be executed.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux, macOS, Windows, Other

Linked PRs

gh-126473

rruuaanng commented 1 week ago

There seems to be no problem. Are you willing to submit PR to fix it? I don't have time now (if you don't want to repair it, you can leave it to me, maybe it will be a little late.

qqwqqw689 commented 1 week ago

@rruuaanng yes , I can.

skirpichev commented 1 week ago

I'm not sure it's a bug. This code was introduced in 74ea6b5a75. Obviously, at that time the check was required. The PyUnicode_READY() test was removed in f9c9354a7a. So, it seems now _PyUnicode_ScanIdentifier() - can't fail.

We can either (1) remove error check or (2) just keep that as-is, in case it might be required in future. I think we can go with (1).

CC @methane as author of f9c9354a7a

ZeroIntensity commented 1 week ago

IMO, there's no need to remove that. We might want to make that raise an error in the future, and most compilers should (hopefully) optimize that away if it can't ever happen anyway.

qqwqqw689 commented 1 week ago

@ZeroIntensity So make it a comment or something else like __builtin_expect?

ZeroIntensity commented 1 week ago

__builtin_expect could work, yeah.

python / cpython

Unnecessary error-checking branch in `lexer.c`. #126469

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs