nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0
5.27k stars 324 forks source link

Python: Fix an issue with HTTP header field values being incorrectly reported as non-ascii #997

Closed ac000 closed 8 months ago

ac000 commented 8 months ago

This PR fixes an issue in Python 3.x whereby certain HTTP header field values where being incorrectly reported as non-ascii by the Python .isascii() method.

This PR comprises two patches

1) Python: Do nxt_unit_sptr_get() earlier in nxt_python_field_value().

Does what it says on the tin. It's a preparatory patch that simply calls the nxt_unit_sptr_get() function earlier in nxt_python_field_value().

2) Python: Fix header field values character encoding.

The meat and potatoes. This replaces the use of PyUnicode_New() with PyUnicode_DecodeCharmap()

This leaves the old code path in place for Python < 3.0. Python 2.7 has been deprecated for a few years, so hopefully at some point in the not too distant future all the < Python 3.0 code can just get ripped out.

hongzhidao commented 8 months ago

Hi, here are a few thoughts.

  1. we can refactor the calculation logic for the f->value.

    src = nxt_unit_sptr_get(&f->value);
    p = nxt_cpymem(p, src, f->value_length);
    
    for (i = 1; i < n; i++) {
        p = nxt_cpymem(p, ", ", 2);
    
        src = nxt_unit_sptr_get(&f[i].value);
        p = nxt_cpymem(p, src, f[i].value_length);
    }
  2. Use nxt_unit_malloc/free instead of nxt_malloc/nxt_free, the difference is they use different log_debug/log_alert.
  3. We usually put a new blank line before or after an independent logic. But sometimes not if it's just one line. For example:

      for (i = 1; i < n; i++) {
            p = nxt_cpymem(p, ", ", 2);
    
            src = nxt_unit_sptr_get(&f[i].value);
            p = nxt_cpymem(p, src, f[i].value_length);
        }
    
        /* blank line */
        *p = '\0';
    
    if (nxt_slow_path(n > 1)) {
        nxt_free(src);
    }
    
    /* blank line */
    #else
    res = PyString_FromStringAndSize(NULL, vl);
  4. namings, it depends on their context.
    
    char  *ptr;
    =>
    char  *str;
    str = nxt_unit_malloc(NULL, vl + 1);

char src; => char value;

value = nxt_unit_sptr_get(&f->value); p = nxt_cpymem(p, value, f->value_length);


Others look good, based on the above, I'd suggest using one patch, Like the following.

diff -r 4fd775fdbd4c src/python/nxt_python_wsgi.c --- a/src/python/nxt_python_wsgi.c Thu Nov 02 17:07:34 2023 +0800 +++ b/src/python/nxt_python_wsgi.c Thu Nov 09 09:33:57 2023 +0800 @@ -856,15 +856,56 @@ }

+static char +nxt_python_get_value(nxt_unit_field_t f, char *dst, int n) +{

ac000 commented 8 months ago

I find it kind of ironic that you suggest this kind of recator!

As explained in the in the PR AND commit message. I didn't want to touch the Python 2.x code path as I have no easy way to test it it.

But I'l tell you what.

I'm going to close this PR and you can just do it yourself...

hongzhidao commented 8 months ago

Btw, I found you didn't assign a reviewer, how do we know when your patches like this need to be reviewed? I commented on it just because I'm afraid it's blocked on my side.

ac000 commented 8 months ago

As to your suggestion. NAK.

I explained why I didn't do what you suggest, but you didn't say why that explanation is invalid.

hongzhidao commented 8 months ago

I didn't want to touch the Python 2.x code path as I have no easy way to test it it.

Agree, but if you looked at the changes again, you could find out if the feedback makes sense.

if (nxt_slow_path(n > 1)) {
        char  *ptr;  /* (1) str might be better */

        p = nxt_malloc(vl + 1);
       /* (2) Use nxt_unit_malloc/free instead of nxt_malloc/nxt_free, the difference is they use different log_debug/log_alert. */
       /* (3) missed error handing */

        ptr = p;
        p = nxt_cpymem(p, src, f->value_length);

        for (i = 1; i < n; i++) {
            p = nxt_cpymem(p, ", ", 2);

            src = nxt_unit_sptr_get(&f[i].value);
            p = nxt_cpymem(p, src, f[i].value_length);
        }
        *p = '\0';

        src = ptr;
    }

    res = PyUnicode_DecodeCharmap(src, vl, NULL, NULL);

    if (nxt_slow_path(n > 1)) {
        nxt_free(src);
    }

    /* (4) new blank line */
#else

And I'm fine if developers insist on their code style, I suggested the reason why a blank line is good for some places.

I thought you understood the change with the different patches. It's related to the first commit, when I read the code, maybe including other developers, I need to understand why putting the first value at the top.

Before:

    src = nxt_unit_sptr_get(&f->value);
    p = nxt_cpymem(p, src, f->value_length);

    for (i = 1; i < n; i++) {
        p = nxt_cpymem(p, ", ", 2);

        src = nxt_unit_sptr_get(&f[i].value);
        p = nxt_cpymem(p, src, f[i].value_length);
    }

Change version 1.

    p = nxt_malloc(vl + 1);
    ptr = p;
    p = nxt_cpymem(p, src, f->value_length); /* src is set at the top, developers need to know the related information. */

    for (i = 1; i < n; i++) {
         p = nxt_cpymem(p, ", ", 2);

         src = nxt_unit_sptr_get(&f[i].value);
         p = nxt_cpymem(p, src, f[i].value_length);
    }
    *p = '\0';

    src = ptr;

I found it might not need to put them in different places. Change version 2.

    if (n > 1) {
        str = nxt_malloc(vl + 1);

        p = nxt_python_get_value(f, str, n);
        *p = '\0';

    } else {
        ptr = nxt_unit_sptr_get(&f->value);
    }

Most of the changes are related to the part about Python 3.x. They are fixings, refactorers, and code design. And I just shared my thoughts on it as a normal reviewer did.

ac000 commented 8 months ago

And I'm fine if developers insist on their code style, I suggested the reason why a blank line is good for some places.

I like a blank line as much as the next person, but we have way too many of them in Unit (and unless you're Gilfoyle, one is not blessed with vertical space!...)

t's related to the first commit, when I read the code, maybe including other developers, I need to understand why putting the first value at the top.

To quote the commit message (do you even read these things?)

    Python: Do nxt_unit_sptr_get() earlier in nxt_python_field_value().

    This is a preparatory patch for fixing an issue with the encoding of
    http header field values.

    This patch simply moves the nxt_unit_sptr_get() to the top of the
    function where we will need it in the next commit.

Move it to the top of the function because that's where we'll need it in the next commit and you can clearly see from the next commit that where we call it from, it wouldn't work if it was left where it was...

hongzhidao commented 8 months ago

do you even read these things?

Yes, I did.

I said the reviewer is to read the code carefully and provide suggestions. I shared the reason for the different way, including the style, and it shows another perspective for discussion.

I said "here are a few thoughts.". I didn't ask you have to choose it, right? But why did you use words like ironic, Gilfoyle again?

Btw, not sure I understand incorrety, they seems bad words. If you are intentionally offensive, please don't ask me to review your code.

ac000 commented 8 months ago

But why did you use words like ironic, Gilfoyle again?

Again? First use I'm pretty sure. Just a humorous reference to the Silicon Valley sitcom. But in case you're not familiar. Gilfoyle is notable for running some of his monitors vertically, ideal for coding!

ac000 commented 8 months ago

Use nxt_unit_malloc()/nxt_unit_free() wrappers.

Range diff

1:  e0da58c2 ! 1:  05b4b13d Python: Fix header field values character encoding.
    @@ src/python/nxt_python_wsgi.c: nxt_python_field_value(nxt_unit_field_t *f, int n,
     +    if (nxt_slow_path(n > 1)) {
     +        char  *ptr;
     +
    -+        p = nxt_malloc(vl + 1);
    ++        p = nxt_unit_malloc(NULL, vl + 1);
     +        ptr = p;
     +        p = nxt_cpymem(p, src, f->value_length);
     +
    @@ src/python/nxt_python_wsgi.c: nxt_python_field_value(nxt_unit_field_t *f, int n,
     +    res = PyUnicode_DecodeCharmap(src, vl, NULL, NULL);
     +
     +    if (nxt_slow_path(n > 1)) {
    -+        nxt_free(src);
    ++        nxt_unit_free(NULL, src);
     +    }
      #else
          res = PyString_FromStringAndSize(NULL, vl);

I'm not inclined to make further changes...

hongzhidao commented 8 months ago

I'm not inclined to make further changes...

That's ok, and missed error checking.

p = nxt_unit_malloc(NULL, vl + 1);
if (nxt_slow_path(p == NULL)) {
    return NULL;
}

But why did you use words like ironic, Gilfoyle again? Again? First use I'm pretty sure.

Just be affected by the action of PR closing. No harm feelings, let's stay respectful and solve issues as a team. :)

ac000 commented 8 months ago

That's ok, and missed error checking.

p = nxt_unit_malloc(NULL, vl + 1);
if (nxt_slow_path(p == NULL)) {
    return NULL;
}

Good catch!

Just be affected by the action of PR closing. No harm feelings, let's stay respectful and solve issues as a team. :)

Understood.

ac000 commented 8 months ago

Add missed error check for the malloc.

Range diff

1:  05b4b13d ! 1:  5cfad9cc Python: Fix header field values character encoding.
    @@ src/python/nxt_python_wsgi.c: nxt_python_field_value(nxt_unit_field_t *f, int n,
     +        char  *ptr;
     +
     +        p = nxt_unit_malloc(NULL, vl + 1);
    ++        if (nxt_slow_path(p == NULL)) {
    ++            return NULL;
    ++        }
    ++
     +        ptr = p;
     +        p = nxt_cpymem(p, src, f->value_length);
     +
lexborisov commented 8 months ago

@ac000

Why are you being so condescending? It's very unpleasant to read something like that. It feels like you're creating a toxic atmosphere.

F5 management should pay attention to this.

ac000 commented 8 months ago

@hongzhidao

I went back and looked at the main commit message thinking maybe I didn't explain things clearly enough.

However I, think, it's clear. Specifically this bit

I did not want to touch the Python 2.7 code (which may or may not even be affected by this) so kept these changes completely isolated from that, hence a slight duplication with the for () loop.

Python 2.7 was sunset on January 1st 2020[0], so this code will hopefully just disappear soon anyway.

If it's not, I will change it as having good commit messages is vitally important.

callahad commented 8 months ago

@lexborisov Yes, this interaction between colleagues fell short of the standards we hold ourselves to. You can expect us to do better in the future.