python / cpython

The Python programming language
https://www.python.org
Other
63.76k stars 30.54k forks source link

Memory corruption with urllib.parse #77367

Closed c13e5c9f-0355-4f98-a46f-867649181c87 closed 6 years ago

c13e5c9f-0355-4f98-a46f-867649181c87 commented 6 years ago
BPO 33186
Nosy @stevendaprano, @imz
Files
  • testcase.py: testcase for the issue
  • data.txt.gz: data.txt
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-security', 'invalid', 'library'] title = 'Memory corruption with urllib.parse' updated_at = user = 'https://bugs.python.org/alexeyaltlinuxorg' ``` bugs.python.org fields: ```python activity = actor = 'imz' assignee = 'none' closed = True closed_date = closer = 'steven.daprano' components = ['Library (Lib)'] creation = creator = 'alexey@altlinux.org' dependencies = [] files = ['47506', '47507'] hgrepos = [] issue_num = 33186 keywords = [] message_count = 7.0 messages = ['314688', '314692', '314695', '314707', '314711', '314727', '314736'] nosy_count = 3.0 nosy_names = ['steven.daprano', 'imz', 'alexey@altlinux.org'] pr_nums = [] priority = 'normal' resolution = 'not a bug' stage = 'resolved' status = 'closed' superseder = None type = 'security' url = 'https://bugs.python.org/issue33186' versions = ['Python 3.5'] ```

    c13e5c9f-0355-4f98-a46f-867649181c87 commented 6 years ago

    There is a strange behavior while processing data in a "for" loop with urllib.parse.unquote() - looks like memory corruption - a list contains elements that have never been appended.

    I'll explain the testcase.

    I spotted the problem by checking for any remains of url encoding left in output_list. There were these strings with url encoding left - "bad_boys" dict in testcase. Now, when iterating through input_list (read from "data.txt"), I'm checking for those problematic entries and printing what is being appended to the output_list as well as all problematic (unwanted, "Bad Boys") and converted problematic entries ("Normal conversions") existing in the output_list. At some point, unwanted entries appear in output_list. The resulting output_list contains converted and unconverted problematic entries, though input_list's length equals output_list's length.

    data.txt needs to be saved along with testcase.py, and then you can run testcase.py.

    The output of running testcase.py:

    Bad Boys are: seil%2fturbo_firmware 140335684191552 intelligent_platforms_proficy_hmi%2fscada_cimplicity 140335684515920 seil%2fneu_2fe_plus_firmware 140335684536080 seil%2fb1_firmware 140335684134640 eil%2fx2_firmware 140335684191984 seil%2fx1_firmware 140335684190832 seil%2fx2_firmware 140335684190904 seil%2fx86_firmware 140335684192488

    Input list length is: 17094

    Bad Boy detected Element type: \<class 'str'> Convertation: seil%2fb1_firmware 140335679096848 >> seil/b1_firmware 140335681345768 Just appended: seil/b1_firmware 140335681345768 Normal conversions in output list: seil/b1_firmware 140335681345768

    Bad Boy detected Element type: \<class 'str'> Convertation: seil%2fx1_firmware 140335679096920 >> seil/x1_firmware 140335681345840 Just appended: seil/x1_firmware 140335681345840 Normal conversions in output list: seil/b1_firmware 140335681345768 seil/x1_firmware 140335681345840

    Bad Boy detected Element type: \<class 'str'> Convertation: seil%2fx2_firmware 140335679096992 >> seil/x2_firmware 140335681345912 Just appended: seil/x2_firmware 140335681345912 Normal conversions in output list: seil/b1_firmware 140335681345768 seil/x1_firmware 140335681345840 seil/x2_firmware 140335681345912

    Bad Boy detected Element type: \<class 'str'> Convertation: seil%2fx86_firmware 140335679134936 >> seil/x86_firmware 140335681346704 Just appended: seil/x86_firmware 140335681346704 Normal conversions in output list: seil/b1_firmware 140335681345768 seil/x1_firmware 140335681345840 seil/x2_firmware 140335681345912 seil/x86_firmware 140335681346704 Bad Boys in output list: eil%2fx2_firmware 140335681346272

    Bad Boy detected Element type: \<class 'str'> Convertation: seil%2fturbo_firmware 140335679200976 >> seil/turbo_firmware 140335679269456 Just appended: seil/turbo_firmware 140335679269456 Normal conversions in output list: seil/b1_firmware 140335681345768 seil/x1_firmware 140335681345840 seil/x2_firmware 140335681345912 seil/x86_firmware 140335681346704 seil/turbo_firmware 140335679269456 Bad Boys in output list: eil%2fx2_firmware 140335681346272 seil%2fb1_firmware 140335679267800 seil%2fx1_firmware 140335679267872 seil%2fx2_firmware 140335679267944 seil%2fx86_firmware 140335679269384

    Bad Boy detected Element type: \<class 'str'> Convertation: seil%2fneu_2fe_plus_firmware 140335678867056 >> seil/neu_2fe_plus_firmware 140335680328928 Just appended: seil/neu_2fe_plus_firmware 140335680328928 Normal conversions in output list: seil/b1_firmware 140335681345768 seil/x1_firmware 140335681345840 seil/x2_firmware 140335681345912 seil/x86_firmware 140335681346704 seil/turbo_firmware 140335679269456 seil/neu_2fe_plus_firmware 140335680328928 Bad Boys in output list: eil%2fx2_firmware 140335681346272 seil%2fb1_firmware 140335679267800 seil%2fx1_firmware 140335679267872 seil%2fx2_firmware 140335679267944 seil%2fx86_firmware 140335679269384 seil%2fturbo_firmware 140335679849576

    Bad Boy detected Element type: \<class 'str'> Convertation: intelligent_platforms_proficy_hmi%2fscada_cimplicity 140335678546800 >> intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376 Just appended: intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376 Normal conversions in output list: seil/b1_firmware 140335681345768 seil/x1_firmware 140335681345840 seil/x2_firmware 140335681345912 seil/x86_firmware 140335681346704 seil/turbo_firmware 140335679269456 seil/neu_2fe_plus_firmware 140335680328928 intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376 Bad Boys in output list: eil%2fx2_firmware 140335681346272 seil%2fb1_firmware 140335679267800 seil%2fx1_firmware 140335679267872 seil%2fx2_firmware 140335679267944 seil%2fx86_firmware 140335679269384 seil%2fturbo_firmware 140335679849576 seil%2fneu_2fe_plus_firmware 140335678934512

    FINAL RESULTS Output list length is: 17094 Normal conversions in output list (Bad Boys -related): seil/b1_firmware 140335681345768 seil/x1_firmware 140335681345840 seil/x2_firmware 140335681345912 seil/x86_firmware 140335681346704 seil/turbo_firmware 140335679269456 seil/neu_2fe_plus_firmware 140335680328928 intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376 Bad Boys in output list: eil%2fx2_firmware 140335681346272 seil%2fb1_firmware 140335679267800 seil%2fx1_firmware 140335679267872 seil%2fx2_firmware 140335679267944 seil%2fx86_firmware 140335679269384 seil%2fturbo_firmware 140335679849576 seil%2fneu_2fe_plus_firmware 140335678934512 intelligent_platforms_proficy_hmi%2fscada_cimplicity 140335681195728

    f36b827c-19af-4da3-8c3b-cb5c939b5097 commented 6 years ago

    I've tested this on a e2k machine (e2k is a VLIW/EPIC architecture, similar to ia64), where python3 3.5 has been compiled with a completely different proprietary compiler, and got the same bad result.

    python3-3.5.1-alt7.3

    (This has also been tested with the same results on an Ubuntu machine.)

    So, this probably means that this is not a matter of wrong compiler optimizations.

    f36b827c-19af-4da3-8c3b-cb5c939b5097 commented 6 years ago

    The bad results are also reproducible with:

    # dpkg -l python3.2
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                                                          Version                             Architecture                        Description
    +++-=============================================================-===================================-===================================-

    =============================================================================================================================== ii python3.2 3.2.3-7 ia64 Interactive high-level object-oriented language (version 3.2)

    f36b827c-19af-4da3-8c3b-cb5c939b5097 commented 6 years ago

    a list contains elements that have never been appended

    Why do we think that it's true?

    1. The described result can be valid if there are "doubly quoted" strings in the input list.

    2. Also, there could be a bug in unquote() which doesn't corrupt the list as a side effect, but simply returns a wrong result value, which is indeed appended to the output list at its turn.

    f36b827c-19af-4da3-8c3b-cb5c939b5097 commented 6 years ago
    1. The described result can be valid if there are "doubly quoted" strings in the input list.

    To check this I've modified your testcase.py:

    diff --git a/testcase.py b/testcase.py
    index b597205..fefcef2 100755
    --- a/testcase.py
    +++ b/testcase.py
    @@ -53,6 +53,11 @@ for entry in input_list:
        # Conveting
        conv_entry=urllib.parse.unquote(entry)
    
    +   if conv_entry in bad_boys:
    +       flag=True
    +       print("ABOUT TO APPEND A BAD BOY")
    +       print("Conversion:",entry,id(entry),end=" ")
    +
        # Printing (debug)
        if flag:
            print(">>",conv_entry,id(conv_entry))
    @@ -61,7 +66,7 @@ for entry in input_list:
        output_list.append(conv_entry)
    
        # Printing (debug)
    -   if entry in bad_boys:
    +   if flag:
            print("Just appended:", output_list[-1], id(output_list[-1]))
            flag=False  
            for x in output_list:

    Let's run it and look for my new message:

    $ ./testcase.py | fgrep -A1 -e ABOUT
    ABOUT TO APPEND A BAD BOY
    Conversion: eil%252fx2_firmware 139920242273280 >> eil%2fx2_firmware 139920244465360
    --
    ABOUT TO APPEND A BAD BOY
    Conversion: seil%252fb1_firmware 139920242325448 >> seil%2fb1_firmware 139920241899608
    --
    ABOUT TO APPEND A BAD BOY
    Conversion: seil%252fx1_firmware 139920242325520 >> seil%2fx1_firmware 139920241899752
    --
    ABOUT TO APPEND A BAD BOY
    Conversion: seil%252fx2_firmware 139920242325592 >> seil%2fx2_firmware 139920241899896
    --
    ABOUT TO APPEND A BAD BOY
    Conversion: seil%252fx86_firmware 139920242356704 >> seil%2fx86_firmware 139920241901408
    --
    ABOUT TO APPEND A BAD BOY
    Conversion: seil%252fturbo_firmware 139920242416280 >> seil%2fturbo_firmware 139920242993744
    --
    ABOUT TO APPEND A BAD BOY
    Conversion: seil%252fneu_2fe_plus_firmware 139920242065856 >> seil%2fneu_2fe_plus_firmware 139920243007536
    --
    ABOUT TO APPEND A BAD BOY
    Conversion: intelligent_platforms_proficy_hmi%252fscada_cimplicity 139920241710560 >> intelligent_platforms_proficy_hmi%2fscada_cimplicity 139920244342480
    $ fgrep -ne 'seil%252fneu_2fe_plus_firmware' data.txt
    15191:seil%252fneu_2fe_plus_firmware
    $ 

    So, there is indeed a "doubly" quoted data, at 15191 in data.txt.

    This is not a bug, everything work correctly.

    This issue needs to be closed.

    stevendaprano commented 6 years ago

    Alexey, I'm afraid I can't make heads or tails of your bug report. You've gone into a detailed description of the "bad results" you have, but you haven't explained *why* they're bad, or what you were expecting.

    I'm afraid that I found understanding your test case code a hard slog. A much simpler test case showing input, expected output, and actual output is appreciated. It might help if you read this:

    http://sscce.org/

    Its written for Java programmers, but the same applies to any programming language.

    Please don't go straight to assumptions about the cause of the bug ("memory corruption") until you can demonstrate that there actually is a bug.

    Ivan has spent significant effort to understand your test case, thank you Ivan, and concludes:

    This is not a bug, everything work correctly.

    so I'm going to close the bug report. If you disagree with Ivan's conclusion, please write back with reasons why you disagree.

    It would also significantly help if you can provide a simpler test case that demonstrates expected results and actual results (and preferably input data small enough to fit inline in the testcase script, if possible).

    f36b827c-19af-4da3-8c3b-cb5c939b5097 commented 6 years ago

    Actually, Alexey shared this problem with me orally first, and then asked to have a look at his report, and I felt that just describing the technical details about what is going on is not enough, and suggested to include a brief sentence which would state what kind of assumptions seemed to be broken to Alexey. And it was:

    a list contains elements that have never been appended.

    And this statement actually appeared quite helpful for me: in a few hours after this statement was made, I got an idea that this statement might actually be not true in this case. And this quickly lead me to the demonstration what really happens here, and that it was correct Python behavior.

    Without this brief formal statement of the broken assumptions, I had simply had no ideas (because I supposed that something was going wrong according to Alexey's words, and didn't question whether it is really wrong or we might expect these results under some conditions as correct results).

    (Actually, I'm a fan of type-checking, and what is more, of languages with expressive types like Agda or Idris where the programmer is to make provable statements of the expected properties written down as types; sometimes, they can be derived automatically, but nevertheless my point is the attitude to programming, even without formal tools: think how the expected properties would be proved, how they would be decomposed into simpler statements during the proof. Well, this is offtopic here, so, Alexey, let's not continue this discussion here.)