liftoff / pyminifier

Pyminifier is a Python code minifier, obfuscator, and compressor.

Error when first line is a multi-line string #79

Open mlococo opened 7 years ago

mlococo commented 7 years ago

Summary

When running pyminifier from master (the bug is not present in 2.1 from PyPI), if the first line of the file is a multi-line string, the following traceback is emitted:

Traceback (most recent call last):
  File "venv/bin/pyminifier", line 9, in <module>
    load_entry_point('pyminifier==2.1', 'console_scripts', 'pyminifier')()
  File "/home/mlococo/code/ripcord2/meld/venv/local/lib/python2.7/site-packages/pyminifier/__main__.py", line 168, in main
    pyminify(options, pyz_file)
  File "/home/mlococo/code/ripcord2/meld/venv/local/lib/python2.7/site-packages/pyminifier/__init__.py", line 253, in pyminify
    source = minification.minify(tokens, options)
  File "/home/mlococo/code/ripcord2/meld/venv/local/lib/python2.7/site-packages/pyminifier/minification.py", line 417, in minify
    result = reduce_operators(result)
  File "/home/mlococo/code/ripcord2/meld/venv/local/lib/python2.7/site-packages/pyminifier/minification.py", line 204, in reduce_operators
    if prev_tok[0] == tokenize.STRING:
TypeError: 'NoneType' object has no attribute '__getitem__'

Steps to reproduce

printf '"""multi-line\nstring"""\n' > test.py  # printf turns \n into a real newline
pyminifier test.py

Additional Context

This commit https://github.com/liftoff/pyminifier/commit/2edb6360a9bf4dabced436da794d10a7c01e827d added some multi-line string handling, but it doesn't account for the case where the first line of the file is a multi-line string, so prev_tok is still None when it is first dereferenced.
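
For illustration (this sketch is not part of the original report), the reason prev_tok is still unset is that the multi-line string is the very first token the tokenizer yields for such a file, so a loop that initializes prev_tok = None and indexes it before the first assignment (as reduce_operators does) hits the TypeError in the traceback above:

import io
import tokenize

source = '"""multi-line\nstring"""\n'

# The STRING token is the first thing generate_tokens() produces for this
# file; there is no earlier token that could have populated prev_tok.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))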

The reduce_operators function works correctly if we guard each prev_tok[...] access with a check that prev_tok is truthy:

        if token_type != tokenize.OP:
            if start_col > last_col and token_type not in nl_types:
-               if prev_tok[0] != tokenize.OP:
+               if prev_tok and prev_tok[0] != tokenize.OP:
                    out += (" " * (start_col - last_col))
            if token_type == tokenize.STRING:
-               if prev_tok[0] == tokenize.STRING:
+               if prev_tok and prev_tok[0] == tokenize.STRING:
                    # Join the strings into one
                    string_type = token_string[0] # '' or ""
                    prev_string_type = prev_tok[1][0]
                    out = out.rstrip(" ") # Remove any spaces we inserted prev
                    if not joining_strings:
                        # Remove prev token and start the new combined string
                        out = out[:(len(out)-len(prev_tok[1]))]
                        prev_string = prev_tok[1].strip(prev_string_type)
                        new_string = (
                            prev_string + token_string.strip(string_type))
                        joining_strings = True
                    else:
                        new_string += token_string.strip(string_type)
        else:
            if token_string in ('}', ')', ']'):
-               if prev_tok[1] == ',':
+               if prev_tok and prev_tok[1] == ',':
                    out = out.rstrip(',')
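
As a rough, self-contained illustration of the guarded pattern (the function below is hypothetical and not part of pyminifier's API):

import io
import tokenize

def guarded_walk(source):
    # Hypothetical stand-in for the reduce_operators loop: prev_tok starts as
    # None, so every prev_tok[...] access is preceded by a truthiness check.
    prev_tok = None
    notes = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            if prev_tok and prev_tok[0] == tokenize.STRING:
                notes.append("adjacent string, would be joined")
            else:
                notes.append("first string, nothing to join")
        prev_tok = tok
    return notes

# A file whose very first token is a multi-line string no longer trips the
# TypeError, because the guard short-circuits while prev_tok is still None.
print(guarded_walk('"""multi-line\nstring"""\n'))
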
patrickdundas commented 6 years ago

I was able to reproduce this. Thanks for posting the issue; removing the first line quickly solved a similar problem I was having.