gristlabs / asttokens

Annotate Python AST trees with source text and token information
Apache License 2.0

IndexError in asttokens.ASTTokens #105

Open DavidKorczynski opened 1 year ago

DavidKorczynski commented 1 year ago

The following program raises an uncaught exception:

import sys
import asttokens, ast
import atheris

def TestOneInput(data):
  fdp = atheris.FuzzedDataProvider(data)
  source_to_parse = fdp.ConsumeUnicodeNoSurrogates(4196)
  try:
    ast.parse(source_to_parse)
  except Exception:
    # Skip inputs that ast.parse itself rejects.
    return
  try:
    atok = asttokens.ASTTokens(source_to_parse, parse=True)
  except SyntaxError:
    pass

data = (b"\x79\x0a\x79\x0a\x79\x0d\x79\x0a\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x79\x0a\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x0a\x79\xae\x79\x0a\x78\x0a\x79\x0a\x79\x0a\x79\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\xc5\x0a")
TestOneInput(data)

Here, atheris is the module published at https://pypi.org/project/atheris/

The program is derived from the fuzzer here: https://github.com/google/oss-fuzz/blob/master/projects/asttokens/fuzz_asttokens.py

The following program is a shortened version of the above, without the fuzzing-related logic:

import asttokens, ast

def TestOneInput():
  source_to_parse = "\x0a\x79\x0a\x79\x0d\x79\x0a\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x79\x0a\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x0a\x79\x2e\x79\x0a\x78\x0a\x79\x0a\x79\x0a\x79\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x45\x0a"

  try:
    ast.parse(source_to_parse)
  except Exception:
    # Skip inputs that ast.parse itself rejects.
    return
  try:
    atok = asttokens.ASTTokens(source_to_parse, parse=True)
  except SyntaxError:
    pass

TestOneInput()

This produces the stack trace:

# python3 ./reproducer.py 
Traceback (most recent call last):
  File "./reproducer.py", line 29, in <module>
    TestOneInput()
  File "./reproducer.py", line 26, in TestOneInput
    atok = asttokens.ASTTokens(source_to_parse, parse=True)
  File "/usr/local/lib/python3.8/site-packages/asttokens/asttokens.py", line 127, in __init__
    self.mark_tokens(self._tree)
  File "/usr/local/lib/python3.8/site-packages/asttokens/asttokens.py", line 139, in mark_tokens
    MarkTokens(self).visit_tree(root_node)
  File "/usr/local/lib/python3.8/site-packages/asttokens/mark_tokens.py", line 61, in visit_tree
    util.visit_tree(node, self._visit_before_children, self._visit_after_children)
  File "/usr/local/lib/python3.8/site-packages/asttokens/util.py", line 273, in visit_tree
    ret = postvisit(current, par_value, cast(Optional[Token], value))
  File "/usr/local/lib/python3.8/site-packages/asttokens/mark_tokens.py", line 109, in _visit_after_children
    nfirst, nlast = self._methods.get(self, node.__class__)(node, first, last)
  File "/usr/local/lib/python3.8/site-packages/asttokens/mark_tokens.py", line 220, in handle_attr
    name = self._code.next_token(dot)
  File "/usr/local/lib/python3.8/site-packages/asttokens/asttokens.py", line 210, in next_token
    while is_non_coding_token(self._tokens[i].type):
IndexError: list index out of range
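
The last frame shows a loop that indexes self._tokens[i] with no upper bound on i. The following is a minimal sketch of that failure pattern, not asttokens's actual code: the helper names and the NON_CODING set are assumptions made for illustration, and only the standard tokenize constants are real.

```python
import tokenize

# Assumption for illustration: treat NL and COMMENT as "non-coding" tokens.
NON_CODING = {tokenize.NL, tokenize.COMMENT}


def unguarded_next(token_types, start):
    """Hypothetical: skip non-coding tokens after `start`, with no bounds check."""
    i = start + 1
    while token_types[i] in NON_CODING:  # raises IndexError if the list runs out
        i += 1
    return i


def guarded_next(token_types, start):
    """Hypothetical: same scan, but return None when no coding token follows."""
    i = start + 1
    while i < len(token_types) and token_types[i] in NON_CODING:
        i += 1
    return i if i < len(token_types) else None


if __name__ == "__main__":
    types = [tokenize.NAME, tokenize.NL]
    print(guarded_next(types, 0))
    try:
        unguarded_next(types, 0)
    except IndexError as exc:
        print("IndexError:", exc)
```

Whether a bounds check is the right fix, or whether the token list itself is wrong for this input, is not settled by the trace alone.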

The OSS-Fuzz setup at https://github.com/google/oss-fuzz/tree/master/projects/asttokens found this crash. If you find this issue helpful, it would be great to add maintainer emails to the project.yaml so that maintainers receive notifications of bug reports. Those reports contain details similar to what I posted above: the stack trace, the crashing input, and the identity of the fuzzer.

PeterJCLaw commented 1 year ago

Here's a smaller reproducer which appears to fail in the same way: '\ry.y\n'. Attempting to cut it down further (either removing the attribute access to leave just y, or changing the leading carriage return to a newline) causes the error to disappear. The mix of line-ending styles seems to be part of the issue, though it is less clear why the attribute access is needed.
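
To make the line-ending observation concrete, here is a small standard-library-only sketch showing that different facilities disagree about where the lines of '\ry.y\n' begin. It illustrates the mismatch only and is not a confirmed diagnosis of the asttokens failure; the function names are made up for this example.

```python
import ast
import io


def line_views(source):
    """Return how two stdlib facilities split `source` into lines."""
    # str.splitlines() treats a lone "\r" as a line boundary.
    by_splitlines = source.splitlines()
    # io.StringIO defaults to newline="\n", so readline() only stops at "\n".
    by_readline = list(iter(io.StringIO(source).readline, ""))
    return by_splitlines, by_readline


def parses_as_attribute(source):
    """Check that ast.parse accepts `source` and sees an attribute access."""
    tree = ast.parse(source)
    return isinstance(tree.body[0].value, ast.Attribute)


if __name__ == "__main__":
    snippet = "\ry.y\n"
    print(line_views(snippet))
    print(parses_as_attribute(snippet))
```

For this input, splitlines() yields two lines ('' and 'y.y'), while the readline-based view yields the single line '\ry.y\n'. Any code that maps positions from one view onto the other could end up looking for tokens on the wrong line.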

alexmojaki commented 1 year ago

Thanks, I was gonna say something similar. '\ry' also produces an error, but a different one. Probably the same underlying cause.