renatahodovan / grammarinator

ANTLR v4 grammar-based test generator

Grammarinator crashes when generating sqlite test cases #214

Open ahmayun opened 4 months ago

ahmayun commented 4 months ago

I am trying to use grammarinator to generate test cases for sqlite. I am using the ANTLR grammar for sqlite that is available at the official antlr repo:

First I run: grammarinator-process examples/grammars/SQLiteLexer.g4 examples/grammars/SQLiteParser.g4 -o examples/fuzzer

Which works fine.

But when I run: grammarinator-generate SQLiteGenerator.SQLiteGenerator -r sql_stmt -d 20 -o examples/tests/test_%d.sql -n 100 -s SQLiteGenerator.html_space_serializer --sys-path examples/fuzzer/

I often get the following error (it does not always crash, but roughly 9 runs out of 10 do):

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/generate.py", line 78, in create_test
    return generator_tool.create(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/tool/generator.py", line 255, in create
    f.write(test)
  File "<frozen codecs>", line 727, in write
  File "<frozen codecs>", line 377, in write
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud85f' in position 878: surrogates not allowed
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ahmad/anaconda3/envs/grammarinator/bin/grammarinator-generate", line 8, in <module>
    sys.exit(execute())
             ^^^^^^^^^
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages/grammarinator/generate.py", line 158, in execute
    for _ in pool.imap_unordered(parallel_create_test, count(0) if args.n == inf else range(args.n)):
  File "/home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud85f' in position 878: surrogates not allowed

From my understanding, this means that grammarinator is generating values that can't be encoded as a utf-8 string. Is this an issue with grammarinator or is there a way to handle this that I am not aware of?

Here are some environment details if needed:

$ pip show grammarinator
      Name: grammarinator
      Version: 23.7.post76+gf3ffa71.d20240427
      Summary: Grammarinator: Grammar-based Random Test Generator
      Home-page: https://github.com/renatahodovan/grammarinator
      Author: Renata Hodovan, Akos Kiss
      Author-email: hodovan@inf.u-szeged.hu, akiss@inf.u-szeged.hu
      License: BSD
      Location: /home/ahmad/anaconda3/envs/grammarinator/lib/python3.12/site-packages
      Requires: antlerinator, antlr4-python3-runtime, autopep8, inators, jinja2, regex
      Required-by: 
$ python -V
      Python 3.12.3

renatahodovan commented 4 months ago

The problem is that the grammar allows surrogates to be generated as part of some tokens, but the test generator is not prepared to encode them when saving the output to a file. To configure the encoding and its error handler, you can use the --encoding and --encoding-errors CLI options of grammarinator-generate. These values are passed to the encoding and errors parameters of codecs.open, so you can set them accordingly. In this case, I think the simplest solution is to pass --encoding-errors=surrogatepass to grammarinator-generate.
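
To illustrate the difference between the two error handlers outside of grammarinator, here is a minimal standalone sketch. It only assumes that the output is written through codecs.open as described above; the sample string, the temporary path, and the variable names are made up for the illustration and are not taken from grammarinator's code.

```python
import codecs
import os
import tempfile

# Hypothetical generated test containing a lone high surrogate, the same
# code point that appears in the traceback above.
test = 'SELECT \ud85f;'

# Hypothetical output location for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'test_0.sql')

# With the default 'strict' error handler, UTF-8 refuses lone surrogates.
try:
    with codecs.open(path, 'w', encoding='utf-8') as f:
        f.write(test)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With 'surrogatepass', the surrogate is encoded as if it were an
# ordinary code point, so the write succeeds.
with codecs.open(path, 'w', encoding='utf-8', errors='surrogatepass') as f:
    f.write(test)

# Inspect the raw bytes that ended up in the file.
with open(path, 'rb') as f:
    raw = f.read()

print(strict_failed)
print(raw)
```

Note that a file written this way is not strictly valid UTF-8, so reading it back as text also needs errors='surrogatepass' (or the file has to be handled as bytes).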