we-like-parsers / pegen_experiments

Experiments for the official PEG parser generator for Python
https://github.com/python/cpython/tree/master/Tools/peg_generator

Problem parsing string in file with latin-1 encoding #209

Closed (ethanhs closed this issue 4 years ago)

ethanhs commented 4 years ago

Minimal example:

# -*- coding: latin_1 -*-
french = "tu pense qu'on peut m'utiliser comme ça?"

The string constant is "tu pense qu'on peut m'utiliser comme ça?" with pegen.
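For context, here is a minimal sketch of how the declared encoding affects the bytes of this file. It uses only the standard library and does not reproduce pegen's behavior. The variable names are illustrative, and the byte `0xE7` is what `ç` becomes when the file is saved as latin-1.

```python
import io
import tokenize

# The two-line file from the report, as raw latin-1 bytes (ç is the single byte 0xE7).
SOURCE_BYTES = (
    b"# -*- coding: latin_1 -*-\n"
    b"french = \"tu pense qu'on peut m'utiliser comme \xe7a?\"\n"
)

# detect_encoding() reads the PEP 263 coding cookie from the first two lines
# and returns a normalized codec name.
encoding, _ = tokenize.detect_encoding(io.BytesIO(SOURCE_BYTES).readline)

# Decoding with the declared encoding recovers the intended text.
decoded_ok = SOURCE_BYTES.decode(encoding)

# Ignoring the cookie and assuming UTF-8 fails: 0xE7 starts a multi-byte
# sequence, but the next byte ("a") is not a continuation byte.
try:
    SOURCE_BYTES.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(encoding)
print(decoded_ok.splitlines()[1])
print(utf8_failed)
```

Any parser front end that skips the cookie therefore has to either fail or guess at the byte, which fits the symptom reported here.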

lysnikolaou commented 4 years ago

@pablogsal @gvanrossum Wouldn't this be solved by itself, once we integrate pegen into CPython?

pablogsal commented 4 years ago

> Wouldn't this be solved by itself, once we integrate pegen into CPython?

Sort of. I think the problem is that this function call:

https://github.com/gvanrossum/pegen/blob/master/pegen/pegen.c#L607

does not take the encoding into account at the moment. We could hook in a call to PyTokenizer_FindEncodingFilename there, but that will likely change when this is integrated into CPython, so we could wait until then.
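As a rough Python-level analogue of "find the encoding first, then parse", the sketch below uses `tokenize.open()`, which honors the coding cookie, before handing the text to `ast.parse`. This is not the C change discussed above and does not touch pegen.c. The helper names `parse_with_declared_encoding` and `first_string_constant` are hypothetical and exist only for illustration.

```python
import ast
import os
import tempfile
import tokenize


def parse_with_declared_encoding(path):
    """Parse a source file after decoding it with its declared encoding."""
    # tokenize.open() calls detect_encoding() to honor the coding cookie or BOM.
    with tokenize.open(path) as source_file:
        text = source_file.read()
    return ast.parse(text, filename=path)


def first_string_constant(tree):
    """Return the first str constant found in the tree, or None."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
    return None


# Write the latin-1 file from the report to a temporary location.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as handle:
    handle.write(b"# -*- coding: latin_1 -*-\n")
    handle.write(b"french = \"tu pense qu'on peut m'utiliser comme \xe7a?\"\n")
    temp_path = handle.name

try:
    constant = first_string_constant(parse_with_declared_encoding(temp_path))
finally:
    os.remove(temp_path)

print(constant)
```

With the encoding resolved before parsing, the string constant keeps its `ç`. The C-side equivalent would be to resolve the encoding before the tokenizer is set up.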

gvanrossum commented 4 years ago

I'm sure this is fixed in the CPython version, and the version here is too outdated to care.