Open DSLituiev opened 7 years ago
Interesting idea. Thinking about the tradeoffs, there are two downsides I can see:
(1) Backticks look very similar to single-quotes. In Python in general the BDFL has pronounced that backticks won't be assigned any meaning, because of this usability problem ("syntax shouldn't look like grit on Tim's monitor"). I guess this is also somewhat of an advantage for us, because it means that they won't be assigned any other meaning.
(2) Patsy currently relies on Python's tokenizer. Because Python doesn't use backticks as a quoting marker, the Python tokenizer crashes if fed backticks:
```
In [6]: list(patsy.tokens.python_tokenize("foo + `baz`"))
---------------------------------------------------------------------------
PatsyError                                Traceback (most recent call last)
<ipython-input-6-024d1474cf98> in <module>()
----> 1 list(patsy.tokens.python_tokenize("foo + `baz`"))

/home/njs/.user-python3.5-64bit/lib/python3.5/site-packages/patsy/tokens.py in python_tokenize(code)
     37             raise PatsyError("error tokenizing input "
     38                              "(maybe an unclosed string?)",
---> 39                              origin)
     40     if pytype == tokenize.COMMENT:
     41         raise PatsyError("comments are not allowed", origin)

PatsyError: error tokenizing input (maybe an unclosed string?)
    foo + `baz`
          ^
```
So the only way to implement this would be to fork our own copy of the tokenizer, and then make sure to keep it up to date with each Python release. (Actually, we would need multiple forks - at least one for python 2 and one for python 3, maybe more.) Unfortunately I don't see any way to really make this viable :-(
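For reference, the fact that the crash comes from the stdlib tokenizer (and not from patsy itself) can be checked without patsy at all. A minimal sketch — `backtick_breaks_tokenizer` is just a hypothetical helper name, and since newer Python versions may raise instead of emitting an `ERRORTOKEN`, both paths are handled:

```python
import io
import tokenize

def backtick_breaks_tokenizer(code):
    """True if the stdlib tokenizer cannot cleanly tokenize `code`:
    it either raises, or emits an ERRORTOKEN for the stray character."""
    try:
        toks = list(tokenize.generate_tokens(io.StringIO(code).readline))
    except (tokenize.TokenError, SyntaxError):
        return True  # newer tokenizers may raise outright
    return any(t.type == tokenize.ERRORTOKEN for t in toks)

print(backtick_breaks_tokenizer("foo + baz"))    # False: valid Python
print(backtick_breaks_tokenizer("foo + `baz`"))  # True: backtick is not a Python token
```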
How about putting a thin preprocessing layer on top of it:
```python
import re

def replace_backticks(x):
    if "`" not in x:
        return x
    pttrn = re.compile("`([^`]*)`")
    def repl(m):
        return "Q('" + m.group(1) + "')"
    return pttrn.sub(repl, x)

testlist = ["a ~ `50%`",
            "t + `x/2` = `y` + `z`",
            "`x%z` ~ `a.z`",
            "a` ~ 12",
            "y~x-1"]

for x in testlist:
    result = replace_backticks(x)
    print("=" * 20)
    print(x)
    print(result)
```
Returns:

```
====================
a ~ `50%`
a ~ Q('50%')
====================
t + `x/2` = `y` + `z`
t + Q('x/2') = Q('y') + Q('z')
====================
`x%z` ~ `a.z`
Q('x%z') ~ Q('a.z')
====================
a` ~ 12
a` ~ 12
====================
y~x-1
y~x-1
```
Note that example 4 is broken: the unmatched backtick in ``a` ~ 12`` is silently left unchanged. Other broken cases include things like the odd but currently valid ``Q("foo`bar")``.
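To make that concrete: running the naive `replace_backticks` substitution from the snippet above over such a formula pairs the backtick inside the string literal with the opening backtick of the next quoted name, splicing the two together (an illustrative reproduction, not patsy behavior):

```python
import re

def replace_backticks(x):
    # Same naive rule as in the snippet above: any `...` pair becomes Q('...')
    return re.sub("`([^`]*)`", lambda m: "Q('" + m.group(1) + "')", x)

# The backtick inside the string literal gets matched against the
# opening backtick of `baz`, mangling both.
print(replace_backticks('y ~ Q("foo`bar") + `baz`'))
```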
I guess this isn't tooo bad because backticks are very rarely used, but... I dunno. I really like the thing where we use a real parser with fully-defined behavior.
I guess the other option would be some sort of fancy error-recovery support, where if lexing crashes we detect this case (the first unparsed character is backtick) and recover. Sounds messy but potentially doable...
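One way to see why recovery at the lexer level would behave better: a lexer consumes string literals as whole tokens before it ever looks at a backtick, so a backtick inside `Q("...")` is never mistaken for a quoting marker. A toy standalone lexer sketching this idea — it is not Python's tokenizer, `QUOTED_NAME` is a made-up token kind, and it assumes single-line formulas with simple one-line strings:

```python
import re

# Alternatives are tried in order, so string literals win over backticks.
TOKEN_RE = re.compile(
    r"""(?P<STRING>'[^']*'|"[^"]*")  # simple one-line string literals
      | (?P<QUOTED_NAME>`[^`]*`)     # backtick-quoted name
      | (?P<NAME>[A-Za-z_]\w*)
      | (?P<NUMBER>\d+(?:\.\d*)?)
      | (?P<OP>[~+\-*/()=])
      | (?P<WS>\s+)
    """,
    re.VERBOSE,
)

def lex_formula(code):
    tokens = []
    pos = 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if m is None:
            raise ValueError("cannot lex %r at position %d" % (code[pos:], pos))
        kind = m.lastgroup
        if kind == "QUOTED_NAME":
            tokens.append(("QUOTED_NAME", m.group()[1:-1]))  # strip the backticks
        elif kind != "WS":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens

# The string literal stays intact; only the bare backticks quote a name.
print(lex_formula('y ~ Q("foo`bar") + `x/2`'))
```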
Here is a version that also leaves backticks alone when they appear inside an existing `Q('')`:
```python
import re

def _check_backticks_within_Q_(x):
    pttrn = re.compile(r"(Q\(['\"]).*`.*(['\"]\))")
    res = pttrn.finditer(x)
    try:
        next(res)
        return True
    except StopIteration:
        return False

def _replace_backticks_(m):
    return "Q('" + m.group(1) + "')"

def replace_backticks(x):
    if "`" not in x:
        return x
    elif _check_backticks_within_Q_(x):
        return x
    pttrn = re.compile("`([^`]*)`")
    return pttrn.sub(_replace_backticks_, x)

testlist = ["a ~ `50%`",
            "t + `x/2` = `y` + `z`",
            "`x%z` ~ `x!#%^`",
            "y~x-1",
            "y ~ Q('x`')",
            "y ~ Q('`x`')",
            "w ~ Q( ' x`!#%^' ) + Q('r1`')",
            'w ~ Q( " x`!#%^" )']

for x in testlist:
    result = replace_backticks(x)
    print("=" * 20)
    print(x)
    print(result)
```
Output:
```
====================
a ~ `50%`
a ~ Q('50%')
====================
t + `x/2` = `y` + `z`
t + Q('x/2') = Q('y') + Q('z')
====================
`x%z` ~ `x!#%^`
Q('x%z') ~ Q('x!#%^')
====================
y~x-1
y~x-1
====================
y ~ Q('x`')
y ~ Q('x`')
====================
y ~ Q('`x`')
y ~ Q('`x`')
====================
w ~ Q( ' x`!#%^' ) + Q('r1`')
w ~ Q( ' x`!#%^' ) + Q('r1`')
====================
w ~ Q( " x`!#%^" )
w ~ Q( " x`!#%^" )
```
This is a suggestion to implement backticks as an alias for quoting `Q('...')`. E.g.:

Rationale: `R` syntax allows addressing fields as: