Open gvanrossum opened 2 years ago
Re: Multi-string? I.e. is tag "a" "b" "c" allowed, presumably meaning tag "abc"?
f-strings do not do this. If x=3, then f"{x}" "{x}" is "3{x}".
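A quick check of the f-string behaviour described above, in plain Python (`x` is just the example value from the comment). Adjacent literals are concatenated, but only the piece carrying the `f` prefix is interpolated:

```python
x = 3

# Adjacent string literals are joined, but each keeps its own prefix:
# the first piece is an f-string, the second is a plain literal.
mixed = f"{x}" "{x}"

# Both pieces are interpolated only when both carry the f prefix.
both = f"{x}" f"{x}"

print(mixed)  # 3{x}
print(both)   # 33
```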
Re: Space between tag and string literal? No, this is not possible with other prefixes. It is also a potential point of confusion where users may want to combine prefixes together, such as tag1 tag2"abc". This also addresses raw strings with tag r"abc" -- all literal strings in the tagstring will be raw, and then decoded by the tag. This allows someone to write a LaTeX tag, such as latex"{member} \in X".
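To make the raw-string point concrete, here is a small sketch that uses today's `r` prefix as a stand-in for the proposed behaviour (no tag syntax appears, since it is not valid Python today). The backslash survives untouched, so a tag function could decide for itself how to decode it:

```python
# With the raw prefix, backslashes are kept as literal characters.
raw = r"{member} \in X"
assert "\\in" in raw

# r"\a\b\c" is six characters: three backslashes and three letters.
assert len(r"\a\b\c") == 6

# Without the raw prefix, "\a" and "\b" are each decoded to one control character.
assert len("\a\b") == 2
```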
Do we support all the special syntax allowed in f-string interpolations, i.e. {x!r}, {x!s}, {x!a}, {x = }?
Yes. Note that for some text {x=}, this is rewritten at the parse stage as being the equivalent of some text x={x}. So we will need to add some extra support to recover the original raw string.
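For reference, this is how the `=` specifier behaves at runtime in current f-strings. One detail worth noting is that it applies `repr()` by default, so the expansion matches `x={x!r}` rather than `x={x}`:

```python
x = "hi"

# {x=} expands to the expression text, an equals sign, and repr() of the value.
debug_form = f"{x=}"
explicit_form = f"x={x!r}"

print(debug_form)     # x='hi'
print(explicit_form)  # x='hi'
```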
This is not in general reversible, though. We had this same issue with stringized annotations. It didn't matter there, since we weren't concerned with complete fidelity. f'x={x!r}' and f'{x=}' generate the same AST.
>>> ast.dump(ast.parse("f'x={x!r}'"))
"Module(body=[Expr(value=JoinedStr(values=[Constant(value='x=', kind=None), FormattedValue(value=Name(id='x', ctx=Load()), conversion=114, format_spec=None)]))], type_ignores=[])"
>>> ast.dump(ast.parse("f'{x=}'"))
"Module(body=[Expr(value=JoinedStr(values=[Constant(value='x=', kind=None), FormattedValue(value=Name(id='x', ctx=Load()), conversion=114, format_spec=None)]))], type_ignores=[])"
formatspec should be an empty string '', not None. This allows for direct usage of the formatspec with format(getvalue(), formatspec) as opposed to requiring format(getvalue(), '' if formatspec is None else formatspec).
Interestingly, I haven't really been working with formatspec except to explore creative 😁 usages, but this is not so likely for real usage.
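A small check of why the empty-string default is convenient, using only the built-in `format()` (the name `value` is illustrative). An empty spec is accepted, while `None` raises a `TypeError`:

```python
value = 3.5

# An empty format spec is accepted and behaves like str(value).
assert format(value, "") == str(value)

# None is rejected, which is why a None default would need special-casing.
try:
    format(value, None)
except TypeError:
    rejected = True
else:
    rejected = False

print(rejected)  # True
```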
As I'm writing I'm finding the name "tag string" to be a bit awkward to use. I don't have the grammatical chops to justify it, but having two nouns right next to each other doesn't feel right to me. Given that, I think it would be better if we used the term "tagged string" as that uses an adjective to describe the string as being "tagged". This also aligns more closely with the name "f-string" which is meant to abbreviate "formatted string". For comparison, the JS equivalent is similarly called a "tagged template".
I had also been using "string tag" to refer to the thing which interpolates a "tagged string". However, I think @jimbaker has been using "tag function" and, even though it could technically be an object with a __call__ method, I think "string tag" could easily get confused for "tagged string" as you're reading. Thus, I'll be using "tag function" from now on.
@rmorshea So we have two choices for "tag string" per this insightful Wikipedia article on noun adjuncts, https://en.wikipedia.org/wiki/Noun_adjunct:
I had not really thought about this aspect of English before!
In retrospect, I don't know why "tag function" was fine to my eye, but "tag string" was not, since they're effectively the same grammatical structure. Your original noun adjunct naming also has a nice symmetry I didn't appreciate before with the common "tag" prefix that we could use in naming other things.
Also, while we're on the topic of naming, where did "Thunk" come from? I've been assuming it's a combination of "tuple" and "chunk", but I haven't seen any explicit reference to that.
Thunk is a very old term used for a compiler-generated piece of code representing a parameter. It dates back to Algol-60 (of which I have very fond memories). https://en.wikipedia.org/wiki/Thunk
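As a rough illustration of the general idea (not the Thunk class from the proposal), a thunk can be modelled in Python as a zero-argument callable that wraps an expression so it is evaluated only when needed. The helpers `expensive` and `log_if` below are hypothetical names made up for this sketch:

```python
calls = []

def expensive():
    # Record each evaluation so we can see when it actually happens.
    calls.append("evaluated")
    return 42

def log_if(enabled, thunk):
    # `thunk` is a zero-argument callable; it is called only when enabled.
    return f"value={thunk()}" if enabled else None

skipped = log_if(False, lambda: expensive())  # expensive() never runs
shown = log_if(True, lambda: expensive())     # expensive() runs once

print(shown)  # value=42
print(calls)  # ['evaluated']
```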
There was a discussion on Lobste.rs today that led to some interesting thoughts that seem closely related to this work, so I thought I would share: https://dotat.at/@/2020-09-17-generalized-string-literal-syntax-10-years-later.html
Deleting/moving this comment from #20
I prefer a html @ "<div>" syntax for all tagstrs over the proposed html"<div>" syntax.
Single letter keywords, such as f, r, and b, are allowed to be visually attached to a string since they are not PEP 8 variable names. But html"..." looks like someone accidentally forgot a space between the variable html and the string literal "...".
A couple of other benefits come to mind:
1) It reads better. When voicing it, it sounds like: "My HTML is at 'string'"
2) Would allow for compatibility with f-strings such as html @ f"<div>{my_val}</div>"
3) Pattern feels comfortable due to the existence of email addresses trivializing the @ symbol.
The problem with @ is that it already has a valid parse in Python. So let's write the following:
>>> class HTML:
...     def __matmul__(self, other):
...         print(f'Multiplying {self=} with {other=}')
...         return 42
...
>>>
>>> html = HTML()
>>> html @ '<div>'
Multiplying self=<__main__.HTML object at 0x7faa5602f0d0> with other='<div>'
42
Obviously we can do something with this functionality with @, but we lose laziness, interpolation control more generally, etc.
The advantage of using the tag string approach is that it is not currently valid syntax, so we can use it in this interesting way.
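To illustrate the lost laziness: with html @ f"...", the f-string is fully evaluated before __matmul__ runs, so the operator only ever sees the finished string. A minimal sketch, with a hypothetical HTML class that simply returns what it receives:

```python
class HTML:
    def __matmul__(self, other):
        # By the time we get here, interpolation has already happened.
        return other

html = HTML()
my_val = "<script>"

received = html @ f"<div>{my_val}</div>"
print(received)  # <div><script></div>
```

Because `received` is a plain `str`, the class cannot tell which part came from `my_val`, so it has no opportunity to escape it.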
But html'mystr' looks visually confusing/ambiguous. It's possible the Python community will have the same initial visceral reaction to it as they did to the PEP for the walrus operator :=.
It seems a little hyperbolic to assume that's how people will react, as I think that could be true of almost any change to the language. But regardless, this syntax has the advantage that Python already has string prefixes and that JavaScript's template literals do the same, but with backticks. I'm not sure that we ruled out the possibility of using backticks, but that's another way this could be made visually distinct from normal string declarations.
Could we update this issue with (tentative) decisions made on various issues brought up above?
Or perhaps close this in favor of more pointed issues for the remaining open issues? E.g. \N{...}. We already have separate issues about some points, e.g. #4, #5.
Yes, I will take care of that, along with other issues. There's too much that's been left open, and we can always reopen any if it comes up.
Following this discussion for f-strings, it is worth documenting that tagged strings can't be used as docstrings.
Do you have a use case?
Maybe:
def repeat(string, n_repetitions: int):
    doc'''
    {string} - template
    {n_repetitions} - number of times to repeat
    # these are turned into links in documentation
    See also: {repeat_twice} and {string_utils}
    '''
    return string * n_repetitions
Neither are really important.
Hi, I stumbled upon this repo and I think the features proposed in this PEP are nice, but I have some thoughts and questions. I hope this is the right place for them :)
My understanding of the word "Thunk" is that it means any deferred computation unit (at least that's how I understand it from the little Haskell I've done). It seems a shame to use the name of such a generic concept for this very specific use case. Maybe Thunk should be some sort of base class and the Thunks in the PEP could be subclasses?
And related to the last syntax point in the OP, I also find foobar"a b c" a bit jarring, and the syntax in PEP 501 looks more natural (in Python at least) to me. Why did you decide not to go with PEP 501 syntax (foobar(i"a {b} c"))?
I'm also a bit concerned that this is essentially just another way of evaluating a function, with double quotes instead of parentheses. Like, wouldn't foobar(a, b, c) and foobar"{a}{b}{c}" be equivalent? Continuing that thought, wouldn't this essentially give people all the power of lazy evaluation, just with a funky function call, thus potentially making it used for non-string things? I don't think that's the purpose of the PEP but it seems like a natural consequence to me. But maybe there's something I'm not understanding?
EDIT: @merwok explained that I should go somewhere else with my questions, editing the post so I (hopefully) don't disturb more people than I already have!
The right place would be here I think: https://discuss.python.org/t/allow-for-arbitrary-string-prefix-of-strings/19740/6
This tracker (I think) is used to work out some issues between the people working on the proposal, and this ticket specifically is for some cosmetic details («bikesheds») and not the project in general.
Hope this helps!
Eventually these may each need their own issue? For now, some bullets, some with recommendations.
Syntax
- Space between tag and string literal? tag "abc" the same as tag"abc"? [rec: disallow]
- tag"abc" or tag "abc"?
- Multi-string? I.e. is tag"a" "b" "c" allowed, presumably meaning tag"abc"? [rec: disallow]
- tag r"abc" -- but perhaps we should not interpret backslashes? I.e. tag "\a\b\c" would effectively call tag(r"\a\b\c"), i.e. a string of six characters.
- foo().tag"abc")? [rec: single name only]
- tag "abc" [i]? (You can write f"xyz"[i].) [rec: yes]
- Do we support all the special syntax allowed in f-string interpolations, i.e. {x!r}, {x!s}, {x!a}, {x = }? [rec: yes]
- len"foo" (which is the same as len("foo")). I'm guessing that's why JavaScript uses backticks for tagged templates?

Semantics
- Thunk class.