Bikesheds - Githubissues

gvanrossum commented 2 years ago

Eventually these may each need their own issue? For now, some bullets, some with recommendations.

Syntax

Space between tag and string literal? I.e. is tag "abc" the same as tag"abc"? [rec: disallow]
- If yes, do we recommend writing a space between tag and string? I.e. do we prefer tag"abc" or tag "abc"?
Multi-string? I.e. is tag"a" "b" "c" allowed, presumably meaning tag"abc"? [rec: disallow]
Raw strings? I suppose we should disallow things like tag r"abc" -- but perhaps we should not interpret backslashes? I.e. tag "\a\b\c" would effectively call tag(r"\a\b\c"), i.e. with a string of six characters.
Do we only allow a single identifier in front of the string, or do we allow a dotted name? Or perhaps any atomic expression (e.g. a parenthesized expression) or even any primary (e.g. foo().tag"abc")? [rec: single name only]
Do we allow a call, subscript or attribute after the string literal, e.g. tag "abc" [i]? (You can write f"xyz"[i].) [rec: yes]
Do we support all the special syntax allowed in f-string interpolations, i.e. {x!r}, {x!s}, {x!a}, {x = }? [rec: yes]
It definitely looks a little weird to see len"foo" (which is the same as len("foo")). I'm guessing that's why JavaScript uses backticks for tagged templates?
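
For reference, the f-string behaviours that several of the bullets above compare against can be checked directly. This is only a sketch of what f-strings do today; it does not show tag string behaviour, which is still open in this thread. The variable names are illustrative.

```python
x = 3
s = "hi"
accented = "é"

# A subscript, attribute or call directly after an f-string literal is valid.
second_char = f"xyz"[1]
shouted = f"{s}!".upper()

# The three conversions call repr(), str() and ascii() on the value.
with_repr = f"{s!r}"
with_str = f"{s!s}"
with_ascii = f"{accented!a}"

# The "=" form echoes the expression text, including its whitespace.
debug = f"{x = }"

print(second_char, shouted, with_repr, with_str, with_ascii, debug)
```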

Semantics

Should the string we put in the thunk representing the raw string include the enclosing curly braces? [rec: it should just be the evaluatable expression]
We need to specify the name and fields of the Thunk class.
We need to specify how the tag function gets called. [rec: with a sequence of non-empty strings and Thunks]
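
To make the last two points concrete, here is a minimal sketch of one possible shape, not a decided design. The field names (getvalue, raw, conv, formatspec), the toy shout tag, and the desugaring shown in the final lines are all assumptions for illustration; the thread leaves the actual name and fields of the Thunk class open.

```python
from typing import Callable, NamedTuple, Optional


class Thunk(NamedTuple):
    """Hypothetical shape; the real name and fields are still undecided."""
    getvalue: Callable[[], object]  # deferred evaluation of the expression
    raw: str                        # expression source, without the curly braces
    conv: Optional[str] = None      # "r", "s", "a", or None
    formatspec: str = ""            # assumed to default to an empty string


def shout(*args):
    """Toy tag function: upper-cases interpolated values, keeps literals as-is."""
    parts = []
    for arg in args:
        if isinstance(arg, str):
            parts.append(arg)
            continue
        value = arg.getvalue()  # the expression is only evaluated here
        if arg.conv == "r":
            value = repr(value)
        elif arg.conv == "s":
            value = str(value)
        elif arg.conv == "a":
            value = ascii(value)
        parts.append(format(value, arg.formatspec).upper())
    return "".join(parts)


name = "world"
# Assumed desugaring of the hypothetical literal shout"Hello, {name}!"
result = shout("Hello, ", Thunk(lambda: name, "name"), "!")
print(result)  # Hello, WORLD!
```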

ericvsmith commented 2 years ago

Re: Multi-string? I.e. is tag "a" "b" "c" allowed, presumably meaning tag "abc"?

f-strings do not do this. If x=3, then f"{x}" "{x}" is "3{x}".
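
The behaviour described here can be verified in a few lines. Only the literal carrying the f prefix is interpolated; the adjacent plain literal is concatenated unchanged.

```python
x = 3

# Implicit concatenation of an f-string with a plain string literal.
combined = f"{x}" "{x}"

print(combined)  # 3{x}
```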

jimbaker commented 2 years ago

Re: Space between tag and string literal? No, this is not possible with other prefixes. It is also a potential point of confusion where users may want to combine prefixes, such as tag1 tag2"abc". This also addresses raw strings with tag r"abc" -- all literal strings in the tag string will be raw, and then decoded by the tag. This allows someone to write a LaTeX tag, such as latex"{member} \in X".
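
A small sketch of why raw literal parts matter for something like a LaTeX tag, using ordinary raw strings to stand in for what a tag would receive. The decode_escapes helper is hypothetical and only handles ASCII input; it illustrates how a tag could opt back in to escape processing.

```python
# With raw semantics the backslashes survive: six characters, as in the OP.
raw = r"\a\b\c"

# A cooked literal would mangle LaTeX: "\n" becomes a newline.
cooked_latex = "\nabla f"
raw_latex = r"\nabla f"


def decode_escapes(text: str) -> str:
    """Hypothetical helper: apply Python escape processing to ASCII text."""
    return text.encode("ascii").decode("unicode_escape")


decoded = decode_escapes(r"\a\b")

print(len(raw), repr(cooked_latex), repr(raw_latex), repr(decoded))
```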

jimbaker commented 2 years ago

Do we support all the special syntax allowed in f-string interpolations, i.e. {x!r}, {x!s}, {x!a}, {x = }?

Yes. Note that for some text {x=}, this is rewritten at the parse stage as being the equivalent of some text x={x}. So we will need to add some extra support to recover the original raw string.

ericvsmith commented 2 years ago

Yes. Note that for some text {x=}, this is rewritten at the parse stage as being the equivalent of some text x={x}. So we will need to add some extra support to recover the original raw string.

This is not in general reversible, though. We had this same issue with stringized annotations. It didn't matter there, since we weren't concerned with complete fidelity. f'x={x!r}' and f'{x=}' generate the same AST.

>>> ast.dump(ast.parse("f'x={x!r}'"))
"Module(body=[Expr(value=JoinedStr(values=[Constant(value='x=', kind=None), FormattedValue(value=Name(id='x', ctx=Load()), conversion=114, format_spec=None)]))], type_ignores=[])"
>>> ast.dump(ast.parse("f'{x=}'"))
"Module(body=[Expr(value=JoinedStr(values=[Constant(value='x=', kind=None), FormattedValue(value=Name(id='x', ctx=Load()), conversion=114, format_spec=None)]))], type_ignores=[])"

jimbaker commented 2 years ago

formatspec should be an empty string '', not None. This allows for direct usage of the formatspec with

format(getvalue(), formatspec)

as opposed to requiring

format(getvalue(), '' if formatspec is None else formatspec)

Interestingly, I haven't really been working with formatspec except to explore creative 😁 usages, but these are not so likely in real usage.
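
The difference is easy to check with the built-in format(): an empty format spec is accepted, while None raises TypeError. The render helper below is a hypothetical stand-in for code inside a tag function.

```python
def render(getvalue, formatspec=""):
    """Hypothetical helper: format a deferred value with its format spec."""
    return format(getvalue(), formatspec)


with_spec = render(lambda: 3.14159, ".2f")
without_spec = render(lambda: 42)  # an empty spec behaves like str(value)

# Passing None as the format spec is rejected by format().
try:
    format(42, None)
    none_rejected = False
except TypeError:
    none_rejected = True

print(with_spec, without_spec, none_rejected)
```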

rmorshea commented 2 years ago

As I'm writing I'm finding the name "tag string" to be a bit awkward to use. I don't have the grammatical chops to justify it, but having two nouns right next to each other doesn't feel right to me. Given that, I think it would be better if we used the term "tagged string" as that uses an adjective to describe the string as being "tagged". This also aligns more closely with the name "f-string" which is meant to abbreviate "formatted string". For comparison, the JS equivalent is similarly called a "tagged template".

I had also been using "string tag" to refer to the thing which interpolates a "tagged string". However, I think @jimbaker has been using "tag function" and, even though it could technically be an object with a __call__ method, I think "string tag" could easily get confused for "tagged string" as you're reading. Thus, I'll be using "tag function" from now on.

jimbaker commented 2 years ago

@rmorshea So we have two choices for "tag string" per this insightful Wikipedia article on noun adjuncts, https://en.wikipedia.org/wiki/Noun_adjunct:

Use a noun adjunct construction. So this is "tag string" or possibly "tag-string" (noun adjuncts are occasionally hyphenated with the noun being modified).
Use an adjectivally inflected construction. So this is "tagged string".

I had not really thought about this aspect of English before!

rmorshea commented 2 years ago

In retrospect, I don't know why "tag function" was fine to my eye, but "tag string" was not, since they're effectively the same grammatical structure. Your original noun adjunct naming also has a nice symmetry I didn't appreciate before with the common "tag" prefix that we could use in naming other things.

rmorshea commented 2 years ago

Also, while we're on the topic of naming, where did "Thunk" come from? I've been assuming it's a combination of "tuple" and "chunk", but I haven't seen any explicit reference to that.

gvanrossum commented 2 years ago

Thunk is a very old term used for a compiler-generated piece of code representing a parameter. It dates back to Algol-60 (of which I have very fond memories). https://en.wikipedia.org/wiki/Thunk

benji-york commented 1 year ago

There was a discussion on Lobste.rs today that led to some interesting thoughts that seem closely related to this work, so I thought I would share: https://dotat.at/@/2020-09-17-generalized-string-literal-syntax-10-years-later.html

Archmonger commented 1 year ago

Deleting/moving this comment from #20

I prefer an html @ "<div>" syntax for all tagstrs over the proposed html"<div>" syntax.

Single-letter prefixes, such as f, r, and b, are allowed to be visually attached to a string since they are not PEP 8 variable names. But html"..." looks like someone accidentally forgot a space between the variable html and the string literal "...".

A couple of other benefits come to mind

1) It reads better. When voicing it, it sounds like: "My HTML is at 'string'"
2) Would allow for compatibility with f-strings, such as html @ f"<div>{my_val}</div>"
3) Pattern feels comfortable due to the existence of email addresses trivializing the @ symbol.

jimbaker commented 1 year ago

The problem with @ is that it already has a valid parse in Python. So let's write the following:

>>> class HTML:
...   def __matmul__(self, other):
...     print(f'Multiplying {self=} with {other=}')
...     return 42
...
>>>
>>> html = HTML()
>>> html @ '<div>'
Multiplying self=<__main__.HTML object at 0x7faa5602f0d0> with other='<div>'
42

Obviously we can do something with this functionality with @, but we lose laziness, interpolation control more generally, etc.

The advantage of using the tag string approach is that it is not currently valid syntax, so we can use it in this interesting way.
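
To illustrate the lost interpolation control, here is a rough sketch contrasting the two approaches. The html_tag function and its calling convention (literal strings plus zero-argument callables) are assumptions for illustration only, not the proposal's API. With @ and an f-string, the operand is fully formatted before __matmul__ runs, so literal markup and interpolated values cannot be told apart.

```python
from html import escape


class HTML:
    def __matmul__(self, other):
        # Only the finished string is visible here.
        return other


def html_tag(*args):
    """Hypothetical tag: escape interpolated values, trust literal parts."""
    return "".join(
        arg if isinstance(arg, str) else escape(str(arg()))
        for arg in args
    )


user_input = "<script>"

eager = HTML() @ f"<div>{user_input}</div>"
# Assumed desugaring of the hypothetical literal html_tag"<div>{user_input}</div>"
lazy = html_tag("<div>", lambda: user_input, "</div>")

print(eager)  # <div><script></div>
print(lazy)   # <div>&lt;script&gt;</div>
```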

Archmonger commented 1 year ago

But html'mystr' looks visually confusing/ambiguous. It's possible the Python community will have the same initial visceral reaction to it as they did to the PEP for the walrus operator :=.

rmorshea commented 1 year ago

It seems a little hyperbolic to assume that's how people will react, as I think that could be true of almost any change to the language. But regardless, this syntax has the advantage that Python already has string prefixes and that JavaScript's template literals do the same, but with backticks. I'm not sure that we ruled out the possibility of using backticks, but that's another way this could be made visually distinct from normal string declarations.

gvanrossum commented 1 year ago

Could we update this issue with (tentative) decisions made on various issues brought up above?

Or perhaps close this in favor of more pointed issues for the remaining open issues? E.g. \N{...}. We already have separate issues about some points, e.g. #4, #5.

jimbaker commented 1 year ago

Yes, I will take care of that, along with other issues. There's too much that's been left open, and we can always reopen any if it comes up.

arogozhnikov commented 1 year ago

Following this discussion for f-strings, it is worth documenting that tagged strings can't be used as docstrings.
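
For context, the existing f-string behaviour can be checked as follows. This shows only what f-strings do today; whether tag strings would behave the same way is an assumption of the comment above. The function names are illustrative.

```python
NAME = "formatted"


def plain():
    "A regular docstring."


def formatted():
    f"Docs for {NAME}."  # an expression statement, not a docstring


print(plain.__doc__)      # A regular docstring.
print(formatted.__doc__)  # None
```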

gvanrossum commented 1 year ago

Following this discussion for f-strings, it is worth documenting that tagged strings can't be used as docstrings.

Do you have a use case?

arogozhnikov commented 1 year ago

Do you have a use case?

Maybe:

Trivial checks for correct names in docstrings, to ease refactoring, or maybe better links:

def repeat(string, n_repetitions: int):
    doc'''
    {string} - template
    {n_repetitions} - number of times to repeat
    # these are turned into links in documentation
    See also: {repeat_twice} and {string_utils}
    '''
    return string * n_repetitions

Conditional reformatting (e.g. reST docstrings produce OK documentation, but are not readable IMO). If a special flag is passed, reST can be produced.
Syntax highlighting for in-doc examples (if supported).

Neither are really important.

Jacob-Flasheye commented 9 months ago

~~Hi, I stumbled upon this repo and I think the features proposed in this PEP are nice, but I have some thoughts and questions. I hope this is the right place for them :)~~

My understanding of the word "Thunk" is that it means any deferred computation unit (at least that's how I understand it from the little Haskell I've done). It seems a shame to use the name of such a generic concept for this very specific use case. Maybe Thunk should be some sort of base class and the Thunks in the PEP could be subclasses?

And related to the last syntax point in the OP, I also find foobar"a b c" a bit jarring, and the syntax in PEP 501 looks more natural (in Python at least) to me. Why did you decide not to go with PEP 501 syntax (foobar(i"a {b} c"))?

I'm also a bit concerned that this is essentially just another way of calling a function, with double quotes instead of parentheses. Wouldn't foobar(a, b, c) and foobar"{a}{b}{c}" be equivalent? Continuing that thought, wouldn't this essentially give people all the power of lazy evaluation, just with a funky function call, thus potentially leading to its use for non-string things? I don't think that's the purpose of the PEP, but it seems like a natural consequence to me. But maybe there's something I'm not understanding?

EDIT: @merwok explained that I should go somewhere else with my questions, editing the post so I (hopefully) don't disturb more people than I already have!

merwok commented 9 months ago

The right place would be here I think: https://discuss.python.org/t/allow-for-arbitrary-string-prefix-of-strings/19740/6

This tracker (I think) is used to work out some issues between the people working on the proposal, and this ticket specifically is for some cosmetic details («bikesheds») and not the project in general.

Hope this helps!

jimbaker / tagstr

Bikesheds #3
