Distinguish `repr` away from `str` and use it for nested types

oprypin commented 3 months ago

Bazel's Starlark (very similarly to Python) has a separation between str and repr and consistently uses repr for nested objects.

For example, this always holds for lists, no matter what foo is:

repr([foo]) == '[' + repr(foo) + ']'

However in starlark-go str and repr are backed by the exact same interface:

With the only 2 exceptions to this being hardcoded - only strings and bytestrings use a different code path inside str that doesn't simply use value.String().

So, except for strings, it is basically repr([foo]) == '[' + str(foo) + ']'

This can lead to some limitations and confusion.

Consider a Label object like in Bazel, per issue #535.

For Bazel compatibility, the following would need to hold:

print(Label("//:hi")) prints //:hi
print([Label("//:hi")]) prints [Label("//:hi")]

With starlark-go, I am pretty sure there is only one shared implementation allowed for str and repr so I would have to choose one of the below options if I were to try to implement such a Label, both very suboptimal:

- print(Label("//:hi")) prints Label("//:hi")
- print([Label("//:hi")]) prints [Label("//:hi")]
- print(Label("//:hi")) prints //:hi
- print([Label("//:hi")]) prints [//:hi]

Actually we can simplify this even further and don't need to talk about "labels". It is not even possible to implement an object like the native str. You are forced to choose between one of these suboptimal options:

- print(MyStr("hi")) prints "hi" - bad
- print([MyStr("hi")]) prints ["hi"] - good
- print(MyStr("hi")) prints hi - good
- print([MyStr("hi")]) prints [hi] - bad

I think at the very least, the repr-like behavior would need to be chosen for the list case, rather than the str-like behavior, even if the two aren't allowed to have a separate interface that backs them.

oprypin commented 3 months ago

Let me reiterate my main point very briefly:

Bazel and Python:
```
repr([foo]) == '[' + repr(foo) + ']'
```
starlark-go (and I consider this basically a bug):
```
repr([foo]) == '[' + str(foo) + ']'
```

adonovan commented 3 months ago

Let me reiterate my main point very briefly: Bazel and Python: repr([foo]) == '[' + repr(foo) + ']'

Thanks, I didn't realize that. Looking at the Starlark spec, the precise behavior of str and repr for each type does seem to be underspecified. That said, I'm not sure we can reasonably change the behavior of repr for built-in types at this point; we could ask our friends on the Bazel team (@brandjon @tetromino) to evaluate how much damage this change would cause in Google's large corpus.

For user-defined types, we could define a Reprable interface, analogous to fmt.GoStringer method, which would allows a type to define a variant string representation for Printf("%#v", x). (Perhaps it would even make sense to identify these two, so that the host Go application could choose among str or repr semantics in a printf expression.)

But before we add more API, I would like to have a more coherent understanding of the theory of repr vs str and any invariants that we should be striving to establish or maintain.

oprypin commented 3 months ago

Thanks for consideration. I think this is possible to handle without necessarily changing anything, only clarifying the spec and adding an override point in the API. I was also wrong to directly call this a bug because the main invariant that I named does hold in starlark-go for all built-in types. It's just not specified whether .String() is supposed to mean repr or str, instead it backs both.

The current state is that .String() backs the implementation of repr and backs the implementation of str except for two string types. This actually suffices to fully correctly cover all of the current built-in types in starlark-go because in fact only the string types need this different treatment.

I think this is the summary of the philosophy of Python:

print(foo) applies str() on the object first.
- In starlark-go this is not part of the spec but for all built-in types whatever the implementation of print() does to get a string matches whatever the implementation of str() does. For custom types this falls back to .String().
Implementers of repr() should apply repr() on all nested objects. I.e. repr([foo]) == '[' + repr(foo) + ']'.
- In starlark-go this is not part of the spec but again this happens to perfectly hold in all cases with builtin types because writeValue has specific handling for strings. For other types this falls back to .String().
For most objects, str() is just the same as repr(). I.e. str([foo]) == repr([foo]) == '[' + repr(foo) + ']'. But some objects do want to have distinct behaviors, when there's a useful distinction between "human-readable" str and "machine-readable" repr.

The first step might be to formalize the above (because it does fully hold at the moment) and formalize the situation with .String() in one of two ways:

a. .String() backs the implementation of str for an object, and repr just does the same and is impossible to override.

b. .String() backs the implementation of repr for an object and str just does the same and is impossible to override. (I actually like this one better because it also currently fully holds - repr does fully correspond to .String(), it's only str that has special cases, which by the way could be generalized in the implementation.)

And the next step would be to add the ability to override the latter one through a new interface.

oprypin commented 3 months ago

One more thing regarding this:

Implementers of repr() should apply repr() on all nested objects.

For any implementers of custom types, this surely had to hold true as the following:

"Implementers of .String() should apply .String() on all nested objects"

(because what else were they supposed to apply?)

So this makes it even more attractive to define the existing .String() as the backing implementation of repr and making str get a new override ability, rather than the other way around. That way, not only the built-in types but also custom types will be conformant retroactively.

oprypin commented 3 months ago

Also important to mention: '%s' and '%r' as well as '{!s}' and '{!r}' need to be backed directly by str and repr respectively. Which is correctly implemented and documented in the spec, just similarly not extensible by custom types.

adonovan commented 3 months ago

Also important to mention: '%s' and '%r' as well as '{!s}' and '{!r}' need to be backed directly by str and repr respectively. Which is correctly implemented and documented in the spec, just similarly not extensible by custom types.

Right, but those delegations are straightforward and unambiguous, and orthogonal to the question of the desired behavior of str and repr.

oprypin commented 3 months ago

I have implemented my suggestion - it's actually really straightforward and doesn't modify any existing behavior or API (no effect on unittests), just replaces the checks for "is it the str type" with "does it implement the new interface" in the places that should be backed by str() behavior per the spec.

https://github.com/oprypin/starlark-go/commit/544984fc13612299ab6c373a6f67a8cbb20dcf4e

Of course no obligation to do anything with this code, it just also illustrates my proposal well. But do let me know if I should open a pull request. Any suggestions for how to name the interface are very welcome; consider the current naming mostly a placeholder.

oprypin commented 3 months ago

Mm I should note - the commit does affect the behavior of the bytestring type in some cases. I was debating back and forth whether it should be string-like in this context because this distinction is applied inconsistently at the moment. This may need to be formalized first as well.

oprypin commented 3 months ago

So pushed an alternate commit that doesn't touch bytestrings and should actually have 0 effect on existing behavior.

https://github.com/oprypin/starlark-go/commit/7b230293571a1d17f28fae4cea6a7f30109f638e

google / starlark-go

Distinguish `repr` away from `str` and use it for nested types #536