Open oprypin opened 3 months ago
Let me reiterate my main point very briefly:
Bazel and Python:
repr([foo]) == '[' + repr(foo) + ']'
starlark-go (and I consider this basically a bug):
repr([foo]) == '[' + str(foo) + ']'
Let me reiterate my main point very briefly: Bazel and Python: repr([foo]) == '[' + repr(foo) + ']'
Thanks, I didn't realize that. Looking at the Starlark spec, the precise behavior of str and repr for each type does seem to be underspecified. That said, I'm not sure we can reasonably change the behavior of repr for built-in types at this point; we could ask our friends on the Bazel team (@brandjon @tetromino) to evaluate how much damage this change would cause in Google's large corpus.
For user-defined types, we could define a Reprable interface, analogous to fmt.GoStringer method, which would allows a type to define a variant string representation for Printf("%#v", x)
. (Perhaps it would even make sense to identify these two, so that the host Go application could choose among str or repr semantics in a printf expression.)
But before we add more API, I would like to have a more coherent understanding of the theory of repr vs str and any invariants that we should be striving to establish or maintain.
Thanks for consideration.
I think this is possible to handle without necessarily changing anything, only clarifying the spec and adding an override point in the API.
I was also wrong to directly call this a bug because the main invariant that I named does hold in starlark-go for all built-in types. It's just not specified whether .String()
is supposed to mean repr
or str
, instead it backs both.
The current state is that .String()
backs the implementation of repr
and backs the implementation of str
except for two string types.
This actually suffices to fully correctly cover all of the current built-in types in starlark-go because in fact only the string types need this different treatment.
I think this is the summary of the philosophy of Python:
print(foo)
applies str()
on the object first.
print()
does to get a string matches whatever the implementation of str()
does. For custom types this falls back to .String()
.Implementers of repr()
should apply repr()
on all nested objects. I.e. repr([foo]) == '[' + repr(foo) + ']'
.
writeValue
has specific handling for strings. For other types this falls back to .String()
.For most objects, str()
is just the same as repr()
. I.e. str([foo]) == repr([foo]) == '[' + repr(foo) + ']'
.
But some objects do want to have distinct behaviors, when there's a useful distinction between "human-readable" str
and "machine-readable" repr
.
The first step might be to formalize the above (because it does fully hold at the moment)
and formalize the situation with .String()
in one of two ways:
a. .String()
backs the implementation of str
for an object, and repr
just does the same and is impossible to override.
b. .String()
backs the implementation of repr
for an object and str
just does the same and is impossible to override.
(I actually like this one better because it also currently fully holds - repr
does fully correspond to .String()
, it's only str
that has special cases, which by the way could be generalized in the implementation.)
And the next step would be to add the ability to override the latter one through a new interface.
One more thing regarding this:
Implementers of
repr()
should applyrepr()
on all nested objects.
For any implementers of custom types, this surely had to hold true as the following:
.String()
should apply .String()
on all nested objects"(because what else were they supposed to apply?)
So this makes it even more attractive to define the existing .String()
as the backing implementation of repr
and making str
get a new override ability, rather than the other way around.
That way, not only the built-in types but also custom types will be conformant retroactively.
Also important to mention: '%s'
and '%r'
as well as '{!s}'
and '{!r}'
need to be backed directly by str
and repr
respectively. Which is correctly implemented and documented in the spec, just similarly not extensible by custom types.
Also important to mention:
'%s'
and'%r'
as well as'{!s}'
and'{!r}'
need to be backed directly bystr
andrepr
respectively. Which is correctly implemented and documented in the spec, just similarly not extensible by custom types.
Right, but those delegations are straightforward and unambiguous, and orthogonal to the question of the desired behavior of str and repr.
I have implemented my suggestion - it's actually really straightforward and doesn't modify any existing behavior or API (no effect on unittests), just replaces the checks for "is it the str
type" with "does it implement the new interface" in the places that should be backed by str()
behavior per the spec.
https://github.com/oprypin/starlark-go/commit/544984fc13612299ab6c373a6f67a8cbb20dcf4e
Of course no obligation to do anything with this code, it just also illustrates my proposal well. But do let me know if I should open a pull request. Any suggestions for how to name the interface are very welcome; consider the current naming mostly a placeholder.
Mm I should note - the commit does affect the behavior of the bytestring type in some cases. I was debating back and forth whether it should be string-like in this context because this distinction is applied inconsistently at the moment. This may need to be formalized first as well.
So pushed an alternate commit that doesn't touch bytestrings and should actually have 0 effect on existing behavior.
https://github.com/oprypin/starlark-go/commit/7b230293571a1d17f28fae4cea6a7f30109f638e
Bazel's Starlark (very similarly to Python) has a separation between
str
andrepr
and consistently usesrepr
for nested objects.For example, this always holds for lists, no matter what
foo
is:However in starlark-go
str
andrepr
are backed by the exact same interface:With the only 2 exceptions to this being hardcoded - only strings and bytestrings use a different code path inside
str
that doesn't simply usevalue.String()
.So, except for strings, it is basically
repr([foo]) == '[' + str(foo) + ']'
This can lead to some limitations and confusion.
Consider a
Label
object like in Bazel, per issue #535.For Bazel compatibility, the following would need to hold:
print(Label("//:hi"))
prints//:hi
print([Label("//:hi")])
prints[Label("//:hi")]
With starlark-go, I am pretty sure there is only one shared implementation allowed for
str
andrepr
so I would have to choose one of the below options if I were to try to implement such aLabel
, both very suboptimal:print(Label("//:hi"))
printsLabel("//:hi")
print([Label("//:hi")])
prints[Label("//:hi")]
print(Label("//:hi"))
prints//:hi
print([Label("//:hi")])
prints[//:hi]
Actually we can simplify this even further and don't need to talk about "labels". It is not even possible to implement an object like the native
str
. You are forced to choose between one of these suboptimal options:print(MyStr("hi"))
prints"hi"
- badprint([MyStr("hi")])
prints["hi"]
- goodprint(MyStr("hi"))
printshi
- goodprint([MyStr("hi")])
prints[hi]
- badI think at the very least, the
repr
-like behavior would need to be chosen for the list case, rather than thestr
-like behavior, even if the two aren't allowed to have a separate interface that backs them.