Closed rminehan closed 8 months ago
Another view from REPL
scala> val s"Hello\t$name" = "Hello\tJames" // print
val name: String = ("Hello\tJames": @scala.unchecked) match {
case scala.StringContext.apply("Hello\\t", "").s.unapplySeq((name @ _)) => name
}
That much is per spec.
The expansion of parts is shown there in triple-quotes.
The extractor calls itself "the simple string matcher". Maybe that is better than "dual". Or perhaps it is a "duel" to see which is simpler.
Perhaps it was an inadvertent omission.
I neglected to mention for the public record that it is clearly the "standard" interpolator.
The dual is the standard extirpolator.
Edit: the jokey "extirpolator" sounded funnier at midnight. Perhaps we are stuck with "extrapolator" as a canonical label.
@rminehan thanks for the thorough report. re: naming, one approach is to just call it "thes
interpolator" 🤷 At some level it's just a method and not a "feature"; string interpolation is the feature, individual interpolators are methods.
The obvious follow-up will be the raw""
extrapolator, when library evolution permits.
The less obvious follow-up would be the f""
extrapolator for typesafe scanf
.
Hi @som-snytt and @SethTisue , thanks for the replies.
@som-snytt
That much is per spec.
The expansion of parts is shown there in triple-quotes.
I want to clarify I have understood you correctly. You're saying that the formal spec dictates that a code snippet:
id"text0{ pat1 }text1 … { patn }textn"
is transformed to:
StringContext("""text0""", …, """textn""").id(pat1, …, patn)
hence in your example:
val s"Hello\t$name" = "Hello\tJames"
the s"Hello\t$name"
leads to a context:
StringContext.apply("""Hello\t""", """""")
// equivalent to
StringContext.apply("Hello\\t", "")
So the tab character is effectively split into two runtime characters because it is represented as two separate characters in the original source code.
So these are roughly equivalent programs:
// Original code
val s"Hello\t$name" = "Hello\tJames"
// Equivalent expanded program
val interpolator = scala.StringContext("""Hello\t""", """""").s
val name = "Hello\tJames" match {
case interpolator(x) => x
}
// scala.MatchError: Hello James (of class java.lang.String)
If my understanding is correct, then that would mean that the "bug" is in the implementation of the glob
method. It should be handling cases where there's backslashes in strings from patternChunks
.
But do you think that when the original spec was written, it had taken into consideration escaped characters? I worry that it was not considered.
It feels more natural to me that the handling of backslash characters would be done when the string context is created, such that the parts in the context capture escaped characters as a single runtime character. If extractors are made for other interpolators (e.g. f""
), then that implementation will have to handle the same problem.
But there might be other cases I'm not aware of, e.g. impacts on raw""
.
If you guys think that the root cause of the issue is in the glob
method, then I can look into fixing it. That's a pretty self-contained task and it's easy to test and verify.
Thanks, I already submitted a PR, see "May be fixed by", above. I did not intend to steal work from you. You can contribute further by reviewing whether the fix makes sense to you.
I added escape processing before the call to glob
, which may be used by clients which don't want escapes processed.
How to handle escapes is up to the interpolator, which can have arbitrary behavior.
Hi @som-snytt , sorry I didn't see your message earlier, thanks for much for fixing it!
Reproduction steps
Scala version: 2.13 onwards (including scala 3) for Java 8.
Problem
In both cases, the expected behavior is that
name
would be bound to "James", but instead aMatchError
is thrown.See also original SO post where this was raised. This happens when other escaped characters are involved like "\", "\"", "\r".
Background
This dual interpolation feature was introduced into scala 2.13 by PR 7387.
It added an
s.unapply
method for destructuring strings based on interpolation patterns. The underlying implementation for that is theglob
method which I analyze below.Investigation
I have done some preliminary investigation and I believe the root cause is that the
parts
produced in the examples above are representing an escaped character as two separate characters, e.g. a tab character becomes a backslash and a "t". But theglob
function introduced in the PRFor example code like this:
leads to a
StringContext
being constructed with parts:The
glob
method is effectively called with this data:Because it returns
None
, thes.unapplySeq
returnsNone
and the pattern match fails leading to aMatchError
.I verified this using a debugger and setting a breakpoint inside the glob method.
When
glob
is passed the expected data, aSome
is returned as expected:So I think the "bug" is either:
StringContext
parts aren't being produced correctly with regards to escaped charactersStringContext
parts are being produced correctly with regards to escaped characters, butglob
incorrectly assumes they'll be a single characterOther notes
The behavior appears to be consistent for any scala version from 2.13+ and the java version isn't relevant.
On my SO post, I asked if this feature has an official name. That would be helpful going forward for documenting this and doing searches. For now I'm calling it the "dual string interpolator".