unicode-org / message-format-wg

Developing a standard for localizable message strings

Allow names to start with a digit #350

Closed aphillips closed 1 year ago

aphillips commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently names can start with a potpourri of characters:

name-start = ALPHA / "_"
           / %xC0-D6 / %xD8-F6 / %xF8-2FF
           / %x370-37D / %x37F-1FFF / %x200C-200D
           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
           / %xF900-FDCF / %xFDF0-FFFD / %x10000-EFFFF

... but ASCII digits are not permitted. Note that the above ranges include digits in a variety of writing systems, including the wide compatibility digits starting at 0xFF10. Just not ASCII digits.
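To make the asymmetry concrete, here is a minimal sketch of a name-start check transcribed from the ranges quoted above. The function name `isNameStart` and the table layout are illustrative only and are not taken from any MF2 implementation.

```javascript
// Code point ranges copied from the name-start production quoted above.
// ALPHA is A-Z / a-z; "_" is listed as a one-element range.
const NAME_START_RANGES = [
  [0x41, 0x5a], [0x61, 0x7a], [0x5f, 0x5f],
  [0xc0, 0xd6], [0xd8, 0xf6], [0xf8, 0x2ff],
  [0x370, 0x37d], [0x37f, 0x1fff], [0x200c, 0x200d],
  [0x2070, 0x218f], [0x2c00, 0x2fef], [0x3001, 0xd7ff],
  [0xf900, 0xfdcf], [0xfdf0, 0xfffd], [0x10000, 0xeffff],
];

// Returns true when the first code point of `name` is allowed by name-start.
function isNameStart(name) {
  const cp = name.codePointAt(0);
  if (cp === undefined) return false; // empty string
  return NAME_START_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

console.log(isNameStart("0"));      // false: ASCII digit
console.log(isNameStart("\uFF10")); // true: FULLWIDTH DIGIT ZERO
console.log(isNameStart("\u0660")); // true: ARABIC-INDIC DIGIT ZERO
```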

This could be a compatibility problem. Existing MF1 messages can use numbered names, i.e. this is valid:

You have {0} foo on {1}

But this is not valid MF2:

You have {$0} foo on {$1}

Describe the solution you'd like

Add DIGIT to the name-start list (or actually convert ALPHA to alnum).

Describe why your solution should shape the standard

This is a restriction in the standard and thus cannot be part of userland.

Additional context or examples

Use of numbered replacements is super-common in existing messaging schemes, including printf-type syntaxes and the existing message format. While we require users to add our sigils and decorations when converting, the omission of digits at name start requires developers to go beyond that and actually name all of their replacement variables.

Implementations that allow auto-numbered arg lists (similar to MF1) would be seriously inconvenienced by having to change to names-in-a-map. (I had one of these implementations)

I cannot think of a reason why numbers are not allowed? They don't appear to present a parsing hazard of any kind and the name production is always marked with a sigil ($ or : etc.) anyway.

zbraniecki commented 1 year ago

I think position-based arguments are an inherently flawed model as they provide no value in modern programming languages while they impair error recovery.

The argument in this comment is about ability to migrate from MF1 to MF2 and I think this could be handled by migrating positional arguments to named using X -> argX transform: You have {arg0} foo on {arg1}.

I claim this is more readable in case of unresolved arguments than $0 and vastly less confusing than You have {0} unread emails. (do I have 0? Or do I have many, but MF failed?)

alerque commented 1 year ago

Is the enhancement request here to:

  1. allow processing positional arguments, or
  2. to allow named arguments to use bare numbers as names so they look like positional arguments and ease transition of MF1 data, or
  3. allow names that start with digits like 1foo or 16bar but are not just integer keys?

The title suggests №3, the body of the issue sounds like №2 to me, and the weight of @zbraniecki's point seems to be directed at №1.

zbraniecki commented 1 year ago

I believe my argument is directed at (1) and (2).

aphillips commented 1 year ago

@alerque The enhancement request here is focused on (2) and (3) and is strictly limited to the syntax.

I might separately suggest that migrating implementations could choose to support positional arguments (perhaps outside the standard) as a sop to migration. I have no allergy to a user choosing to have an integer named key. In general I favor putting the least restriction on developers that we can and trusting them to decide how to use our tools.

I would not have phrased (2) that way. I would have just said:

  1. to allow named arguments to use bare numbers as names (which might look like positional arguments or which might have other meaning to the developer)

In part because your (3) implies that there has to be some non-digit characters somewhere in the name. I am focused on namespacing rules here and the fewer special ones we have, the better, in my mind.

@zbraniecki Implementations of MessageFormat.format(pattern, objArrArgs) can turn the object array into a map of numerically-named values internally. The pattern still has to be migrated, but the code doesn't have to be rewritten. It's a tough sell for the developer to have to revisit all of their code (which has working messages) just to get the latest formatter.

Yes, the implementer could create a convention like arg0, but we already have a sigil for arg names (so why not $0?) [Note that this is how ARB handled the transition]

I think position-based arguments are an inherently flawed model as they provide no value in modern programming languages while they impair error recovery.

This is overstating it, I think? I am not arguing, please note, that positional arguments are a Good Thing or that they are equal to named arguments. Use of positional arguments in MF has been a matter of taste for a while--some developers have "bad taste" 😃. I'm just reluctant to be absolutist ("no value"?)

zbraniecki commented 1 year ago

@zbraniecki Implementations of MessageFormat.format(pattern, objArrArgs) can turn the object array into a map of numerically-named values internally.

That seems like a very convoluted solution. You're saying implementations may optionally choose to replace the 0 argument with arg0, or they may not. That introduces inconsistencies in areas like message references, and in error recovery.

Yes, the implementer could create a convention like arg0, but we already have a sigil for arg names (so why not $0?)

As I stated in my previous comment:

I claim this is more readable in case of unresolved arguments than $0 and vastly less confusing than You have {0} unread emails. (do I have 0? Or do I have many, but MF failed?)

You did not respond to that claim.

This is overstating it, I think?

I disagree.

Use of positional arguments in MF has been a matter of taste for a while--some developers have "bad taste" 😃. I'm just reluctant to be absolutist ("no value"?)

It's not a matter of taste. It has an actual impact on the localization system capabilities in the area of error recovery. Saying it's "a matter of taste" is like arguing that the use of source string as an identifier in gettext is "a matter of taste" while it actually has objective negative impact on system maintainability, localization capabilities and so on.

In this case, use of 0 or $0 as an argument id, in a system which uses argument ids to provide improved runtime error recovery leads to confusing output.

Let me try again. Compare a user story of a non-tech-savvy customer seeing two versions of partially resolved message:

Map

let pattern = "{Accept to pay {$amount} for your order.}";
let args = {
  "amount": new Intl.MessageFormat.arguments.Currency(42, {currency: "USD"}),
};

let mf2 = new Intl.MessageFormat(["en-US"]);
button.textContent = mf2.formatToString(pattern, args);

Partial output:

Accept to pay {$amount} for your order.

Array

let pattern = "{Accept to pay {$0} for your order.}";
let args = [
  new Intl.MessageFormat.arguments.Currency(42, {currency: "USD"}),
];

let mf2 = new Intl.MessageFormat(["en-US"]);
button.textContent = mf2.formatToString(pattern, args);

Partial output:

Accept to pay {$0} for your order.

I claim that this is a significant difference in UX and that we should put effort into reinforcing the best practice.


My solution, making the MF1 -> MF2 converter replace positional arguments with argX, leads to the following partial output:

Accept to pay {$arg0} for your order.

which I believe to be significantly less confusing and less likely to be misread as 0 dollars.
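The fallback behaviour being compared can be sketched with a toy formatter. This is not the proposed Intl.MessageFormat API: `formatWithFallback` is a hypothetical helper that substitutes `{$name}` placeholders from a plain object and leaves unresolved ones untouched. The patterns omit the outer braces used in the examples above.

```javascript
// Toy formatter (an assumption for illustration, not a real MF2 API):
// replaces {$name} placeholders and echoes unresolved ones verbatim,
// imitating the "partial output" case discussed above.
function formatWithFallback(pattern, args) {
  return pattern.replace(/\{\$([^\s{}]+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(args, name)
      ? String(args[name])
      : placeholder
  );
}

const missing = {}; // simulate a caller that failed to pass the argument

// Unresolved placeholders are what the end user would see:
console.log(formatWithFallback("Accept to pay {$0} for your order.", missing));
console.log(formatWithFallback("Accept to pay {$arg0} for your order.", missing));

// Resolved case for comparison:
console.log(formatWithFallback("Accept to pay {$amount} for your order.", { amount: "$42.00" }));
```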

As to the cost for developers that have to migrate their arrays to maps - they don't. They can write a wrapper (or we can provide them a wrapper) which takes a list of arguments and turns it into a MF2ArgumentMap with keys arg0, arg1 etc.

The additional value of that is that it provides them with greppable way to find use cases of that in their source code and set forth a project to remove the use of such transitionary wrapper by manually converting use of it to actual maps with meaningful argument names.
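A minimal sketch of such a transitional wrapper, assuming a formatter that accepts a plain object of named arguments. The names `argListToMap` and `formatWithArgList` are hypothetical; only the arg0, arg1, ... key convention comes from this comment.

```javascript
// Hypothetical transitional helper: turns a positional argument list into
// a map keyed arg0, arg1, ... so legacy call sites keep passing arrays.
function argListToMap(argList, prefix = "arg") {
  const map = {};
  argList.forEach((value, index) => {
    map[`${prefix}${index}`] = value;
  });
  return map;
}

// Hypothetical convenience wrapper around any formatter function that
// takes (pattern, namedArgs). Its distinctive name is easy to grep for
// when the team later replaces it with real named arguments.
function formatWithArgList(formatFn, pattern, argList) {
  return formatFn(pattern, argListToMap(argList));
}

console.log(argListToMap(["3", "Monday"])); // keys: arg0, arg1
```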

aphillips commented 1 year ago

@zbraniecki noted:

That seems like a very convoluted solution. You're saying implementations may optionally choose to replace the 0 argument with arg0, or they may not. That introduces inconsistencies in areas like message references, and in error recovery.

Actually, when we did this for ARB, we implemented it by making positional arguments into map keys, e.g. args[0] => map.put('0', args[0]), since the resource syntax didn't care. You can't call the formatter both ways (with a Map and an array) so conflicts don't really arise.

So I'm not saying that implementations would "optionally choose". I'm saying "implementations would choose how to convert an array into keys", which could include either $argX or $X (but not both). In ARB we chose the latter because we had an installed base of simple (non-choice/non-select) messages that already used positional notation (the complex messages worked too).
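As a rough illustration of the ARB-style approach described above (this is not ARB's actual code), a formatter entry point could normalize whichever argument shape it receives into a single map. The name `normalizeArgs` is hypothetical.

```javascript
// Hypothetical entry-point normalization: an array becomes a map with the
// keys "0", "1", ...; a plain object is copied as-is. A call passes one
// shape or the other, so numeric and named keys never collide.
function normalizeArgs(args) {
  if (Array.isArray(args)) {
    return new Map(args.map((value, index) => [String(index), value]));
  }
  return new Map(Object.entries(args));
}

const fromArray = normalizeArgs(["Alice", 3]);
console.log(fromArray.get("0")); // "Alice"

const fromObject = normalizeArgs({ amount: 42 });
console.log(fromObject.get("amount")); // 42
```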

I claim this is more readable in case of unresolved arguments than $0 and vastly less confusing than You have {0} unread emails. (do I have 0? Or do I have many, but MF failed?)

You did not respond to that claim.

I didn't because I didn't think it was relevant to this request and because to me it is an eye-of-the-beholder problem. Does it matter if the failed message is You have {$0} messages vs. You have {$arg0} messages? Is one better? Probably. But I don't see the fallback error state as the most important. As a developer, I can debug either one about equally. As an end-user they are both horked.

I see your user case above and I agree with you that $amount is better. I am not saying positional arguments are good or comparable to named arguments. I am only saying that I want to allow key names which start with (and potentially only contain) digits.

Admittedly I'm pointing out that one use for these is migrating MF1 callers as a kind of "proof of utility"...

Use of positional arguments in MF has been a matter of taste for a while--some developers have "bad taste" 😃. I'm just reluctant to be absolutist ("no value"?)

It's not a matter of taste. It has an actual impact on the localization system capabilities in the area of error recovery.

Well, no, it is a matter of preference in MF1 (and flavors of MF1, such as ARB), where one can choose between using a Map and an array--and some developers prefer one or the other. That's why I said it was a matter of taste.

I do know that developers are not keen on being required to go into working code and change this:

final Object[] args = {  numMessages, priceOrder, foo };
return res.format("somePatternString", args); // ARB combines resource lookup and formatting

into this:

final ImmutableMap<String,Object> args = ImmutableMap.of(
                                           "numMessages", numMessages,
                                           "priceOrder", priceOrder,
                                           "foo", foo);
return res.format("somePatternString", args);

Just so they can use the new formatter. They'll just... call the old one, eh? If the only cost for the new formatter is the need to fix the source pattern (perhaps using a tool) hidden behind somePatternString, they might do that.

Saying it's "a matter of taste" is like arguing that the use of source string as an identifier in gettext is "a matter of taste" while it actually has objective negative impact on system maintainability, localization capabilities and so on.

But, again, it is a matter of developer preference. I fully agree that using the source string as a gettext key is bad--but some developers (not me!) think it is actually a feature (this includes, I believe, the developers of gettext). Even if it never occurred to the developers of gettext, they never prohibited it and someone discovered they could make it work.

I don't think we have to enforce every best practice at the level of the syntax and I'm willing to let implementers or users with different priorities than my own make their own choices. As far as I can tell, there is no technical reason to disallow numeric keys. Adding positional argument support is not something I think we want to do (and am not proposing it), but I could see existing implementations wanting to provide a migration path. Recommending the argX solution to implementers in the user guide would be fine by me...

zbraniecki commented 1 year ago

But I don't see the fallback error state as the most important.

I think that's the source of our disagreement. I see resilience as an important part of a dynamic system's design which targets cross-roads of technologists and non-technologists to collaboratively produce human readable output. [0]

I am only saying that I want to allow key names which start with (and potentially only contain) digits.

And I am only saying that I'd prefer to force such key names to use arg[0-9]+ convention.

I do know that developers are not keen on being required to go into working code and change this:

They wouldn't be, which is an area where I think we do agree. I suggest that such case would require just a helper wrapper from your:

final Object[] args = {  numMessages, priceOrder, foo };
return res.format("somePatternString", args); // ARB combines resource lookup and formatting

to

final Object[] argList = {  numMessages, priceOrder, foo };
final ImmutableMap<String,Object> argMap = CompatibilityHelperArrayToMap(argList);

return res.format("somePatternString", argMap);

and that can be further wrapped for convenience:

final Object[] argList = {  numMessages, priceOrder, foo };
return res.formatWithArgList("somePatternString", argList);

with the helper hidden inside the customer's API, so developers need not worry about it.

In other words, I am pushing back on your claim that this cannot be solved in convenience APIs and we must extend the spec to allow for bad practice in order to avoid blocking adoption on highly disruptive code changes. I claim that we can make it convenient on the right level, for organizations that need the transitional period, and we do not have to change our syntax for that.

As far as I can tell, there is no technical reason to disallow numeric keys.

I think it's a very vague claim. What is a technical reason in case of a localization system? Is disallowing nested selectors in MF2 due to technical reasons? I'd argue that nested selectors enable whole classes of use cases which flattened selectors make impossible. If we do not have a technical reason to forbid them, should we enable them?

but I could see existing implementations wanting to provide a migration path.

I am aligned with you. We should do our due diligence to ensure minimal disruption and fewest possible papercuts for migrators. I believe this is what motivates you to file this issue! I believe we can solve this without extending the syntax, which you also seem to agree is possible.

I suggest we ensure that the list[value] -> Map<argX, value> is easy and reliable and can be conveniently implemented for all migrations. I also agree with you that we should document the motivation and approach to migration in our spec/documentation.

[0] I recognize you said "the most important" - I assume you do not see it as high value, while you see MF1->MF2 migration DX as high value. I see both as equally high value, and resilience as higher in the long term as I think of MF2 as a system whose majority of users and use cases over its lifetime have never used MF1.

aphillips commented 1 year ago

But I don't see the fallback error state as the most important.

I think that's the source of our disagreement. I see resilience as an important part of a dynamic system's design which targets cross-roads of technologists and non-technologists to collaboratively produce human readable output. [0]

You address my concern in your footnote. I didn't say error state was not important. I just don't stack rank it as highly as you appear to. FWIW, I also don't rank MF1->2 migration as high as I think you think I do :-).

I am only saying that I want to allow key names which start with (and potentially only contain) digits.

And I am only saying that I'd prefer to force such key names to use arg[0-9]+ convention.

These are not the same thing. I am talking about the namespace for keys, not about positional arguments at all. It happens that positional arguments might use integer keys (or not), but this isn't about that. This is about allowing keys to start with and even be composed of just ASCII digits (noting that non-ASCII digits are "just fine" with us!!!)

Here's a different example using German orthography subtags:

let $langtag = // "de", "de-1901", "de-1996", "de-DE-1996-u-co-phonebk"
match :locale_get("variant", $langtag)
when 1901 {message with olde fashioned spelling}
when 1996 {message with modern reformed spelling}
when * {another modernly spelled message probably}

In other words, I am pushing back on your claim that this cannot be solved in convenience APIs and we must extend the spec to allow for bad practice in order to avoid blocking adoption on highly disruptive code changes. I claim that we can make it convenient on the right level, for organizations that need the transitional period, and we do not have to change our syntax for that.

I am not claiming this. In fact, my example is exactly a "convenience API" that hides the migration entirely. What I'm arguing for is not an extension in order to allow a bad practice. It is to enable the freest possible use of the syntax--which can include some uses that you (or I) might feel are bad.

As far as I can tell, there is no technical reason to disallow numeric keys.

I think it's a very vague claim. What is a technical reason in case of a localization system? Is disallowing nested selectors in MF2 due to technical reasons? I'd argue that nested selectors enable whole classes of use cases which flattened selectors make impossible. If we do not have a technical reason to forbid them, should we enable them?

I thought this was a very non-vague claim. What is the technical argument for disallowing digits in the production in question? In what functional way is a localization system harmed by their existence? I agree that positional integer keys are less good than named ones.

The discussion of nested vs. matrix selectors is a very interesting one, but it should be its own issue (if you care to reopen it). It doesn't depend on or, AFAICT, inform the discussion of key values.

but I could see existing implementations wanting to provide a migration path.

I am aligned with you. We should do our due diligence to ensure minimal disruption and fewest possible papercuts for migrators. I believe this is what motivates you to file this issue! I believe we can solve this without extending the syntax, which you also seem to agree is possible.

My motivation is informed by migration experience, but mostly has to do with: I want the fewest arbitrary/preferential decisions in our syntax that people have to learn (and machines have to check).

I suggest we ensure that the list[value] -> Map<argX, value> is easy and reliable and can be conveniently implemented for all migrations.

I actually think this could be out of scope?

I also agree with you that we should document the motivation and approach to migration in our spec/documentation.

As noted, I'm fine with (possibly non-normative) recommendations for how to migrate. Note that MF1 migration or compat appears nowhere in our goals and deliverables.

asmusf commented 1 year ago

I've been playing fly on the wall for these discussions. I would tend to agree that the salient question should be focused on what would break if you allowed fully numerical names. That's the level that is appropriate for what is effectively a MUST NOT. Simple bad (or non-optimal) practice, if it can be identified, could be the subject of prescriptions or proscriptions expressed with SHOULD or RECOMMENDED.

eemeli commented 1 year ago

I am only saying that I want to allow key names which start with (and potentially only contain) digits.

And I am only saying that I'd prefer to force such key names to use arg[0-9]+ convention.

These are not the same thing. I am talking about the namespace for keys, not about positional arguments at all. It happens that positional arguments might use integer keys (or not), but this isn't about that. This is about allowing keys to start with and even be composed of just ASCII digits (noting that non-ASCII digits are "just fine" with us!!!)

Here's a different example using German orthography subtags:

let $langtag = // "de", "de-1901", "de-1996", "de-DE-1996-u-co-phonebk"
match :locale_get("variant", $langtag)
when 1901 {message with olde fashioned spelling}
when 1996 {message with modern reformed spelling}
when * {another modernly spelled message probably}

In case we're talking of variant keys like 1901 and 1996 in the above, those are currently valid as they need to match the Nmtoken = NameChar+ rule rather than the Name = NameStart NameChar* rule.

Or in case the issue is around the selector, it would be helpful if it used our current syntax, with which I presume that line ought to read something like:

match {$langtag :locale_get key=variant}

The Name rule is currently only applied to variable names, function names, markup names, and option keys.
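The difference between the two rules can be sketched with an ASCII-only simplification. The character sets below are an assumption for illustration (letters and "_" as NameStart; those plus digits, "." and "-" as NameChar). The real productions admit many non-ASCII ranges and may differ in detail.

```javascript
// ASCII-only simplification; the exact NameChar set is an assumption.
const NAME_START = "[A-Za-z_]";
const NAME_CHAR = "[A-Za-z_0-9.\\-]";

// Name = NameStart NameChar*
const NAME = new RegExp(`^${NAME_START}${NAME_CHAR}*$`);
// Nmtoken = NameChar+
const NMTOKEN = new RegExp(`^${NAME_CHAR}+$`);

const isName = (s) => NAME.test(s);
const isNmtoken = (s) => NMTOKEN.test(s);

console.log(isNmtoken("1901")); // true: fine as a variant key
console.log(isName("1901"));    // false: not usable as a variable name
console.log(isName("langtag")); // true
```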

I also agree with you that we should document the motivation and approach to migration in our spec/documentation.

As noted, I'm fine with (possibly non-normative) recommendations for how to migrate. Note that MF1 migration or compat appears nowhere in our goals and deliverables.

Huh, you're right. We've talked so much about MF1 compatibility that this actually surprises me.

I realise that we may mean different things by "MF1 compatibility" as well. For the record, I would consider MF2 to be compatible with MF1 if it's possible to use an MF2 implementation and some set of runtime functions to provide the same external API as an MF1 implementation provides.

mihnita commented 1 year ago

My reading of "MF1 compatibility" is being able to solve the same problem that MF1 does. Not backward compatible with all the bugs / non-features. But if MF1 does something right, we should support it.

It is not a drop-in replacement, no code changes required. The syntax of the message is expected to change, and the code using it is expected to change.

Maybe a discussion of what "MF1 compatibility" really means would be good. Might help not only here, but also with other issues (for instance matching for selection :-)


On positional parameters, I agree with Zibi. And to not go into holy wars, I will narrow it down to localization, not to all programming languages in general.

For a translator there is a lot more context in "You added {$fileName} to {$folderName}" than in "You added {0} to {1}".

That is a big plus already.

It is also better for leveraging.

"Do you play {$sportName}?" and "Do you play {$musicalInstrument}?" will be translated differently in some languages (in Romanian you "... joci tenis" and "... cânți la pian"). But with positional arguments they would both be collapsed into "Do you play {0}?", so we throw away important context information.

And a multi-sentence paragraph is usually split into sentences (segmentation) for better leveraging.

msg1 = This is the first sentence, no parameters. And this is the real thing, with a {0} in it.
msg2 = This is the first sentence, {0} parameters. And this is the real thing, with a {1} in it.

With named parameters the second sentence is leveraged 100%. ("And this is the real thing, with a {$foo} in it." stays the same) With positional, it might not be (depends on how smart the TM is). ("And this is the real thing, with a {0} in it." is different than "And this is the real thing, with a {1} in it.")
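The leveraging effect can be sketched with a naive exact-match lookup. The segmentation rule and the function names are assumptions for illustration, as is the `{$count}` placeholder. Real translation memories use richer segmentation and fuzzy matching.

```javascript
// Naive sentence segmentation: split after a period followed by whitespace.
function segment(message) {
  return message.split(/(?<=\.)\s+/);
}

// Count how many segments of `next` already exist verbatim in `previous`.
function exactMatches(previous, next) {
  const known = new Set(segment(previous));
  return segment(next).filter((s) => known.has(s)).length;
}

const pos1 = "This is the first sentence, no parameters. And this is the real thing, with a {0} in it.";
const pos2 = "This is the first sentence, {0} parameters. And this is the real thing, with a {1} in it.";
const named1 = "This is the first sentence, no parameters. And this is the real thing, with a {$foo} in it.";
const named2 = "This is the first sentence, {$count} parameters. And this is the real thing, with a {$foo} in it.";

console.log(exactMatches(pos1, pos2));     // 0: the index shift breaks the match
console.log(exactMatches(named1, named2)); // 1: the second sentence is reused
```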


In general for API design I tend to go with "make it easy to do the right thing, make it hard (but not impossible) to do the wrong thing"

In this case, if someone really wants positional parameters, they can do something like this:

// pseudo-code
map l2m( iterable args) {
    result = new map
    index = 0
    for each item : args
        result["p" + index++] = item
    return result
}

And now they can do:

mf = MessageFormat("{Accept to pay {$p0} for your order.}")
mf.format(l2m([amount]))

instead of:

mf = MessageFormat("{Accept to pay {$amount} for your order.}")
mf.format( {'amount' : amount} )

So even if I think that positional arguments are bad, one can still do it, if they have a good use case that I couldn't imagine.

This is a bit what Addison said, with "Yes, the implementer could create a convention like arg0" But left to the developer using the library, not the implementer.

Another of my API rules of thumb is "if a big majority of the users are required to write the same duck-tape code to use a certain feature, then that duck-tape belongs in the library"

So it depends on whether we consider positional arguments a feature or a misfeature. I consider them a misfeature. And I don't think that a majority will (or should) use them.

In all the guidelines I've seen / written I recommend named arguments over positional ones for localization.

mihnita commented 1 year ago

One extra note: MessageFormat is at times inconvenienced by this need to support both numeric / named parameters. But you can't use both in the same message, it is either / or. And that is a runtime exception (yes, you can lint)

My guess is that MF1 supports positional parameters because that is in the JDK MessageFormat (which is still "the old MF1", yes).

Even the doc has some warnings about it:

Some of these methods (the ones corresponding to the original JDK MessageFormat API) address the top-level arguments in their order of appearance in the pattern string, which is usually not useful because it varies with translations. Newer methods address arguments by argument number (“index”) or name. https://unicode-org.github.io/icu/userguide/format_parse/messages/#custom-format-objects-discouraged

So I think it is a misfeature (same as ChoiceFormat). But unlike ChoiceFormat, it can't be deprecated "because JDK".

stasm commented 1 year ago

It looks like most of us agree that positional arguments are not a good practice. And, in fairness to @aphillips, this issue is not about reintroducing them. Instead, it's another instance of the discussion about being lenient on input. In fact, after this discussion, I'm warming up to the idea of dropping the name production in favor of nmtoken, and possibly relaxing nmtoken even further.

Adding positional argument support is not something I think we want to do (and am not proposing it), but I could see existing implementations wanting to provide a migration path.

This point from @aphillips convinces me. Yes, $arg0, $p0, and $_0 are viable solutions, but $0 suggests itself (regardless of the underlying implementation; this can still be a map lookup) and offers a direct translation from MF1 to MF2. One doesn't even have to choose between the arg, p, or _ prefixes.

zbraniecki commented 1 year ago

This point from @aphillips convinces me. Yes, $arg0, $p0, and $_0 are viable solutions, but $0 suggests itself (regardless of the underlying implementation; this can still be a map lookup) and offers a direct translation from MF1 to MF2. One doesn't even have to choose between the arg, p, or _ prefixes.

Can you share your position on my response to that, which is:

$0 looks like zero dollars. $arg0 does not. Producing partial error output Press ok to accept payment of {$0} to the seller is much more likely to mislead the user than Press ok to accept payment of {$arg0} to the seller.

stasm commented 1 year ago

I agree with you. We should encourage developers to use descriptive parameter names. At the same time, I'm concerned that for many developers the choice they will face is:

Migrations are hard, require coordination, approvals, are difficult to test at scale and to roll back. The less friction we cause in the first step, the more likely it is that the next steps will happen at all.

stasm commented 1 year ago

Also, nit-picking a bit: did you know that Argentine pesos are often abbreviated as Arg$?

zbraniecki commented 1 year ago

I'm concerned that for many developers the choice they will face is:

I think this is a false dichotomy. As discussed in this thread and (I believe) agreed upon by Addison, Mihai, Eemeli and me at least, we do not face this dichotomy.

We can safely allow for migratory path to translate list of arguments to a map with argX keys and migrate MF1 syntax to MF2 syntax with {$argX}. Here's a comment where I describe the path to a convenience wrapper.

If you disagree that this is possible, or believe that this will not alleviate the friction for migration, I'd appreciate if you stated such position explicitly.

stasm commented 1 year ago

Team A will choose your migration tool, which translates {0} to {$arg0}. Team B will choose my migration tool, which translates {0} to {$_0}. Then someone else will come and suggest to unify by renaming to {$p0}. Then it will turn out that someone has a service in Go, but that the convenience wrapper isn't available in Go yet. In C++ it will work a bit different, and won't support the $p prefix for a few more weeks, because the C++ binary has a 2-week-long release schedule and a 30-day support window. Some service will roll out with the newest version of the convenience wrapper, but due to an outage it will be rolled back overnight to an older version.

zbraniecki commented 1 year ago

I think this is a strawman argument, Stas. It is solvable by a single sentence in the spec recommending a prescribed way of handling MF1 argument lists in MF2.

stasm commented 1 year ago

But wouldn't the same sentence solve the error scenario problem that you worry about?

I work a lot on features that require changes to both code (binaries) and data (configuration), and I've experienced much of the friction that I'm alluding to first-hand. Granted, there may be fewer pieces involved in the migration path you're proposing, but I would nevertheless expect some friction.


I think we're getting close to making this a discussion about who is ultimately responsible for implementing bad practices. In one of the previous comments @zbraniecki mentioned nested selectors: why forbid them rather than recommend against them and let developers use them when they need them? The same goes for nested function expressions (#353). My position here is that nesting has impact on the data model, runtime, static analysis, tooling, interchange, migrations, CAT GUI... on top of being a sharp tool. Here, however, the proposal is to make a surgical change to the BNF, which slightly relaxes the grammar of variable names, function names, and option names. Nothing else changes, and we can still recommend against using numerical names.

eemeli commented 1 year ago

My preference would be to not allow for a leading digit, for two primary reasons:

  1. It removes ambiguity about the shape of the context with which variable references are resolved. If a variable reference cannot consist of only digits, that context must provide a mapping of string names to values; it cannot be an indexed list of values.
  2. It leads to a better translator experience and better error-handling, as a lone placeholder like {$arg0} is less ambiguous than {$0}.

We can sidestep the mismatch between tooling by including something like this in the spec:

When transforming messages to MF2 from a syntax that supports positional rather than named arguments, variable references to these positional arguments SHOULD be composed of the string arg followed by the decimal representation of the positional argument's integer value. For example: $arg0, $arg1.

An implementation applying such a transform could then provide a helper as suggested by @zbraniecki above in https://github.com/unicode-org/message-format-wg/issues/350#issuecomment-1432338248 that could be used by legacy code.

That should allow us to remain fully backward-compatible while providing an eventual pathway for the names to be replaced with more descriptive ones, so that translators get more context when working with them.
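A minimal sketch of the transform and legacy helper described above, written in Python for illustration. The function names migrate_positional and positional_to_named are hypothetical, and the sketch assumes bare {N} placeholders only (no MF1 formats such as {0, number}, no quoting rules):

```python
import re

# Assumption: legacy placeholders are bare indices in braces, e.g. {0}, {1}.
_POSITIONAL = re.compile(r"\{(\d+)\}")


def migrate_positional(message: str, prefix: str = "arg") -> str:
    """Rewrite {N} as {$argN}, using the decimal value of N."""
    return _POSITIONAL.sub(
        lambda m: "{$" + prefix + str(int(m.group(1))) + "}", message
    )


def positional_to_named(args, prefix: str = "arg") -> dict:
    """Hypothetical helper for legacy callers: wrap an argument list
    as the name-to-value mapping that named variables expect."""
    return {f"{prefix}{i}": value for i, value in enumerate(args)}
```

Legacy code could keep passing a list and call positional_to_named on it just before formatting, so only the message source and the call boundary change.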

asmusf commented 1 year ago

Rationale 1. to me reads like: "references are fundamentally a map and not an array", and while you can fake an array with a map, we want to put a bit of friction there, because implementing an array on top of a map ought to be "in your face".

In my view, that rationale wins over the "freedom of expression" argument better than Rationale 2. As long as it's possible to do things like $p0 or $_1, you aren't fully guaranteeing a better translator experience.

Given Rationale 1, providing migration advice in this instance is not just the proper approach, but essential. I would consider such a recommendation a "convention" and call it out as such, as in "SHOULD follow the convention of...".

zbraniecki commented 1 year ago

But wouldn't the same sentence solve the error scenario problem that you worry about?

I do not see how a sentence in a spec recommendation can solve the scenario where we produce Press ok to accept payment of {$0} to the seller if we accept 0 as a valid variable identifier.
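To make the error scenario concrete, here is a toy sketch in Python. It is not the MF2 algorithm: format_with_fallback is a hypothetical function, and it assumes that an unresolved variable is rendered as its own placeholder source text, as in the example sentence above.

```python
import re

# Assumption: a placeholder is {$name}, where name has no braces or whitespace.
_PLACEHOLDER = re.compile(r"\{\$([^{}\s]+)\}")


def format_with_fallback(message: str, values: dict) -> str:
    """Substitute known variables; leave unresolved ones as {$name}."""
    def resolve(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        return match.group(0)  # fallback: show the placeholder itself
    return _PLACEHOLDER.sub(resolve, message)
```

With an empty values mapping, a message using {$0} comes back unchanged, so the reader sees "payment of {$0}", which can be misread as an amount; the same failure with {$arg0} is easier to recognize as a missing value.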

stasm commented 1 year ago

@zbraniecki The spec can use a SHOULD to recommend against $0. To quote RFC 2119:

there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

Any sort of fallback string which includes the dollar sign can be argued to be potentially confusing. See my earlier comment about the Argentine peso.

aphillips commented 1 year ago

@eemeli I still disagree with your logic about disallowing first-digits. Based on the conversation here, are you in agreement that we should allow starting digits but provide guidance on migration? Or strictly opposed to starting digits?

My comments on your criteria:

  1. It removes ambiguity about the shape of the context with which variable references are resolved. If a variable reference cannot consist of only digits, that context must provide a mapping of string names to values; it cannot be an indexed list of values.

Variable names are just variable names. My argument here is that many names might be useful to an application and there is no reason to impose this particular restriction on what an application developer might want to do. There are many systems that generate variable names from data or where names might naturally start with numbers--not just "array-like accessors". Here are some examples:

$7pctSolution ; starts with a digit
$1234         ; just a numeric name, perhaps the name of a product attribute?

  2. It leads to a better translator experience and better error-handling, as a lone placeholder like {$arg0} is less ambiguous than {$0}.

No argument about this--we have more in the thread about what to recommend for migration. But this is not a good argument against digits at the start or as full names in the ABNF.
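For reference, the effect of the proposed ABNF change on a name's first character can be sketched in Python as follows. The ranges are copied from the name-start production quoted at the top of this issue; can_start_name and its allow_digit flag are hypothetical names, and the rest of the name production is not checked.

```python
# Ranges from the name-start production quoted in the issue description.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A),  # ALPHA
    (0x5F, 0x5F),                # "_"
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
ASCII_DIGIT = (0x30, 0x39)  # DIGIT, the proposed addition


def can_start_name(ch: str, allow_digit: bool = False) -> bool:
    """Check a single character against name-start, optionally with DIGIT."""
    ranges = NAME_START_RANGES + ([ASCII_DIGIT] if allow_digit else [])
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)
```

Under the current rule, can_start_name("7") is false while a fullwidth digit such as "\uFF10" is accepted; passing allow_digit=True is the only difference needed to admit $7pctSolution and $1234.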

For me, we simply have to accept that users (end users, not implementers) will make decisions about how to use our interfaces, APIs, and tools in ways that we might consider to be bad practice or that we would not want users to perpetuate. Nothing we do will guarantee that variable and placeholder names are good for translation any more than (say) programming languages prevent single letter variable names other than for loop control. How are translators helped by $m47y36.kk? By $x?

@stasm Yes, that's what SHOULD means 😄. I think we SHOULD provide migration advice, but not in the ABNF. The Argentine peso example, while contrived, is exactly the sort of thing I had in mind just above 😸. I will note, though, that any migrated message will have all and only $argX variables.

eemeli commented 1 year ago

I don't like starting digits, because they make it too easy to keep using indexed variables like $0 and $1.

Not allowing starting digits for variable names is nearly universal across programming languages, and not following that practice sends a strong signal that indexed names are fine, no matter what SHOULD statements we include.

No matter what rule we use for variable names, it'll undoubtedly still be possible to pick bad ones; that we can't fix here. But we can -- and should! -- add a little bit of friction for implementation and library/tooling developers who wish to support legacy indexed names. Doing so helps make it clear that MF2 is not itself providing universal support for them, but that that's done separately.

Also, regarding my previous spec text recommendation, there appears to be some common practice of using an underscore _ as a prefix for names that can't be used directly as variables. With that, we'd end up with $_1 rather than $arg1.

aphillips commented 1 year ago

Rejected in 2023-04-10 call