projectfluent / fluent

Fluent — planning, spec and documentation
https://projectfluent.org
Apache License 2.0

Document valid identifiers for patterns #247

Open gijsk opened 5 years ago

gijsk commented 5 years ago

Apparently 1-foo is not a valid identifier, but foo-1 is. I have no idea why, or what the restrictions are, because somehow https://projectfluent.org/fluent/guide/hello.html and https://projectfluent.org/fluent/guide/text.html do not mention what the syntax is for an identifier, and skip straight to patterns, which are always the values associated with those identifiers...

flodolo commented 5 years ago

I'm also surprised that identifiers cannot start with a number (I don't think I've ever seen it mentioned in docs?) https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf#L86

zbraniecki commented 5 years ago

The reason is of course math, and the fact that we handle numbers. The answer is in https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf but we can probably document it more in the guide
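To make the grammar concrete: as far as I can tell, the relevant rule in the linked fluent.ebnf is `Identifier ::= [a-zA-Z] [a-zA-Z0-9_-]*`. A minimal Python sketch of that check follows; the regular expression is my transcription of the rule, and the function name `is_valid_identifier` is made up for illustration rather than taken from any Fluent library.

```python
import re

# Transcription of the Identifier rule as I read it in fluent.ebnf:
#   Identifier ::= [a-zA-Z] [a-zA-Z0-9_-]*
# This is an illustrative sketch, not code from a Fluent implementation.
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if the whole string matches the Identifier rule."""
    return IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["foo-1", "1-foo", "_foo", "cancel-button"]:
        print(candidate, is_valid_identifier(candidate))
```

Under this reading, foo-1 and cancel-button are accepted, while 1-foo and _foo are rejected because the first character must be an ASCII letter.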

flodolo commented 5 years ago

The reason is of course math, and the fact that we handle numbers.

Sorry but that's not an explanation. Is there a practical reason to forbid identifiers from starting with a number? Also, what does it mean "we handle numbers"? Examples would go a long way.

zbraniecki commented 5 years ago

Yes, math. Math is the reason: if you want to support math, you have to support a subtraction operator, which in most cases, including Fluent, is going to be "-". Once you're there, any a-z will be an identifier start and "[0-9]-[a-z]" will be a subtraction operation between a number and an identifier. We could, technically, try to forbid that and limit ourselves, but I doubt it is the right tradeoff.
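The lexing argument can be sketched in a few lines of Python. This is a toy tokenizer for a hypothetical expression syntax, not anything Fluent ships: it assumes a "-" operator exists, that numbers are plain digit runs, and that identifiers follow the letter-first rule. The names `TOKEN` and `tokenize` are invented for the example.

```python
import re

# Toy token rules for a HYPOTHETICAL syntax with subtraction.
# Fluent does not currently have a "-" operator; this only illustrates
# why a digit-first identifier would collide with one.
TOKEN = re.compile(
    r"(?P<number>[0-9]+)"
    r"|(?P<identifier>[a-zA-Z][a-zA-Z0-9_-]*)"
    r"|(?P<minus>-)"
)


def tokenize(source: str) -> list:
    """Split source into (kind, text) pairs, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {source[pos]!r}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("1-foo"))
    print(tokenize("foo-1"))
```

With these rules, 1-foo splits into a number, a minus, and an identifier, while foo-1 stays a single identifier because "-" and digits are allowed after the leading letter.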

flodolo commented 5 years ago

if you want to support math, you have to support a subtraction operator

Where and how do we support math in Fluent?

I find it concerning that I'm totally unable to follow your reasoning, and yet I'm not exactly new to Fluent (at least regarding the documentation part, and writing FTL).

zbraniecki commented 5 years ago

Umm, I'm not as concerned as you. We can reason and explain our positions and I don't find it concerning.

Math operations and operators are frequently made available for the purpose I provided.

If there's a strong reason for number-initiated identifiers we could, as far as I'm aware, make them possible, but that would go against every programming language and DSL that I'm aware of, which forbid that model for exactly the reason I provided. We don't support math operations in the current syntax but we did in the past and it's reasonable to protect the syntax assuming we may want to support it in the future.

Do you have any strong reason to bring number-initiated identifiers?

flodolo commented 5 years ago

We don't support math operations in the current syntax but we did in the past and it's reasonable to protect the syntax assuming we may want to support it in the future.

As I said, please provide an example, otherwise we're not going anywhere. I can't imagine how math operators fit into Fluent, and I need an example to understand it.

As for "other programming languages don't", I'm not sure Fluent is a programming language, but that could be a reason good enough, as long as it's conscious and documented.

zbraniecki commented 5 years ago

key = { SELECTOR(1-key2) -> *[if] Foo [else] faa }

The reason is not just compliance. It's to protect our ability to extend syntax by adding likely operations. We may never need it, but I'd assume one day we will. And I don't think we've seen a reason to give up that protection.

I agree documenting it better should happen :)

gijsk commented 5 years ago

Do you have any strong reason to bring number-initiated identifiers?

Compatibility with .properties (and maybe .dtds, I haven't tried).

gijsk commented 5 years ago

Anyway, the documentation here is just non-existent in the fluent guide. It doesn't say what is and isn't supported (doesn't even define an identifier, as far as I can tell, and even https://projectfluent.org/fluent/guide/references.html only gives examples of uses of placeables but doesn't actually specify what things are and aren't allowed (so it's not obvious that I can use { foo.label }, for instance)).

Note also that the set of supported characters based on that EBNF is much smaller than in almost every programming language I know: JS, Python (>=3), Perl, Lisp, C++, and Rust (feature-gated on recent versions) all support Unicode characters for identifiers/variables, as well as (in some cases) other ASCII characters (notably "$"). Prolog doesn't support _ as a variable name because it's special, so its set of allowed variable names is clearly not a superset of the Fluent one... but that's the exception that proves the rule, as it were.

Of course the current arbitrary restriction is the subject of #117, but the fact that it's not documented should be addressed irrespective of all the other discussion.

gijsk commented 5 years ago

Prolog doesn't support _ as a variable name because it's special, so its set of allowed variable names is clearly not a superset of the Fluent one... but that's the exception that proves the rule, as it were.

Oh, it turns out even this is wrong because Fluent doesn't support identifiers that start with _ either, it seems.

Pike commented 5 years ago

A few thoughts:

Yes, documentation is lacking, and it's the biggest item of work we have planned for Fluent 1.0. The target of 1.0 will be tooling developers, so even then, usage docs may or may not be in scope. For Mozilla, an overhaul of firefox-source-docs is in order, IMHO; the l10n/intl/l10n docs are saying somewhat conflicting and incomplete things on https://firefox-source-docs.mozilla.org/tools/compare-locales/index.html vs https://firefox-source-docs.mozilla.org/intl/localization.html. They both pretty clearly show their roots.

Also, yes the restrictions on Identifiers are semi-random. They're based on the idea that we can extend the namespace of identifiers if needed. But also on the idea that we might add things like math.

Why is 1234 not an identifier? Because message references and number literals.

1234 = Hello, World
msg = {1234}

msg is actually 1234 because that's a number literal, https://github.com/projectfluent/fluent/blob/88587ebbb82b905a9fd577aeba3778072cea2b87/spec/fluent.ebnf#L53-L56.
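A small Python sketch of that precedence, under the assumption that number literals look like `-? digits ("." digits)?` and identifiers follow the letter-first rule (my reading of the linked grammar). The `classify` helper is hypothetical and only distinguishes these two cases; it does not model terms, variables, or function calls.

```python
import re

# Assumed shapes, transcribed from my reading of fluent.ebnf:
#   NumberLiteral ::= "-"? digits ("." digits)?
#   Identifier    ::= [a-zA-Z] [a-zA-Z0-9_-]*
NUMBER_LITERAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")


def classify(expression: str) -> str:
    """Say how the text inside a placeable would be read (toy version)."""
    text = expression.strip()
    if NUMBER_LITERAL.fullmatch(text):
        return "number literal"
    if IDENTIFIER.fullmatch(text):
        return "message reference"
    return "something else"


if __name__ == "__main__":
    print(classify("1234"))
    print(classify("msg"))
```

Because the two patterns cannot match the same text, {1234} can only ever be read as a number, so a message named 1234 would be unreachable through a message reference.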

So, the most wide-cast net for Identifiers would require that there's at least one non-number character in there. And that . and - as initial characters are reserved for Attributes and Terms. Otherwise we'd have messages that can be referenced with message references and ones that can't.

Just enforcing a leading char is one way to make it easy to spot what's an identifier and what's not (if it's documented, sic ;-) ).

zbraniecki commented 5 years ago

One other item to consider is that identifiers in Fluent have a somewhat special role compared to any other DSL: they play a role in error recovery (as the last resort). I'm not super excited about how western-centric the selection of characters is (Latin alphabet), but I do like the idea that the identifier is meant to be meaningful for the reader assuming they are able to recognize the word in the Latin alphabet. For that reason "cancel-button" is an encouraged identifier while "1-foo" is not. From that perspective, limiting the scope of allowed patterns in an identifier and enforcing a character (so, disallowing identifiers such as ___ or 2_-_-_-1) is helping the recovery strategy.
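As a rough illustration of the "last resort" idea, here is a Python sketch that assumes recovery means showing the identifier itself when no translation can be produced. Real Fluent implementations have more elaborate fallback chains; the `format_message` helper and the plain dict of messages are hypothetical stand-ins.

```python
def format_message(messages: dict, message_id: str) -> str:
    """Return the translation, or the identifier itself as a last resort.

    Assumption for this sketch: when nothing else is available, the
    user sees the raw identifier, so a readable one degrades better.
    """
    return messages.get(message_id, message_id)


if __name__ == "__main__":
    translations = {"cancel-button": "Cancel"}
    print(format_message(translations, "cancel-button"))
    print(format_message({}, "cancel-button"))
    print(format_message({}, "2_-_-_-1"))
```

In the failure case the user would see cancel-button, which still hints at the intended meaning, whereas a fallback such as 2_-_-_-1 conveys nothing.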

Of course such efforts are not complete and one could use the argumentum ad extremum that Latin characters can be used to create meaningless identifiers. The only thing I'm saying is that extending the scope of allowed characters is unlikely to help with that goal while it will chip into our ability to add features to the language later.

So, from my perspective the arguments for extending the scope of characters are compatibility with .properties and removal of a potential paper cut. Arguments against are maximizing future language extension options and nudging the culture of simple, recognizable, consistent and meaningful identifiers. It's a subjective balance of course and I can see how everyone can see it being in a different place.