tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License

Incorrect casing of standard library values #2105

Open toastal opened 2 days ago

toastal commented 2 days ago

Describe the bug Acronyms & initialisms in English use upper case letters to differentiate them from other kinds of words. Using improper casing loses this syntactic information. I see some errors in the standard library:

Note that 'Text makes sense as is since it is not an acronym or initialism, but also that all of the official sites for these listed values use all capital letters, unlike the incorrect casing seen in Nickel.

To Reproduce Look at source

Expected behavior The standard library casing should be fixed to match the English conventions & how the languages officially reference themselves.

yannham commented 1 day ago

I think there are many different ways to write those acronyms with reasonable justification for each. I would be inclined to say that in code we shouldn't follow English written conventions for text, as code isn't really text. Which leaves us with:

I think ultimately any of those choices is reasonable, and IMHO it doesn't really matter (as long as you're consistent). I think I slightly prefer the current approach (as opposed to say uppercase acronyms) because the casing doesn't really depend on the meaning of each enum tag, so you can use the same consistent writing for all of the values in the stdlib (say, in the option type [| 'Some a, 'None |], for hash algorithms, for stuff that is not acronyms, etc.), and it's also the same case convention as for types and contracts, so you don't have to learn a new one or think too much about it.

I must say I'm not too inclined to break backward compatibility for this, unless there is a strong motivation (and I'm even less inclined to accept multiple casings, such as both 'Json and 'JSON, to maintain backward compatibility). Do you think this has any consequence with respect to discoverability, principle of least surprise, etc.? Has this bitten you in any way, or is it more of an "it just doesn't feel right" case?

toastal commented 1 day ago

Google also put out this style guide: https://google.github.io/styleguide/go/decisions.html#initialisms

It would break backwards compatibility, but I think this was the wrong decision in the first place. Many ‘functional’ projects use proper acronym/initialism casing even in their ADTs. The problem is that you start to lose that casing information: JSON is just a stand-in for “JavaScript Object Notation” & the usage of an initialism here makes it clear that it means as such (as “json” isn’t a word). Since Nickel doesn’t have casing restrictions, I would lean in favor of spelling things as the author intended. I don’t think following the Rust crowd is a great argument when you can set your own terms.

yannham commented 1 day ago

Ah, it's interesting that the style guide differs for Go and JavaScript. I agree it's ultimately all pretty arbitrary conventions, at the end of the day.

However, I will reiterate that Nickel is code and not prose and that we should have a convincing practical motivation for breaking backward compatibility. As language maintainers, I think that "spelling things as the author intended" is, to put it a bit bluntly, the least of our concerns. As we favor Nickel users over acronym authors, I think it's even worse: now you have to do additional mental gymnastics to differentiate between 'Text and 'JSON. Sometimes it's also not entirely trivial to know the intended casing for commercial acronyms and brands, which can depend on the fad of marketing. I prefer a purely "algorithmic" casing that is uniform and consistent.

I also agree that it's not a good argument per se to do "just like Rust" (or even a slightly dangerous irrational bias). In our case though the choice was first and foremost practically motivated: because types and values live in the same namespace, it's better for disambiguation to use entirely different casing conventions, rather than slightly different ones (such as the usual camelCase/CamelCase). It just happened that Rust is a now-prominent language that has made this choice as well. Additionally, Rust, OCaml, C++ and other existing languages haven't been created out of thin air, and following precedents when you're out of technical criteria to make a decision and just need to make an arbitrary, normative choice helps fulfill the principle of least surprise.

toastal commented 1 day ago

I mean if it were up to me, I would use 'Jsᴏɴ to have the camel casing and not lose the initialism information about the word, but this is the kind of thing that would actually cause “surprise” despite making sense; seeing JSON case changed was a surprise to me, which is why I raised the issue. “Algorithmic” casing loses information, which is why JavaScriptObjectNotation becomes JSON when you drop the lowercase letters to become an initialism. This isn’t a stylistic/branding thing either in the case of all 3, JSON, TOML, & YAML. I think 'Text vs. 'JSON is the perfect example for this specifically since TEXT, unlike JSON, isn’t an acronym or initialism so I can’t say I understand the example.
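To make the trade-off concrete, here is a minimal sketch in Python (not Nickel, and not how the Nickel stdlib derives its tag names). The helper names `pascal_case`, `pascal_case_keep_acronyms` and `snake_case` are hypothetical and exist only for illustration. It shows that a purely algorithmic casing maps "JSON" and "json" to the same identifier, while an acronym-preserving rule keeps the distinction but loses word boundaries when two acronyms are adjacent.

```python
def pascal_case(words):
    """Purely "algorithmic" casing: capitalize the first letter of each
    word and lowercase the rest, regardless of what the word means."""
    return "".join(w[:1].upper() + w[1:].lower() for w in words)


def pascal_case_keep_acronyms(words):
    """Hypothetical alternative: leave all-uppercase words (treated here
    as acronyms/initialisms) untouched, case everything else as above."""
    return "".join(
        w if w.isupper() else w[:1].upper() + w[1:].lower() for w in words
    )


def snake_case(words):
    """Lowercase everything and join with underscores."""
    return "_".join(w.lower() for w in words)


# Uniform rule: the same output whether or not the input was an initialism.
print(pascal_case(["JSON"]))                       # Json
print(pascal_case(["json"]))                       # Json
print(pascal_case(["text"]))                       # Text

# Acronym-preserving rule: keeps the initialism visible...
print(pascal_case_keep_acronyms(["JSON"]))         # JSON
print(pascal_case_keep_acronyms(["text"]))         # Text
# ...but adjacent acronyms run together without a visible boundary.
print(pascal_case_keep_acronyms(["JSON", "API"]))  # JSONAPI
print(pascal_case(["JSON", "API"]))                # JsonApi

# Snake case sidesteps the question entirely.
print(snake_case(["JSON", "API"]))                 # json_api
```

Neither rule is lossless: the uniform one discards the fact that a word is an initialism, and the preserving one needs the caller to know which words are initialisms in the first place.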

The OCaml naming conventions are usually snake_cased anyhow & C++ is hardly standardized in naming… where snake casing gets to ignore jsonAPI vs jsonApi arguments with json_api.