krakenjs / spud

A content store parser, reading a java .properties-like format

Investigate 'jsup' #16

Closed aredridel closed 10 years ago

aredridel commented 10 years ago

Via @diegone

Here's an idea I had a few years ago, it was fun to spec out last night. Ready to be crushed! ;) https://github.com/diegone/jsup

aredridel commented 10 years ago

This is a followup from conversation on #14

aredridel commented 10 years ago

What i18n purposes is this good at?

aredridel commented 10 years ago

Adding @diegone

diegone commented 10 years ago

The idea is that JSUP is pretty general-purpose (like JSON), supports everything we need (like nested objects/arrays), is easy to constrain via special types and annotations, and stays extensible for the future without changing the syntax or the modeling. It's also easy to yield different views for different use cases (e.g. runtime vs. l10n submission time).

For example, this is what it could look like:

@sourceLanguage('en-US')
@targetLocales(['de-x-DE', 'zh-Hans-x-C2', 'es-x-SP', 'es-x-AR'])
bundle() {
  hi1: 'hello'
  hi2: { translate:false target:[SP, AR] 'hola'}
}

aredridel commented 10 years ago

So this, as specified, is "XML" for the purposes of internationalization. It's completely generic, and would require an entirely secondary semantic restriction to make it a content format.

I don't think this is particularly useful since JSON is already too powerful for the task at hand and requires a second spec for the current use.

Also at this point, you've reinvented much of Javascript, with new syntax, and introduced some new concepts -- like types -- that have no specification.

diegone commented 10 years ago

You need a secondary semantic restriction anyway. For example, the proposed new key{k-v meta}=value simply defines a syntax to capture key-value pairs that is totally generic. You have to define a secondary semantic restriction to specify what keys are meaningful.

diegone commented 10 years ago

Can you explain in what sense JSON is already too powerful? It seems to me that the current properties format extensions (with arrays/objects via scoped keys and []) reach almost the same expressivity.

kkleidal commented 10 years ago

Plus, the properties files support comments. Currently, JSON does not. I know that's not important for data structure, but I'd imagine it's important for people who edit the data.

aredridel commented 10 years ago

Right now, you can have json that sets up numeric values, null, true and false as booleans; You can have an array as the root object, or a string. Adding metadata actually adds more places to restrict further -- are arbitrary objects allowed as metadata? Arrays?

It's a lot of expressive power, and a lot of what you can express is not particularly meaningful.

I'll grant that even the scoped keys and arrays add quite a lot of unused expressive power, but thankfully localized to just the access parts of the display API -- how to get to a value -- and not overloaded with too much meaning.
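To make the "too powerful" point concrete, here is a minimal sketch of the second spec that plain JSON would need before it could serve as a content format. The rules (root must be an object, every leaf must be a string) are an assumption made for illustration, and `content_errors` and `check_bundle` are hypothetical names, not part of spud.

```python
import json


def content_errors(node, path="$"):
    """Return the paths where a parsed JSON value breaks the assumed rules."""
    if isinstance(node, str):
        return []
    if isinstance(node, dict):
        errors = []
        for key, value in node.items():
            errors.extend(content_errors(value, f"{path}.{key}"))
        return errors
    if isinstance(node, list):
        errors = []
        for index, value in enumerate(node):
            errors.extend(content_errors(value, f"{path}[{index}]"))
        return errors
    # Numbers, booleans and null are valid JSON but carry no message text.
    return [f"{path}: {type(node).__name__} is not a string"]


def check_bundle(text):
    """Parse JSON text and list every violation of the assumed content rules."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        # JSON allows an array or a bare string as the root; the assumed rules do not.
        return ["$: root must be an object"]
    return content_errors(doc)
```

Every check in this sketch is a restriction that the JSON grammar itself does not express, which is the extra specification work described above.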

diegone commented 10 years ago

I fail to see the difference, and especially I fail to see how the properties file format helps with these issues.

With the current properties format, you can also specify a=123 which is technically usable, but linguistically undesirable. It also supports stuff like ☃=snowman which will probably break l10n tools. So you'll always end up needing some sort of validation tool to enforce the secondary semantics. The format alone doesn't buy you much.
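A validation tool of the kind mentioned here could be sketched roughly as follows. The key policy (ASCII identifiers joined by dots, with optional `[]` or `[n]` suffixes) and the numeric-value warning are assumptions chosen for illustration, `lint_properties` is a hypothetical name, and the line parser is deliberately simplified: it ignores `.properties` features such as escapes, `:` separators and line continuations.

```python
import re

# Assumed key policy, not something spud defines.
SEGMENT = r"[A-Za-z_][A-Za-z0-9_]*(\[\d*\])?"
KEY_RE = re.compile(rf"^{SEGMENT}(\.{SEGMENT})*$")
NUMERIC_RE = re.compile(r"-?\d+(\.\d+)?")


def lint_properties(text):
    """Return (line number, message) pairs for lines that break the assumed policy."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            warnings.append((lineno, "missing '='"))
            continue
        key, value = key.strip(), value.strip()
        if not KEY_RE.match(key):
            warnings.append((lineno, f"key {key!r} is outside the assumed key policy"))
        if NUMERIC_RE.fullmatch(value):
            warnings.append((lineno, f"value {value!r} is numeric, not translatable text"))
    return warnings
```

Whether `☃` should be rejected at all is a policy decision rather than a property of the format, so the check lives in the tool and can be changed without touching the syntax.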

The way jsup helps is that you can easily explain the secondary semantics. For example:

To me that's much easier to explain and extend, while at the same time the syntax is almost as concise (in terms of keystrokes) and more familiar (e.g. escaping rules would be what JS does)

aredridel commented 10 years ago

What's a type to the consuming code?

Who defines an annotation?

What does it mean for something to have an annotation?

What's an annotation to the consuming code?

What happens if you @targetLocales(bundle() { heroes: "germs" }) ?

Having to enforce semantic constraints is a weakness -- offset only by being able to use a standardized format for a basis and therefore not have to specify it.

☃=snowman is legit, if keys are defined to accept any unicode codepoint. Any tool that breaks with this is buggy. Having a good spec lets you decide which thing is buggy: the file format spec or the tools that broke on unexpected input.

diegone commented 10 years ago

I was looking at the spud test cases and saw:

address.state.az.key=AZ
address.state.az.value=Arizona
address.state.ca.key=CA
address.state.ca.value=California

What I hear is that developers hate XML (because it's verbose?) but they'd prefer the above over this?

address: {
  state: {
    az: { AZ: Arizona }
    ca: { CA: California }
  }
}
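
For comparison, the scoped keys above can be expanded mechanically into a nested structure. This is a minimal sketch, not spud's actual parser: `parse_flat` and `nest` are hypothetical helpers, only `key=value` lines are handled, and conflicting keys (for example `a=1` followed by `a.b=2`) are not dealt with.

```python
def parse_flat(text):
    """Split simple key=value lines into (key, value) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            pairs.append((key.strip(), value.strip()))
    return pairs


def nest(pairs):
    """Expand dotted keys into nested dictionaries."""
    root = {}
    for dotted_key, value in pairs:
        parts = dotted_key.split(".")
        node = root
        for part in parts[:-1]:
            # Walk down, creating intermediate objects as needed.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return root


flat = """
address.state.az.key=AZ
address.state.az.value=Arizona
address.state.ca.key=CA
address.state.ca.value=California
"""
nested = nest(parse_flat(flat))
```

Note that the flat form keeps explicit `key` and `value` entries under each state, whereas the nested example above folds them into a single pair, so the two are close but not identical in shape.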

aredridel commented 10 years ago

Yes. XML has outsized hate, not just because of verbosity.

diegone commented 10 years ago

By your argument, for kraken config we should invent a new format because JSON allows too many invalid configurations to be defined.

If that's the new design philosophy going forward, that'd be fine, but I don't think inventing a franken-properties format is in line with the rest. If anything, migrating the rest of the json files to jsup would make sense and make it easier to implement/document add-ons like shortstop.

aredridel commented 10 years ago

Except that there is no specification for jsup, no processing tools, no defined semantics -- even what a "type" is, and a large grammar.

So far, kraken config's limited power has helped keep configuration relatively simple; we are, however, starting to rub up against its limits on expressiveness. We're near the sweet spot -- probably just a little under for expressiveness.

My argument is that you have to find the right trade-offs between expressiveness and meaning.

We've agreed that we need some kind of per-string metadata -- translate: false being one of the few concrete examples -- and at least hinted that per-document metadata is useful (translation source and error-recovery fallback, as well as generic inherited definitions)

That's about the extent that we've managed to scope out though. Infinite future extensibility isn't a terribly important concern to me: We have the power to update our software to accommodate an improved file format. The power to defer all discussion of meaning until after a file format is built is, I think, counterproductive: What we really need to capture is what meaning we need to express, and a succinct, not-error-prone way to capture it.

diegone commented 10 years ago

A "type" is just a string; you do whatever you want with it in the reviver.

It's like the element name in xml. In json objects are anonymous and having a way to easily distinguish objects without adding a "type" property is very handy.
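Since jsup has no implementation to point at, the reviver idea can only be approximated with plain JSON. The sketch below assumes each typed object carries an explicit `"type"` property (exactly the workaround a built-in type name would avoid) and uses the `object_hook` parameter of Python's `json.loads` in the role of the reviver. `Message` and `revive` are hypothetical names, and the `string` and `bundle` type names are borrowed from the example earlier in the thread.

```python
import json
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    translate: bool = True


def revive(obj):
    """Called by json.loads for every decoded object, innermost first."""
    type_name = obj.pop("type", None)
    if type_name is None:
        return obj  # anonymous object: leave it alone
    if type_name == "string":
        return Message(obj["value"], obj.get("translate", True))
    if type_name == "bundle":
        return dict(obj)
    raise ValueError(f"unknown type: {type_name}")


source = """
{
  "type": "bundle",
  "hi1": {"type": "string", "value": "hello"},
  "hi2": {"type": "string", "translate": false, "value": "hola"}
}
"""
bundle = json.loads(source, object_hook=revive)
```

The type is only a string used for dispatch; what it means is decided entirely by the reviver, which is also where unknown types have to be handled.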

aredridel commented 10 years ago

So a type is metadata for an object?

How's that different than an annotation?

diegone commented 10 years ago

Yes, type is part of the meta of an object (see https://github.com/diegone/jsup#logical-model)

It's conceptually equivalent to @name(foo) but it's there for syntactic sugar reasons (to make it look like a constructor function) and support constructor function parameters (for even more sugar). For example instead of hi2: { translate:false 'hola'} you could do hi2: string(false, 'hola') and you should also see where I'm going with this: greeting: dust('Hello {user}') (to solve the lack of a messageformat). You technically could also do { 'hello ' b { 'world' } } to model markup but I'm not sure we want to go that far.

I think the actual proposal of what annotations/types/properties we want should be iterated because we're stepping into developer-friendliness territory, but the concepts and mechanisms to achieve things are there.

aredridel commented 10 years ago

Yow. I think this is an extremely complex, extremely fraught reinvention of much of Javascript.

I think one can safely say that this opens more issues than it would solve.