mapeditor / tiled

Flexible level editor
https://www.mapeditor.org/
Other
11.3k stars 1.76k forks source link

JSON format improvements #3212

Open bjorn opened 2 years ago

bjorn commented 2 years ago

There are a number of inconsistencies and unconventional approaches in the JSON format, that make understanding and parsing it more difficult than necessary. This issue is about collecting those issues so that we may in the future fix them all at once, since in general we want to avoid breaking compatibility. It's also a place to collect feedback on the proposed changes.

Naming Style

There appear to be two common naming styles in JSON documents, which are camelCase and snake_case. However, Tiled's JSON format uses neither. It instead uses lowercasenames, which was originally chosen based on the same convention used in the TMX format.

The lowercasenames are often not very readable, and it makes it more difficult to rely on code generation tools like QuickType while still generating code that is pleasant to work with. Anybody writing a Tiled JSON parser will need to either use the same names, or rename many of them to more readable variants.

This also affects values for properties like draworder, which can be topdown for example, and that should probably become TopDown.

Suggestion: Switch all properties to camelCase.

Inconsistent Point Storage

Whereas layer objects have offsetx and offsety properties, tileset has a tileoffset property that is an object with x and y properties. I think these should always by stored as objects, with the only exception being if x and y are used without additional prefix, like for chunks.

Suggestion: Store points as objects with x and y properties, except when embedded without prefix. Alternatively, we can consider [x, y], which would be even shorter but might come at the expense of being less intuitive to parse.

Inconsistent Color Values

Colors are sometimes stored as "#AARRGGBB" and sometimes as just "#RRGGBB".

Suggestion: Consistently store colors as "#AARRGGBB" (or "0xAARRGGBB", which is likely more convenient to parse).

Inconsistent Inclusion of Defaults

Sometimes a property is left out when its value is the default, in other places it is always written. I think we need to decide on one or the other. Always writing a property has the advantage that the parser does not need to be aware of the default value. As such, maybe we should only leave out properties when their default can be trivially derived from their type (0 for numbers, false for booleans, "" for strings).

An additional problem however, is that the presence of some properties has additional meaning. For example, for template instances, whether or not they explicitly specify a value for properties like visible also tells Tiled whether the original value was overridden. Maybe it is better to explicitly store the names of overridden properties?

Suggestion: Only leave out properties when their default can be trivially derived from their type. Use a separate property to store information about overrides.

Inconvenient Defining of Object Types

In Tiled we support placing rectangles, ellipses, tiles, polygons, polylines, points and text as objects. In the JSON format, there is no single property that can be used to distinguish these different objects. Instead, we have a different property for each of them, and nothing for rectangles, since objects are rectangles by default:

Unfortunately the type property is already taken since every object can have a user-defined string as its type. In C++, the various object types are distinguished by a Shape enumeration (currently with the exception of tile objects).

We could add a shape or objectType property to replace the ellipse and point properties, and allow renaming both polygon and polyline to just points. Or, to change the definition of an object such that only common properties like x, y, name, etc. are stored on the base object, and the differences move to child objects, as follows:

{
    "x": 123,
    "y": 123,
    "name": "Bob",
    "type": "NPC",
    "shape": {
        "type": "Point"
    }
}

{
    "shape": {
        "type": "Ellipse",
        "width": 10,
        "height": 10
    }
}

{
    "shape": {
        "type": "Polygon",
        "points": [ { "x": 0, "y": 0 }, { "x": 10, "y": 0 }, { "x": 0, "y": 10 } ],
    }
}

Suggestion: Introduce an object property that stores the properties relevant to each specific object type.

Layer Types

A similar approach as suggested above for objects could be used to split off the properties of the various layer types to a child property. The layers do currently have a type property, but if we change objects as above then I think we should do the same for layers.

Map Orientations

Some of the supported orientations, like staggered and hexagonal, require additional parameters. Currently these properties are stored directly on the map object, but I think the orientation property should be an object instead, and maybe it should be renamed to grid:

"grid": {
    "type": "Staggered",
    "staggerAxis": "X",
    "staggerIndex": "Odd"
}

Possibly the tilewidth and tileheight properties should be moved into that object as well (and maybe be renamed to spacingX and spacingY, because it is actually independent of the tile size, and this grid definition could be used for further custom grids and unified with the grid property on tilesets).

Embedded Tileset vs. Referenced Tileset

Currently a tileset object can be either a tileset reference, or an embedded tileset. I think it could be better to require them to be always tileset references, and use a separate property for storing embedded tilesets (if we still want to provide that option by the time we do these changes).

Tileset Image vs. Image Collection

These two are different enough that it's a bit awkward to fit them into the same object types. A tileset has a reference to an image, but if it's unset it is assumed to be a collection of images (in which case the rest of the parameters like spacing and margin are also ignored). And each tile also has a reference to an image, which is unset when it is used as part of a tileset. I think we should find a way to support both cases more elegantly.

A tileset could define a list of images, and for each image list the tiles on that image and the rectangle they occupy on that image. This also gets rid of the need to calculate this information based on the tileset parameters (which would become editor-only data, but might need to be stored on each image):

{
    "name": "Some Tileset",
    "images": [
        {
            "source": "<image-file>",
            "tiles": [
                {
                    "id": 1,
                    "x": 0,
                    "y": 0,
                    "width": 32,
                    "height": 32,
                    "properties": {}
                }
            ]
        }
    ]
}

Of course, such a change also implies changes in the editor UI and other supported formats. It is partly covered by issue #2863.

Use of Common Abbreviations

Just a note here, I would consider using common abbreviations, when done consistently, to shorten some of the property names:

Unified Hierarchy

This item may not really belong here, since it's a change that would affect Tiled's internals and all other formats as well. But in general, I don't like that we have a hierarchy only for layers. Objects should be able to form a hierarchy as well. And rather than introducing another hierarchy, I would suggest in this case to unify layers and objects, such that tile layers and image layers are just special kinds of objects (and this will automatically obsolete "object layers" and "group layers").

All the previous changes combined might already be such a big breakage in compatibility that it would be worth to think about making this change at the same time. Related issue is #572.

eishiya commented 2 years ago

Opinions on the proposed changes:

Naming style to CamelCase: Yes

Point storage to {x: val, y: val} all around: Yes

Colour values: Yes to 0xAARRGGBB, No to #AARRGGBB, because some built-in parsers of #colorvalues expect RRGGBB or RRGGBBAA and it's an easy way to introduce subtle mistakes into a parser. Building numbers out of a number will usually require more work, but it'll require the programmer to pay attention, and unlike with strings, when you just want the RGB values, it's the same code to get those values, whereas with strings, it would require #RRGGBBAA to get that kind of consistency.

Inconsistent inclusion of defaults: Yes to excluding trivial defaults, however, I would still like to have options to exclude properties with values matching Tiled defaults. I would rather put more work into my parser to include those defaults, for the benefit of smaller files. A LOT of properties that would otherwise be included in map files are irrelevant to my game and I don't want them in there.

Move shape-specific properties into a shape object: Yes.

Layer Types: Not sure about sub-elements containing layer type-specific data. I see the consistency argument here, but I think in practice, it'll just feel like an arbitrary complication. Looking for different properties based on the value of some consistent property (such as "type") is an expected part of parsing JSON.

Map Orientation: I see no problem with keeping the properties as-is, but I'm not against a "grid" object. Big No to "spacingX" and "spacingY" since tile spacing in tilesets has a similar name. Maybe cellWidth and cellHeight instead? I don't know how widespread this terminology is, but I've been referring to map cells when I needed to distinguish map tile size from tileset tile size for a long time and no one seems to have been confused by that.

Embedded Tileset vs. Referenced Tileset: I don't see the benefit to this, and I think it'll just add unnecessary lists of data to differentiate between embedded and external tileset, and it'll still come down to parsing something differently based on some property value in the end. Leave embedded tilesets as is, IMO. Or, if you want to make the parsing difference be based on the type of a property, consider:

"tilesets": [
   {"firstgid": 1, "source": "externalTileset.tsj"},
   {"firstgid": 49, "source":
      {tileset object}
   }
]

Keep supporting embedded tilesets, it's a useful feature for some games where tilesets aren't often shared, or where there's only one or two maps for the whole game.

Tileset Image vs. Image Collection: As I've previously mentioned, I want to see support maintained for tilesets that are just an image and where tile IDs trivially correspond to locations, since this has storage/memory benefits, and reflects how many existing engines deal with tilesets natively. Storing the position data with each tile is a redundant waste of space in this case. I think the proposed format is good for image collections and spritesheet-style tilesets, though. Perhaps images can have a "defaultTileWidth" and "defaultTileHeight" property (such as might be used when first populating a spritesheet-style tileset), and for standard sheets, the x, y, width, height properties can be left out, and can be calculated from those defaults. This would mean that margin and spacing would need to be reintroduced as properties, however.

Use of Common Abbreviations: Yes to src, no to img for images because img is usually singular and imgs only saves 2 characters, yes to w and h, no to props because I just read it as the word "props" (a common word in gamedev) rather than an abbreviation for "properties".

Unified Hierarchy: Not sure about this one. I like it in theory, but I'd still probably use them exactly like layers and groups are used now, and would want a way to restrict the groupings to work as they do currently. I also like how the current layers match how layers work in image editors, with explicit groups and a clear distinction between different kinds of layers.

aeb-dev commented 2 years ago

Hi, I have been developing tmx_parser for a while for dart-lang. Here are my two cents

Topics that I think should also be taught:

Model differences between tmx and tmj

Currently the xml and json formats are not the same. For example, in xml, TileLayer has Data as child node but in json, Data and its all properties belongs to Layer. Image of TileSet has the same problem, there could be more

Comply with json schema

If a double value is serialized to json it should preserve the dot notation. For example opacity set to 1.0 should be seralized as opacity: 1.0 not opacity: 1.

Comments:

Naming Style

Pick one and stick to it, however I believe there are more important issues than this.

Inconsistent Point Storage

Yes for the suggestion.

Inconsistent Color Values

Introduce color type with separate r,g,b,a fields. Then people can create their value however they want

Inconsistent Inclusion of Defaults

I think it is better to put the value, always. Only left it empty if empty means something. This would make writing parser a lot easier since you would not have to worry about checking whether the value exists or not.

Inconvenient Defining of Object Types

Yes for the suggestion

Layer Types

Yes for the suggestion. I should also note that current format introduces a big problem for forward-only parsers since the type does not come as first property.

Tileset Image vs. Image Collection

I agree with eishiya's comment

Use of Common Abbreviations

I do not think it is a deal breaker. It can shrink the size of the output but other than that I do not see any benefit

Unified Hierarchy

I think this is a whole new level think that should be discussed separately.