Design a new source format (object representation and/or on-disk representation)

simoncozens commented 3 years ago

Ideas for hierarchy so far: [diagram: newsource]

alerque commented 3 years ago

Why have masters at all? I understand how they came to be historically and the case for instances (a set of static locations on axes) but if you're designing a new on-disk and object format, why not drop the charade and just go straight for glyphs? Each glyph would have control points that are not just a pair of coordinates but vectors with enough values to locate them through each axis. Editors could use the instance location data to draw what people currently think of as "masters", but if you're coming up with a new model for holding the data I don't think dragging the current paradigm with you is necessary.
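
A minimal sketch of the "variable first" idea described above, assuming each control point stores a default position plus one offset per axis and that axis values are normalized so 1.0 is the axis maximum. The names `VariablePoint`, `deltas` and `at` are hypothetical and come from no existing format:

```python
from dataclasses import dataclass, field


@dataclass
class VariablePoint:
    """A control point stored as a default position plus per-axis offsets.

    Hypothetical structure: `deltas` maps an axis tag to the (dx, dy) offset
    reached when that axis is at normalized value 1.0.
    """

    x: float
    y: float
    deltas: dict = field(default_factory=dict)

    def at(self, location):
        """Return (x, y) at a normalized location such as {"wght": 0.5}."""
        x, y = self.x, self.y
        for axis, value in location.items():
            # Axes this point does not respond to contribute nothing.
            dx, dy = self.deltas.get(axis, (0.0, 0.0))
            x += dx * value
            y += dy * value
        return (x, y)


point = VariablePoint(100, 200, {"wght": (40, 0), "wdth": (0, -20)})
default_position = point.at({})
bold_position = point.at({"wght": 1.0})
```

An editor could evaluate `at()` at each instance location to draw what designers currently think of as masters. This simple additive model only works when every location shares the same point structure.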

simoncozens commented 3 years ago

Interesting idea! I started to come up with reasons why not, but I'm not sure any of the reasons I have are valid. Going "variable first" has a lot of advantages, too. Hmm.

Also I'd like to hear any thoughts you have from your investigations on what makes a good version-control-aware on-disk format.

schriftgestalt commented 3 years ago

I think the way you set up the masters (containing glyphs) is not useful. They are just some positions in the designspace that have some info attached (coordinates, vertical metrics …).

What you have as top level "glyphs" might be better called encoding. (Not sure why that should be in a different place than the outlines.)

Glyphs and layers should be their own thing, but have some connection to the masters (to get to the vertical metrics).

The idea of one set of control points with different coordinates for each point in the design space is bad. You need to be able to store incompatible outlines per glyph. Not all font projects are variable fonts. There are color fonts, layer fonts, unfinished fonts.

One thing I never understood: AFAIK the default for an axis is where the outlines in the glyf table are? Then you NEED a proper master (meaning "master layer" aka a set of outlines/shapes) at that location. And what if you set the default in the axes and later move or remove that master? So in the font storage, you need to point to a master that is supposed to go into the glyf table and its coordinates will be the default.

florianpircher commented 3 years ago

I concur with @alerque in that masters are no longer needed and the “design space” should move into individual glyphs. Since this puts more information into each glyph, I think a glyph should have its own directory. Personally, I dislike the separation of layers in UFO. After working on a glyph, this is what a version control system shows for UFO:

modified:
└ glyphs
  └ e.glif
└ glyphs.background
  └ e.glif
└ glyphs.some-layer
  └ e.glif

whereas this feels more intuitive to me:

modified:
└ glyphs
  └ e
    └ regular.glyph
    └ regular-background.glyph
    └ some-layer.glyph

There could also be space for a reserved Info file for general information about the glyph:

.
└ glyphs
  └ e
    └ Info.plist
    └ regular.glyph
    └ regular-background.glyph
    └ some-layer.glyph

simoncozens commented 3 years ago

I would really like to avoid a directory-based system, in favour of a single file, unless there are very strong arguments to do with version control. (And they have to be real arguments - not just that it looks prettier - because git is very good at merging files.)

simoncozens commented 3 years ago

@schriftgestalt What would be the rationale for having layers be their own thing, rather than attached to glyphs (inside a master)?

I take your point about incompatible masters, which would make storing deltas impossible. For example, sometimes I have a “skeleton master” with open paths, and then extrude the paths to get the real masters. That wouldn’t work under a delta-only model.
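
To make the constraint concrete, here is a rough sketch of why a delta-only model needs compatible outlines. It assumes a layer is a list of paths and each path is a list of `(x, y, node_type)` tuples; this data layout and the function names are illustrative assumptions only:

```python
def compatible(layer_a, layer_b):
    """Return True if two layers have the same path and node structure."""
    if len(layer_a) != len(layer_b):
        return False
    for path_a, path_b in zip(layer_a, layer_b):
        if len(path_a) != len(path_b):
            return False
        # Node types (line, curve, offcurve, ...) must match position by position.
        if [node[2] for node in path_a] != [node[2] for node in path_b]:
            return False
    return True


def layer_deltas(default_layer, other_layer):
    """Return per-node (dx, dy) offsets, or raise if the layers are incompatible."""
    if not compatible(default_layer, other_layer):
        raise ValueError("layers are not interpolatable; deltas are undefined")
    return [
        [(bx - ax, by - ay) for (ax, ay, _), (bx, by, _) in zip(path_a, path_b)]
        for path_a, path_b in zip(default_layer, other_layer)
    ]


regular = [[(0, 0, "line"), (100, 0, "line"), (100, 700, "line")]]
bold = [[(0, 0, "line"), (140, 0, "line"), (140, 700, "line")]]
skeleton = [[(50, 0, "line"), (50, 700, "line")]]  # open path, fewer nodes
```

Here `layer_deltas(regular, bold)` is well defined, while `layer_deltas(regular, skeleton)` raises. That is the situation a skeleton master creates.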

schriftgestalt commented 3 years ago

As a compromise, it could be one file per glyph (including all its layers and metadata). But as I found out by adding a file format with a similar structure, you need to store the glyph order separately and that is again some data that can get out of sync.
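
As an illustration of the out-of-sync risk, a tool could compare the stored glyph order against the glyph files actually present. This is a hypothetical helper, not part of any existing format or library:

```python
def check_glyph_order(glyph_order, glyph_files):
    """Compare a stored glyph order with the per-glyph files that exist.

    Returns (missing, unlisted): names in the order with no file, and
    files that the order does not mention.
    """
    present = set(glyph_files)
    ordered = set(glyph_order)
    missing = [name for name in glyph_order if name not in present]
    unlisted = sorted(present - ordered)
    return missing, unlisted


missing, unlisted = check_glyph_order(["a", "b", "c"], {"a", "c", "d"})
```

In this example `b` is listed in the order but has no file, and `d` has a file but no position in the order.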

One reason I added that format was to be able to only write changed glyphs. This is mostly relevant for very big files like CJK. Other than that, I totally prefer single files.

schriftgestalt commented 3 years ago

> What would be the rationale for having layers be their own thing, rather than attached to glyphs (inside a master)?

I meant having glyphs and layers as one thing, outside of the masters.

florianpircher commented 3 years ago

A single-file system makes a lot of stuff more complicated in my experience. Want to hide certain aspects of the project from version control? Ignore the relevant file. (Thanks again, @schriftgestalt <3)

Also, diffs become more difficult to render. Since CommitGlyphs does not (yet?) support the new .glyphspackage format I was looking into making my own glyphspackage GUI. One of the main problems is the interface for a diff. Because a modified .glyph file might have changes on one layer or on multiple layers, it is difficult to tell which layers were added, which were deleted, and which were modified, and how to present this to the user. Having layers separated by files offloads that distinction to the VCS, which is already equipped to present delete/add as rename if applicable, or to show adding a layer without marking the entire glyph as modified.

florianpircher commented 3 years ago

> One reason I added that format was to be able to only write changed glyphs. This is mostly relevant for very big files like CJK. Other than that, I totally prefer single files.

That is a good point against my argument.

simoncozens commented 3 years ago

Designers should not be looking at raw diffs, so diff rendering is irrelevant. We can write tools to make things look nicer.

alerque commented 3 years ago

I lean towards the "more files is better than one" approach. As you (Simon) say, Git is really good at merging files, but people are not so good at keeping track of what is what; as Florian says, it's hard to present that info in a UI, and single files are harder to manipulate by hand. Taking the multiple file approach too far though is a disaster. With the concept of masters out of the way and just storing glyphs, one file per glyph is a sweet spot. Georg is right that this introduces the need for some ordering data that can get out of sync, but re-ordering something in a single file format is a nightmare for diff UIs compared to file based layouts.

What the data files look like will have at least as big an effect on version control systems as the file layout. Internally Git tracks blobs across files anyway, but understanding the difference between an addition, a removal, and a change is harder or easier based on the syntax. Git happens to be much better at linewise operations with relatively small amounts of data on each line, with some surrounding container syntax that makes blocks of related stuff easy to match even if everything in the block changes, but not so much that it becomes cruft.

alerque commented 3 years ago

> Designers should not be looking at raw diffs, so diff rendering is irrelevant. We can write tools to make things look nicer.

Yes and no. True, designers shouldn't need to be viewing raw diffs, but how well current diff tooling can sort out the difference between adds/removes/changes and how well a programmer can read a diff is a pretty good proxy for how easy it is going to be to come up with visual tooling and a reliable UX that makes sense of the data.

florianpircher commented 3 years ago

> Designers should not be looking at raw diffs, so diff rendering is irrelevant. We can write tools to make things look nicer.

Ideally, yes. In practice I still look at a lot of raw diffs, hb-shape ascii-diagrams, etc.

simoncozens commented 3 years ago

> I meant having glyphs and layers as one thing, outside of the masters.

Yeah - but why? Why shouldn’t glyphs (the layer drawing part, not the common glyph metadata part) be attached to masters?

simoncozens commented 3 years ago

Hmmm, here's an idea:

[diagram: newsource]

This would (I think) allow you to have layer fonts / incomplete / non-interpolatable layers, but deltas within the interpolatable layers.

florianpircher commented 3 years ago

Maybe an axis should also get a type field, indicating whether it is shared by all glyphs (similar to the current master model) or specific to a glyph/subset of glyphs (similar to Glyphs’s smart components). And maybe there are other kinds of axis for which it would be useful to have a separate type.

florianpircher commented 3 years ago

And maybe a type field is not the best solution. My concern is that in my projects I have many smart-component axes with the same name (width, ascent, …) but different min/max/default.

simoncozens commented 3 years ago

If the layer had a name, then there could be a set of axes defined for each named layer.

florianpircher commented 3 years ago

Every layer has a name. “Background”, “2021-01-27T23:10:58”, “Condensed”, “BoldDisplay”, “finalfinalfinalv5”. Or are you referring to a specific/format defined name?

schriftgestalt commented 3 years ago

In your new drawing, where to put vertical metrics?

And why not attach the glyphs to the masters? Each master would have a list of glyphs and to get to all relevant layers (to check compatibility or to interpolate) you need to go to each master, ask for its glyph for the glyph you are looking for (note: two times the same word for totally different things).

schriftgestalt commented 3 years ago

I don't understand the axis per named layer. What could work is to have axes on the font level and on the glyph level. How those interact is a bit tricky (but it might be possible to append the glyph axes to the font axes).

simoncozens commented 3 years ago

We could handle vertical metrics the same way a binary font does: one place of metric metadata, with axis-specific variation store.

simoncozens commented 3 years ago

Showerthought: We are treating these questions (on disk representation, object hierarchy) as the same but we should address them separately. We might not want to store masters explicitly but we may still want to address them: font.masters[“Light”].capHeight is still a question we might need to answer - for example when converting to UFO.
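
One way to picture that separation is an object model that answers `font.masters["Light"].capHeight` without storing masters explicitly. The sketch below is heavily simplified: it assumes each metric is a default value plus deltas keyed by master name (a real variation store keys deltas by regions of the design space), and `Metric`, `MasterView` and `Font` are hypothetical classes, not babelfont's API:

```python
class Metric:
    """One metric stored once: a default value plus deltas per named location."""

    def __init__(self, default, deltas=None):
        self.default = default
        self.deltas = deltas or {}

    def at(self, master_name):
        return self.default + self.deltas.get(master_name, 0)


class MasterView:
    """A computed view: looks like a master object, but stores no metrics itself."""

    def __init__(self, font, name):
        self._font = font
        self._name = name

    def __getattr__(self, metric_name):
        # Only called for attributes not found normally, i.e. metric names.
        if metric_name.startswith("_") or metric_name not in self._font.metrics:
            raise AttributeError(metric_name)
        return self._font.metrics[metric_name].at(self._name)


class Font:
    def __init__(self, metrics, master_names):
        self.metrics = metrics
        self.masters = {name: MasterView(self, name) for name in master_names}


font = Font(
    {"capHeight": Metric(700, {"Light": -10, "Bold": 15})},
    ["Light", "Regular", "Bold"],
)
light_cap_height = font.masters["Light"].capHeight
```

A UFO exporter could then read per-master values through `font.masters[...]` even though the storage holds each metric only once.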

schriftgestalt commented 3 years ago

I think we think of slightly different things when we speak about "Master". For me, the place to store axis-specific data is called masters. Storing it as an actual variation store (as in variable fonts) is impractical. At least if you store deltas and min/max/default values. That would again be too specific to variable fonts and as stated before, not all projects are variable fonts. And there might be no axes at all.

schriftgestalt commented 3 years ago

Finding a good structure should work for both the disk representation and the object model. There might be small differences here and there but the general structure should agree. Otherwise it would be confusing for the user and would need a lot of plumbing to read and write stuff.

madig commented 3 years ago

Regarding single file vs. directories: we found that git and single file are frequently problematic. Think of the amount of diffs generated by changing the color of a row of glyphs or renaming a component and how close these diffs can be to one another, making git consider them one hunk. Now throw in multiple designers changing stuff in different places and merging back and forth and you have a solid stream of merge conflicts on your hands (think Row hammer attack). With the amount of diffs a day of work can result in, designers usually give up and just merge everything. GUI tools would be nice but don't exist (GlyphMerge or whatsitcalled was buggy and unmaintained the last time I looked) and then someone has to maintain them. Partitioning glyphs into files is a blunt instrument, but it helps.

simoncozens commented 3 years ago

The solution to that is not necessarily to use the blunt instrument, but perhaps to think about what aspects of a font are normally changed together, and then arranging the file format to ensure they are in the same place.

simoncozens commented 3 years ago

(I'm not absolutely wedded to a single-file format, but I want to make sure we're fixing problems, not just applying band-aids to them.)

alerque commented 3 years ago

Sometimes (not always, but I think in this case) the blunt instrument also makes a better building block for specialized tooling. If the heavy lifting is done you can work that into a friendly UX easier than if you have to muck around in roots of things.

> The solution to that is not necessarily to use the blunt instrument, but perhaps to think about what aspects of a font are normally changed together, and then arranging the file format to ensure they are in the same place.

Yes, sort of. If A and B are changed together frequently, separating them from X and Y is a step in the right direction, but keeping A and B independent is even better. Not having changes to A touch B and making sure they are in different hunks is even better.

simoncozens commented 3 years ago

If each "functional change" was on its own line, we could make our tooling stage a line at a time.

madig commented 3 years ago

> think about what aspects of a font are normally changed together

This sounds like a job for telemetry! Georg? :D

BTW, how would your format store https://twitter.com/justvanrossum/status/811481272333778944 and maybe even custom spacing at certain points in the designspace (I think Just had a problem like that but I can't find the issue)?

simoncozens commented 3 years ago

I feel like part of the problem here is that we haven't really defined our aims. Why do we want this thing?

I want it because I want to abstract out the differences between the various source formats in existence (Glyphs/Fontlab/FontForge/.designspace), so that I can write software like Flux that works on "a source font". A neutral interchange between source formats (which is multiple-master aware) is also a handy thing to have. As part of that, I'm also going to want to be able to convert to binary TTF: both variable fonts and "master instances" from incompatible masters.

Whether you like masters or not, all the source formats in existence have them - and designers think in terms of them. @alerque's idea of getting rid of masters and just storing deltas just makes the interchange between source formats a whole lot harder.

If we just want an object hierarchy which is a compatible superset of all the current source formats, it would, I think, look like the diagram at the top of the issue.

alerque commented 3 years ago

> BTW, how would your format store twitter.com/justvanrossum/status/811481272333778944

Stop storing values, start storing formulas (functions). The formulas receive locations on all axes as inputs. They can choose to ignore those inputs if they don't matter. In the case of nonlinear kerning the function that is the kerning value could easily respond to the weight axis with appropriate break points.
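
A rough sketch of what "a kerning value as a function of location" could look like, assuming a piecewise-linear response to a single `wght` axis. The breakpoint values and the names `piecewise_linear` and `kern_A_V` are invented for illustration:

```python
def piecewise_linear(breakpoints):
    """Build a function of one axis value from (axis_value, result) pairs.

    Assumes at least one breakpoint and distinct axis values.
    """
    points = sorted(breakpoints)

    def evaluate(value):
        # Clamp outside the defined range.
        if value <= points[0][0]:
            return points[0][1]
        if value >= points[-1][0]:
            return points[-1][1]
        # Interpolate linearly inside the segment that contains the value.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= value <= x1:
                t = (value - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    return evaluate


# Hypothetical nonlinear kerning: changes quickly up to 600, slowly after.
kern_by_weight = piecewise_linear([(400, -40), (600, -10), (900, -5)])


def kern_A_V(location):
    """Kerning for one pair; receives every axis but only responds to wght."""
    return kern_by_weight(location.get("wght", 400))


value = kern_A_V({"wght": 500, "wdth": 75})
```

The `wdth` input is accepted but ignored, which matches the idea that a formula may disregard axes that do not matter to it.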

schriftgestalt commented 3 years ago

Kerning needs two ways to store values: 1) directly tied to the masters (not to their location, as that can change); 2) at a specific location where there is no master (as in Just's case).

My current idea is to use nested dicts that look like this:

{
    firstKey = {  # classname | glyphName/ID of first/left/right glyph in the pair
        secondKey = {  # classname | glyphName/ID of second/left/right glyph in the pair
            masterId1 = 10;
            masterId2 = 30;
            "(450,45)" = 30;  # some form of axis position
        }
    }
}
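
As a sketch of how a reader might consume such a structure, the Python below mirrors the nested dicts above and resolves both kinds of key to design-space locations. The key syntax `"(450,45)"`, the class names `@A`/`@V`, and the `master_locations` table are assumptions for illustration only:

```python
kerning = {
    "@A": {
        "@V": {
            "masterId1": 10,
            "masterId2": 30,
            "(450,45)": 30,  # value at a location that has no master
        }
    }
}


def parse_location_key(key):
    """Return axis coordinates for keys like "(450,45)", else None."""
    if key.startswith("(") and key.endswith(")"):
        return tuple(float(part) for part in key[1:-1].split(","))
    return None


def kerning_sources(kerning, first, second, master_locations):
    """Collect {location: value} for one pair.

    Master-bound values follow the master's current location, so moving a
    master does not require rewriting the kerning data.
    """
    sources = {}
    for key, value in kerning.get(first, {}).get(second, {}).items():
        location = parse_location_key(key)
        if location is None:
            location = master_locations[key]
        sources[location] = value
    return sources


master_locations = {"masterId1": (400.0, 100.0), "masterId2": (900.0, 100.0)}
sources = kerning_sources(kerning, "@A", "@V", master_locations)
```

The resulting mapping can then be handed to whatever interpolation model the compiler uses.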

alerque commented 3 years ago

> Whether you like masters or not, all the source formats in existence have them - and designers think in terms of them. @alerque's idea of getting rid of masters and just storing deltas just makes the interchange between source formats a whole lot harder.

True. My issues with existing source formats are different and my suggestions about ditching masters and using formulas stem from frustrations with source format paradigms that end up limiting both creativity & function.

My comments about version control issues still stand no matter. But if you are hoping for easier interchange and not needing new editor interfaces then we can keep this to just a discussion of structures for existing data.

simoncozens commented 3 years ago

I'm not currently interested in writing a new font editor. :-) I just checked what Runebender is doing, and they're currently UFO based - interesting, because if anyone was going to reinvent the world it was going to be @raphlinus and @eliheuer. So I think master-based storage is going to be the default for the near future.

madig commented 3 years ago

> because if anyone was going to reinvent the world it was going to be @raphlinus

Ping him, he's got some interesting ideas :)

simoncozens commented 3 years ago

I've got something working. Have a look at https://github.com/simoncozens/test-nfsf/commits/main/test.nfsf

florianpircher commented 3 years ago

Not a fan of YAML as a foundation. YAML is nice to look at, but not to edit, generate, or parse (many quirks, complex grammar, every YAML library seems to implement its own subset). JSON feels like a more universal, simpler format that is easy to read/write/synthesize in practically every programming language.
codepoint in glyphs.yml should be an array
how do metadata.yml and info.yml relate? or should only one be used in the end?

florianpircher commented 3 years ago

Paths and components are separate fields in a glyph (e.g. in the semicolon). This makes it impossible to interleave paths and components (either paths are on top of components or vice versa). Maybe that creates problems when the shapes are added/subtracted/masked/… when removing overlaps on export.

simoncozens commented 3 years ago

The problem with YAML/JSON/XML or indeed anything off-the-shelf is that we need to have control over how the information is laid out - line by line - if we are going to have any hope of making this VCS-friendly.

So the best we can say is that this is YAML/JSON/XML/whatever compatible from a reader's perspective, but we will need to specify precisely how we want it laid out, generated and parsed. And that's OK! This is our format, so we can specify that.

And if we are specifying that kind of thing, and saying "no, we don't want you to use your off-the-shelf library for writing this format", then it doesn't really matter whether you go for YAML or JSON. YAML is nice to look at? Fine, we'll go with that.
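
To illustrate what "specifying the layout" might mean in practice, here is a minimal sketch of a writer that emits exactly one glyph record per line with sorted keys, so that editing one glyph touches one line of the diff. It uses JSON syntax for brevity (the thread settles on YAML), and `dump_glyphs` and the record fields are hypothetical:

```python
import json


def dump_glyphs(glyphs):
    """Serialize glyph records as valid JSON with one record per line."""
    lines = ["["]
    for index, glyph in enumerate(glyphs):
        # Sorted keys and fixed separators make the output deterministic.
        record = json.dumps(glyph, sort_keys=True, separators=(",", ":"))
        comma = "," if index < len(glyphs) - 1 else ""
        lines.append(record + comma)
    lines.append("]")
    return "\n".join(lines) + "\n"


glyphs = [
    {"name": "A", "codepoints": [65]},
    {"name": "B", "codepoints": [66]},
]
text = dump_glyphs(glyphs)
```

Any standard JSON parser can still read the output, which is the "compatible from a reader's perspective" property. One known weakness of this particular layout is that appending a glyph also changes the previous line, because it gains a trailing comma.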

simoncozens commented 3 years ago

code point is an array. (See https://github.com/simoncozens/test-nfsf/blob/main/GlyphsFileFormatv3.nfsf/glyphs.yml )
Yeah, metadata shouldn't be there; that was an oversight. Everything is in info now.
Paths and component interleaving - hmmm, good point. I did wonder what the Glyphs3 rationale was for that. I will think about that some more.

schriftgestalt commented 3 years ago

How do you add metadata to nodes when the paths are stored in svg format? Each node needs a full userdata/lib storage.

simoncozens commented 3 years ago

Why?

florianpircher commented 3 years ago

At least in the UFO world the direction is clear that people want to attach custom metadata to more and more kinds of objects. See: https://github.com/unified-font-object/ufo-spec/issues/115

simoncozens commented 3 years ago

People want all kinds of things. If they want to stuff things into _formatspecific{org.unifiedfontobject}{nodes}[0]{lib} I'm not going to stop them. I'm just not going to help them.

simoncozens commented 3 years ago

(Basically I want to specify a neutral format that supports 95% of common use-cases, and has _formatspecific for people to store whatever other mad stuff they need to handle round-tripping.)

florianpircher commented 3 years ago

Having less support for custom metadata in a format designed to be a superset of other source files feels like a step backwards. Attaching metadata closely located to the main data is better for version control, easier to spot for humans (no scrolling down to see if there is metadata or not) and easier to read for software (a function can operate on a subtree of the file and does not need to query its parent whether or not there is metadata for some object).

simoncozens commented 3 years ago

I hear what you're saying, but I don't agree. I feel like you're trying to bring an edge case into the centre of the design. The edge case is possible, which is good enough; let's move on.

simoncozens / babelfont

Design a new source format (object representation and/or on-disk representation) #10