linebender / norad

Rust crate for working with Unified Font Object files
Apache License 2.0
44 stars 12 forks source link

Saving designspaces #309

Closed RickyDaMa closed 12 months ago

RickyDaMa commented 1 year ago

Adds DesignspaceDocument::save, as well as a lot of serde magic to make it happen

Main things I'm looking for feedback on is if I'm using quick_xml in a sane way (having never used it before, and it didn't come across at all intuitively), and if there are any Option or Vec fields I don't want to skip, even if they're None/empty

Also still working on getting the loading to work, hence the new test is failing. Oh and I need to fix conflicts with #307's changes Edit: all sorted

If this could be released once merged that'd be mighty useful for me (maybe as part of #308 👀)

RickyDaMa commented 1 year ago

Also of note is that there are some fields which norad doesn't deserialise and thus are lost in the serialisation process, e.g. <kerning> & <lib>

I'm not sure if the serialised field/attribute orders match what is typical/conventional. It's not particularly important for my use case

As a comparison, I serialised wght.designspace, which ends up looking like this:

<?xml version='1.0' encoding='UTF-8'?>
<designspace format="4.1">
  <axes>
    <axis name="Weight" tag="wght" default="400" hidden="false" minimum="400" maximum="700"/>
  </axes>
  <sources>
    <source familyname="Test Family" stylename="Regular" name="Test Family Regular" filename="TestFamily-Regular.ufo">
      <location>
        <dimension name="Weight" xvalue="400"/>
      </location>
    </source>
    <source familyname="Test Family" stylename="Bold" name="Test Family Bold" filename="TestFamily-Bold.ufo">
      <location>
        <dimension name="Weight" xvalue="700"/>
      </location>
    </source>
  </sources>
  <instances>
    <instance familyname="Test Family" stylename="Regular" name="Test Family Regular" filename="instance_ufos/TestFamily-Regular.ufo" stylemapfamilyname="Test Family" stylemapstylename="regular">
      <location>
        <dimension name="Weight" xvalue="400"/>
      </location>
    </instance>
    <instance familyname="Test Family" stylename="Bold" name="Test Family Bold" filename="instance_ufos/TestFamily-Bold.ufo" stylemapfamilyname="Test Family" stylemapstylename="bold">
      <location>
        <dimension name="Weight" xvalue="700"/>
      </location>
    </instance>
  </instances>
</designspace>
madig commented 1 year ago

Lib is actually important. You can add a new field, it should be the same type as a Font.lib.

Hoolean commented 1 year ago

Hello!

Is there a way for us to dodge writing automatically-generated source and instance names? This would avoid serialisation being non-deterministic based on when in the process the designspace was loaded (e.g. currently loading the same designspace twice in succession and serialising has different output).

I suppose this might not be possible without giving the unnamed_source_ prefix a special semantic meaning, or tweaking the approach of #304 and #307, although the latter would have benefits in other places too: e.g. fixing PartialEq for the same designspace loaded at different times in the process :)

@cmyr for #304, did we home the default name implementation upstream in norad instead of in the compiler to avoid the breaking API change from String to Option<String>? 🔎

RickyDaMa commented 1 year ago

@Hoolean for now I've pushed a skip_serializing_if option for the generated source & instance names. Personally I think it makes sense to leave the fields as String if the fields are meant to be mandatory

I'm working on (de)serializing lib keys in the designspace but it's being a bit of a pain. Shall try and get it working though

cmyr commented 1 year ago

Hello!

Is there a way for us to dodge writing automatically-generated source and instance names? This would avoid serialisation being non-deterministic based on when in the process the designspace was loaded (e.g. currently loading the same designspace twice in succession and serialising has different output).

What does the python impl do in this case?

I suppose this might not be possible without giving the unnamed_source_ prefix a special semantic meaning, or tweaking the approach of #304 and #307, although the latter would have benefits in other places too: e.g. fixing PartialEq for the same designspace loaded at different times in the process :)

I'm wondering if we even need to add these temp names at all? Maybe we can just let these things be unnamed? the issue in the compiler, as I understand it, is that we want something we can use to uniquely identify sources/instances, but as mentioned below this doesn't need to be norad's job.

@cmyr for #304, did we home the default name implementation upstream in norad instead of in the compiler to avoid the breaking API change from String to Option<String>? 🔎

tbh I don't think I really thought about it too much? I'm not super worried about breaking API changes, where they are necessary, and I do think norad should be as dumb as possible, so maybe we want to back out #304 & #307?

Hoolean commented 1 year ago

What does the python impl do in this case?

From what I can see designspaceLib adds the temp_master prefixed name on load, and then uses that same prefix to see which names should be dropped on save.

I'm wondering if we even need to add these temp names at all? Maybe we can just let these things be unnamed? the issue in the compiler, as I understand it, is that we want something we can use to uniquely identify sources/instances, but as mentioned below this doesn't need to be norad's job. ... I'm not super worried about breaking API changes, where they are necessary, and I do think norad should be as dumb as possible, so maybe we want to back out #304 & #307?

It definitely seems like the root cause is designspaceLib promising different things in the designspace schema (optional source name) and the API it exposes at runtime (guaranteed source name), although even then the latter is inconsistent as if you instantiate and add sources at runtime then they are not given automatically-generated names.

Overall it seems like this might be a coupling that arose between fontTools and the Python compiler stack, and so from a spiritual mindset of wanting us to carry as little 'magic' as possible from the Python world into the Rust world then I also wonder if we can drop it :grin:

madig commented 1 year ago

I'm in favor of making names an Option<String> and changing the spec to reflect real politics.

cmyr commented 1 year ago

Cool, I am always in favour of less code / less logic / being as dumb as possible. Someone want to PR these changes?

RickyDaMa commented 12 months ago

Proof of concept serialisation pushed! Needs code tidying & thorough testing, but I'm fairly sure it's correct!

Turns something like this (Rust debug representation):

{
    "enabled": Boolean(
        true,
    ),
    "disabled": Boolean(
        false,
    ),
    "name": String(
        "John Cena",
    ),
    "age": Integer(
        46,
    ),
    "today": Date(
        2023-07-31T15:59:41.8513477Z,
    ),
    "pi": Real(
        3.141592653589793,
    ),
    "wisdom": Data(
        [
            1,
            2,
            3,
        ],
    ),
    "listicle": Array(
        [
            Boolean(
                true,
            ),
            Boolean(
                false,
            ),
            String(
                "John Cena",
            ),
            Integer(
                46,
            ),
            Date(
                2023-07-31T15:59:41.8513477Z,
            ),
            Real(
                3.141592653589793,
            ),
            Data(
                [
                    1,
                    2,
                    3,
                ],
            ),
        ],
    ),
    "recurse": Dictionary(
        {
            "enabled": Boolean(
                true,
            ),
            "disabled": Boolean(
                false,
            ),
            "name": String(
                "John Cena",
            ),
            "age": Integer(
                46,
            ),
            "today": Date(
                2023-07-31T15:59:41.8513477Z,
            ),
            "pi": Real(
                3.141592653589793,
            ),
            "wisdom": Data(
                [
                    1,
                    2,
                    3,
                ],
            ),
            "listicle": Array(
                [
                    Boolean(
                        true,
                    ),
                    Boolean(
                        false,
                    ),
                    String(
                        "John Cena",
                    ),
                    Integer(
                        46,
                    ),
                    Date(
                        2023-07-31T15:59:41.8513477Z,
                    ),
                    Real(
                        3.141592653589793,
                    ),
                    Data(
                        [
                            1,
                            2,
                            3,
                        ],
                    ),
                ],
            ),
        },
    ),
}

into this:

<TestMe>
  <lib>
    <dict>
      <key>enabled</key>
      <true/>
      <key>disabled</key>
      <false/>
      <key>name</key>
      <string>John Cena</string>
      <key>age</key>
      <integer>46</integer>
      <key>today</key>
      <date>2023-07-31T15:59:41.8513477Z</date>
      <key>pi</key>
      <real>3.141592653589793</real>
      <key>wisdom</key>
      <data>AQID</data>
      <key>listicle</key>
      <array>
        <true/>
        <false/>
        <string>John Cena</string>
        <integer>46</integer>
        <date>2023-07-31T15:59:41.8513477Z</date>
        <real>3.141592653589793</real>
        <data>AQID</data>
      </array>
      <key>recurse</key>
      <dict>
        <key>enabled</key>
        <true/>
        <key>disabled</key>
        <false/>
        <key>name</key>
        <string>John Cena</string>
        <key>age</key>
        <integer>46</integer>
        <key>today</key>
        <date>2023-07-31T15:59:41.8513477Z</date>
        <key>pi</key>
        <real>3.141592653589793</real>
        <key>wisdom</key>
        <data>AQID</data>
        <key>listicle</key>
        <array>
          <true/>
          <false/>
          <string>John Cena</string>
          <integer>46</integer>
          <date>2023-07-31T15:59:41.8513477Z</date>
          <real>3.141592653589793</real>
          <data>AQID</data>
        </array>
      </dict>
    </dict>
  </lib>
</TestMe>
RickyDaMa commented 12 months ago

I'm thinking snapshot testing would probably be most appropriate for serialisation testing, along with round tripping. I'd need to pull in something new for snapshot testing, I was thinking perhaps expect-test or insta

Property testing or fuzzing would be ideal but is complex to set up on custom structures

Thoughts @cmyr @madig?

madig commented 12 months ago

I'm ok with insta et al. You could do some light fuzzing by roundtripping all Designspace files on your puter.

RickyDaMa commented 12 months ago

Quick & dirty round tripper on all my local designspaces worked absolutely fine, the before and after DesignSpaceDocument were equal

cmyr commented 12 months ago

I think snapshot testing infrastructure might be overkill? I would be happy with a single test case that covers all field types, and includes multi-item and nested arrays/dictionaries. If we can load this from file, and then write it to string and have them match, that's great. Once we have it working our serialization code should never really change, and it shouldn't be influenced by anything outside of our control (the cases where snapshots are particularly helpful)

RickyDaMa commented 12 months ago

I think snapshot testing infrastructure might be overkill? I would be happy with a single test case that covers all field types, and includes multi-item and nested arrays/dictionaries.

expect-test is fairly lightweight (which is the main reason why I chose it over insta) and pretty unintrusive. It can keep the snapshots in the source code as well, if you'd (I chose files for neatness) prefer that. a_bit_of_everything does have everything you mentioned in it, so if you would like I can remove expect-test altogether

RickyDaMa commented 12 months ago

ultimately the proof is in the roundtrip

From what I saw testdata/wght.designspace is the most complex DS in the repo, so that's what I chose for load_save_round_trip

If we wanted a heavier-weight approach, I could pull in a dependency to allow me to iterate over all the .designspace files in testdata/, and round trip each of them

Third option is trusting the one unit test and the anecdotal evidence of me saying I wrote a quick & dirty round tripper and found no issues with it on all my locally-cloned font projects (56 designspace files)

Note however that the round trip does not guarantee identical formatting in the loaded vs the saved file, which might be an annoyance for some users. However it won't bother me for my use case and I'm not about to write a formatter/normaliser on top of a serialiser 😂