googlefonts / ufo2ft

A bridge from UFOs to FontTools objects (and therefore, OTFs and TTFs).
MIT License
151 stars 43 forks source link

Populate modified in head based on created #737

Closed IvanUkhov closed 1 year ago

IvanUkhov commented 1 year ago

In CI/CD pipelines, it is useful to be able to reliably produce the exact same binaries at different points in time. This pull request takes a step in that direction by proposing to treat modified similarly to created. To elaborate, an imaginary openTypeHeadModified is introduced, and it defaults to a fixed value when a certain environment value is set.

anthrotype commented 1 year ago

let's step back a little. I don't like we make up a fontinfo field which does not exist openTypeHeadModified, or that we resort to a new environment variable UFO2FT_HEAD_MODIFIED. What are you trying to do that SOURCE_DATE_EPOCH does not already accomplish?

anthrotype commented 1 year ago

fonttools already uses the SOURCE_DATE_EPOCH to set the head.modified upon compiling (TTFont.save) the font, see:

https://github.com/fonttools/fonttools/blob/1c283756a5e39d69459eea80ed12792adc4922dd/Lib/fontTools/ttLib/tables/_h_e_a_d.py#L87-L88

https://github.com/fonttools/fonttools/blob/1c283756a5e39d69459eea80ed12792adc4922dd/Lib/fontTools/misc/timeTools.py#L72-L77

TTFont.recalcTimestamp is True by default in the constructor: https://github.com/fonttools/fonttools/blob/1c283756a5e39d69459eea80ed12792adc4922dd/Lib/fontTools/ttLib/ttFont.py#L103

IvanUkhov commented 1 year ago

The use case is as follows. We use fontmake for compiling sources into binaries. Occasionally, we recompile families. It can be due to a new version of fontmake or some other tool, and it can also be a way to perform a sanity check. When that happens, we always get new binaries, even though the corresponding source is untouched. We then have to have tools for reading critical values from binaries and comparing them. Furthermore, the size of the repository increases unnecessarily when we check in practically the same binary multiple times.

Thank you for the references. The difference here is that we do not open and resave binaries but recompiles them from sources. So what fontTools does is not applicable in this context. Please correct me if I am wrong.

I was planning to reused SOURCE_DATE_EPOCH for setting modified upon compilation, but then one would have to keep track of the value outside of the source for each family, which seems cumbersome. Setting it to the same value for all families also seems inappropriate. The idea was then to make it relative to created, which is already in the source.

If we do not want to introduce new environment variables, which is understandable, one can reuse SOURCE_DATE_EPOCH but slightly differently: whenever it is set, we set modified to created, regardless of whether created is using SOURCE_DATE_EPOCH. I also agree that openTypeHeadModified should be removed as a concept.

Do you perhaps have better ideas?

anthrotype commented 1 year ago

that's exactly the use case for SOURCE_DATE_EPOCH and simply setting that one should already accomplish what you ask. Have you actally tried?

we do not open and resave binaries but recompiles them from sources.

that's ok, the head.modified field will be updated by fonttools as soon as you do font.save on the font that ufo2ft returns. Is this not happening?

anthrotype commented 1 year ago

whenever it is set, we set modified to created, regardless of whether created is using SOURCE_DATE_EPOCH

that also sounds reasonable and we can make that change. But as I said, it is not strictly needed because fonttools already takes care of setting head.modified according to SOURCE_DATE_EPOCH when saving. It is only in the intermediate state after ufo2ft has completed and returned an instance of TTFont, and the state after that instance has been compiled (i.e. saved to disk or to a BytesIO stream) after which the modified timestamp is updated based on SOURCE_DATE_EPOCH because recalcTimestamp=True by default, that the head.modified != head.created.

IvanUkhov commented 1 year ago

I see. Sorry. Yes, fontTools is relevant, as ufo2ft invokes it at the end. Then we could indeed set SOURCE_DATE_EPOCH upon compilation to have modified populated with a fixed value. The complication is that, again, we need to keep track of what we set it to externally. For each family, we would have to have some metadata file from which we would read the value to use during compilation, which is onerous and error-prone. This value has to make sense together with created: modified should not be smaller. We could also have one single value set to the year of 3000 for all families, but that is undesirable, as you can tell. Or would you propose we read created from the source and supply it via SOURCE_DATE_EPOCH upon compilation?

that also sounds reasonable and we can make that change. But as I said, it is not strictly needed

I want to make sure we are on the same page. We both should be of the same opinion here.

anthrotype commented 1 year ago

we could indeed set SOURCE_DATE_EPOCH upon compilation to have modified populated with a fixed value

yes, that's what one does when they wish to have reproducible builds. Are you saying you do not want to even set SOURCE_DATE_EPOCH at all, but still have modified be qual to whatever created is?

anthrotype commented 1 year ago

maybe what we could do is to not rely on fonttools updating the head.modified field upon compilation, but instead set that ourselves directly inside ufo2ft to the desired final value. To do that we need to initialize our TTFont instance with recalcTimestamp=False to avoid fonttools touching it upon saving the font. The outlineCompiler creates the new TTFont here: https://github.com/googlefonts/ufo2ft/blob/0c0a570b84d1351ab704ba1fa5ae03aeef51179f/Lib/ufo2ft/outlineCompiler.py#L138

Then we would set head.modified to either the current time by default, when SOURCE_DATE_EPOCH is not present, or to the provided SOURCE_DATE_EPOCH when present, in a way similar to how we already handle for head.created. However, there is a slight complication in that head.created can be overridden via openTypeHeadCrated (which .glyphs stores in "date" attribute and glyphsLib converts to the UFO equivalent), whereas head.modified cannot. So if openTypeHeadCrated is set (but SOURCE_DATE_EPOCH is not), then head.created would be set to whatever that openTypeHeadCrated is, whereas head.modified would be set to the current (non reproducible) time...

anthrotype commented 1 year ago

or we could even add a new compilation parameter called recalcTimestamp=True to all ufo2ft's compile methods, which would control how the new TTFont is constructed; if it's True (default), fonttools will recalculate based on the current time (or SOURCE_DATE_EPOCH if specified), thus we keep backward compatibility; if it is set to False, then fonttools will not touch the head.modified upon saving, and ufo2ft can set head.modified == head.created whatever that was (be it font.info.openTypeHeadCreated or the current time or SOURCE_DATE_EPOCH).

IvanUkhov commented 1 year ago

Are you saying you do not want to even set SOURCE_DATE_EPOCH at all, but still have modified be qual to whatever created is?

I did plan to set it during compilation. In my mind, without introducing any new variables, I would set SOURCE_DATE_EPOCH to any value, which would be ignored during compilation for created, as we have openTypeHeadCreated set in the source, but would lead to modified to be set to created. Hence, we would have original creation timestamps and never-changing modification timestamps, provided SOURCE_DATE_EPOCH is present in the environment upon compilation.

anthrotype commented 1 year ago

SOURCE_DATE_EPOCH has a specific, well documented meaning in open source software, see https://reproducible-builds.org/docs/source-date-epoch/ and that's what ufo2ft implements already, it does parse that when it is set in the environment and use it when setting head.created (and indirectly, via fontTools recalcTimestamp, to head.modified as well). We can't simply abuse it and assign to it additional meanings. I especially don't like the idea of just checking for its presence regardless of its value. If you are going to set SOURCE_DATE_EPOCH in your build, then I don't think we have an issue here. It will be used for compilation for both head.created/modified as the files gets saved to disk.

anthrotype commented 1 year ago

I think an openTypeHeadModified would make sense to have, so I filed: https://github.com/unified-font-object/ufo-spec/issues/214

IvanUkhov commented 1 year ago

maybe what we could do is to not rely on fonttools updating the head.modified field upon compilation, but instead set that ourselves directly inside ufo2ft to the desired final value.

I now understand that my changes here are not sufficient. As you kept pointing out, it will be replaced in fontTools, as recalcTimestamp is true by default. One would have to find a solution, and what you proposed sounds reasonable. We can initialize TTFont with recalcTimestamp=False and set it to what we want in ufo2ft.

However, there is a slight complication in that head.created can be overridden via openTypeHeadCrated

Not sure I see this complication. Where we set head.created and head.modified, we take into consideration what we have in openTypeHeadCreated. There is no parameter for modified, but it is not needed.

head.created would be set to whatever that openTypeHeadCrated is

Exactly. This is the desired behavior.

whereas head.modified would be set to the current (non reproducible) time...

But this is what we discuss here.

SOURCE_DATE_EPOCH has a specific, well documented meaning in open source software

I understand. The specification says one “MUST use this variable for embedded timestamps in place of the ‘current’ date and time.” If we set modified to created, it will be deviating from this. I am tempted to propose to interpret SOURCE_DATE_EPOCH as merely a signal that reproducible results are required without prescribing how exactly, but that would, of course, not comply with the specification.

If you are going to set SOURCE_DATE_EPOCH in your build, then I don't think we have an issue here. It will be used for compilation for both head.created/modified as the files gets saved to disk.

For clarify, what value do you recommend setting it to? Do you mean that, for each family, we would read the creation timestamp from the source, convert it to a UNIX timestamp, and put it in SOURCE_DATE_EPOCH? I am not saying it would not work; I am more saying it would be an uncomfortable, unnatural manipulation. But please confirm so I understand what solution you have in mind here for a growing collection of typefaces. Perhaps our use case is not mainstream, and we should bite the bullet and do it that uncomfortable way.

I think an openTypeHeadModified would make sense to have

Thank you. It would be good to have. It would take time, I suppose.

anthrotype commented 1 year ago

Do you mean that, for each family, we would read the creation timestamp from the source, convert it to a UNIX timestamp, and put it in SOURCE_DATE_EPOCH?

yeah basically, but I can see how that could be annoying. SOURCE_DATE_EPOCH works best when there's no openTypeHeadCreated at all in the source, and one wants to freeze the built output to something in time that's external to the source themselves.

Where we set head.created and head.modified, we take into consideration what we have in openTypeHeadCreated. There is no parameter for modified, but it is not needed.

So are you suggesting we understand the value in openTypeHeadCreated to also set head.modified in addition to head.created? The UFO spec only describes it as "creation date", it's clearly distinct from modified, so I am not sure this would be advisable.

One thing I don't want to change is the current default behaviour of (non reproducibly) set head.modified to the current time; it's been like that since beginning and there's no good reason to change.

IvanUkhov commented 1 year ago

OK, let me maybe try to implement the annoying solution internally and see how it goes.

Thank you very much for the input!

khaledhosny commented 1 year ago

FWIW, I set SOURCE_DATE_EPOCH to the source file modification date in Makefiles (something like export SOURCE_DATE_EPOCH = $(stat -c "%Y" $<)), and it works for local builds, but git does not keep file modification time, so it does not help with CI builds or fresh checkouts in general.

IvanUkhov commented 1 year ago

We ended up taking the timestamp from the last commit touching the source. The whole solution is actually not that much trouble.