playframework / twirl

Twirl is Play's default template engine
Apache License 2.0
545 stars 108 forks source link

TwirlCompiler: Make output reproducible-friendly #209

Closed BardurArantsson closed 3 years ago

BardurArantsson commented 5 years ago

Hi,

Currently the TwirlCompiler emits a comment like this:

                  -- GENERATED --
                  DATE: Wed Mar 06 05:39:00 CET 2019
                  SOURCE: /home/bardur/wd/cwconsult/silo/modules/postgresql/src/main/twirl/dk/cwconsult/silo/postgresql/core/impl/LockStreamSQL.scala.txt
                  HASH: aca20403ba81a6a79ef8db751c2a0f13b68222eb
                  MATRIX: 651->1|764->22|791->23|829->35|864->50|892->51
                  LINES: 15->1|20->2|21->3|21->3|21->3|21->3
                  -- GENERATED --

Unfortunately, this contains two pieces of information that are bad for reproducible builds:

(It's generally also bad for build systems where you ideally want the output of a build step to be stable if its inputs are -- to avoid e.g. spurious rebuilds.)

It would be much appreciated if they could either just be outright removed[0] or if we could have an option to disable embedding these 'fields'. (I'm not sure if the ordering on e.g. MATRIX and LINES output is deterministic, but it should be made so as well.)

If either of these seems acceptable, I think I should be able to code it up fairly quickly.

[0] I'm not really familiar with the Play ecosystem, but one assumes that this may be for template-debugging purposes where a dev server can show the file?

BardurArantsson commented 5 years ago

Anyone?

marcospereira commented 5 years ago

Hey @BardurArantsson, it is unclear to me what is the problem here. Are the builds relying on this information? If so, how?

BardurArantsson commented 5 years ago

The TL;DR is that the output changes needlessly on recompilation of the same input -- which makes interacting with build systems like Pants, Bazel, etc. less than ideal because they'll assume that because the twirl output file has changed (its hash has changed), they need to recompile the output (and whatever depends on that output).

Does that make sense?

stephane303 commented 4 years ago

Any progress on this ? We now have 452 view files in our play project. Everytime we change ANY of the view scala file, ALL generated scala file get recompiled, this is critical to our productivity. We are using gradle to build our project.

octonato commented 4 years ago

@stephane303, this is not a planned feature/change. Of course, if someone wants to send a PR it's mostly welcome.

stephane303 commented 4 years ago

Unfortunalely I do not have the knowledge to setup my environment to fork this repositery and make my environnement point to it, maybe you can help me with that ? Thank you.

octonato commented 4 years ago

@stephane303, the first think you need to do it to fork in GitHub.

You can start by following the instructions on this page: https://help.github.com/en/github/getting-started-with-github/fork-a-repo

octonato commented 4 years ago

Then you can work on the project and push changest to your fork.

Once you are done, you can make a pull-request from your fork to the main repository. https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request

ignasi35 commented 4 years ago

I'm not entirely sure about the actual trigger, but is it possible that producing the exact same output (that, removing DATE and fixing SOURCE) still triggers a new build?

I mean, if the code polling the generated files checks the file-changed timestamp, but not the content, then the compilation will be triggered anyway.

ignasi35 commented 4 years ago

I'm not entirely sure about the actual trigger, but is it possible that producing the exact same output (that, removing DATE and fixing SOURCE) still triggers a new build?

I mean, if the code polling the generated files checks the file-changed timestamp, but not the content, then the compilation will be triggered anyway.

Found https://github.com/playframework/twirl/blob/dcddfa40e4d5c40b6a514a568487f8a1991ac451/compiler/src/main/scala/play/twirl/compiler/TwirlCompiler.scala#L118-L122

Not sure if all execution paths have that check, though.

stephane303 commented 4 years ago

Then you can work on the project and push changest to your fork.

Once you are done, you can make a pull-request from your fork to the main repository. https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request

Hi, thanks, I know how to do fork and a pull request. The part that I do not know is how to use the playframework that i've forked. I need it to do my tests.

octonato commented 4 years ago

You can build and publish twirl locally using sbt publishLocal.

Then you can take a Play application and upgrade the twirl version. You don't need to add build Play.

stephane303 commented 4 years ago

Someone found a workaround, if you use gradle, which is my case. https://github.com/gradle/playframework/issues/129#issuecomment-594505441 and it works, now if I change one file, I recompile only one file !

So it is clear that these informations (SOURCE and DATE) should be removed.

ignasi35 commented 4 years ago

I think making this work for sbt requires some bigger changes in the twirl codebase.

The way the gradle workaround works is by rewriting the genrated code to remove SOURCE and DATE. This means that:

  1. given a file with twirl code,
  2. a new generated file with scala code is produced everytime, and that file contains SOURCE and DATE metadata
  3. the generated file is then replaced in place to remove SOURCE and DATE metadata
  4. gradle sees the final the generated file as identical as the previous one.

In sbt, I suspect that step 4. would not behave the same since the generated scala file, while identical, has a newer timestamp. In the sbt-twirl plugin in particular, there's a syncGenerated step that relies on SOURCE metadata being present on the generated scala file. When that SOURCE metadata is missing, the existing scala file is removed and recreated again.

The goal, then, is to change the twirl codebase so that step 2. doesn't generate the scala code unless it is strictly necessary (instead of generating the scala code everytime). This requires reviewing the sync step and the needRecompilation.

BardurArantsson commented 4 years ago

@ignasi35 I guess I don't see how this particular change is relevant to SBT? That is, it can be a simple option to the code generator and the SBT plugin can say "I want SOURCE/DATE" and other build systems can say "I doan wannit!"?

BardurArantsson commented 4 years ago

AFAICT the only thing that's needed here is to make the "print the SOURCE and DATE" conditional on a flag passed in to the codegen. Am I missing something?

ignasi35 commented 4 years ago

Adding that flag could have this impact (I'm not 100% sure, though):

The final result, then, is that all templates are twirl-compiled and scala-compiled all over again, which was the problem we were trying to solve.

I think the gradle approach works because the gradle plugin doesn't ies the sync operation I mentioned earlier or uses something similar that doesn't depend on SOURCE being available.

I agree that adding the flag would simplify things for Gradle users, though. But adding the flag also breaks API.

Summing up, it'd be great if we could solve the problem at once, for all usages of TwirlCompiler#compile without breaking API.

BardurArantsson commented 4 years ago

@ignasi35 Ah, ok. That makes sense -- the assumption being that you want to continue using SBT. However, my original proposal was actually just to add an option to TwirlCompiler to omit the bits of the output the are troublesome for non-SBT (or: hardcore "reproducible") build systems. I don't see how it "not fixing SBT builds" is an argument against that. Fixing excessive rebuilds under SBT is a non-goal of my proposal, though it is definitely an interesting problem to pursue and solve.

(The SBT plugin invokes the TwirlCompiler in its own way and it would not be affected either way by this proposal. I'm sure improvements are necessary and desirable for SBT usage, but... if my proposal were to be accepted, nothing would change for users of SBT.)

ignasi35 commented 4 years ago

Thanks for clarifying @BardurArantsson. I see your point, then the change is straight forward. Would you like to send a PR?

BardurArantsson commented 4 years ago

Unfortunately, I'm already pretty swamped as is, so it's unlikely that I'll get to that any time soon. (Hopefully I will be able to pick it up at some point, but right now this is not a big priority at work, so...)

raboof commented 3 years ago

I think this should be fixed with #378 - if anyone can try a snapshot to verify that'd be great! I'll close this for now.

bjaglin commented 3 years ago

Is there a plan for a patch release containing https://github.com/playframework/twirl/pull/378? Thanks!

SethTisue commented 3 years ago

note that we currently cannot release in this repo, because of the Bintray shutdown — the publishing needs to be changed to go to Sonatype directly

bjaglin commented 3 years ago

For the record, this was released as part of 1.5.1.