google / built_value.dart

Immutable value types, enum classes, and serialization.
https://pub.dev/packages/built_value
BSD 3-Clause "New" or "Revised" License
861 stars 184 forks source link

optimize building generated files #1304

Open dave-doty opened 6 months ago

dave-doty commented 6 months ago

Sorry, this may not be appropriate as a GitHub issue, but I just wanted to ask for advice/clarification on how built_value generates the .g.dart files.

My project has gotten to the point of using hundreds of built_value instances. It now takes about a minute to compile, sometimes two, which slows down development when testing several incremental changes consecutively.

In this comment with the OverReact authors: https://github.com/Workiva/over_react/issues/434#issuecomment-662475785, they suggest that built_value takes a while to compile on large projects because of its use of this package: https://pub.dev/packages/source_gen

I don't really understand the issues. I know OverReact itself generates code in .g.dart files, and I have about as many OverReact classes, of similar complexity, as built_value classes. Of course I don't know for sure that they are not also slowing things down, but according to the authors, their code generation is much faster.

So I'm wondering

  1. Whether you agree that source code generation is slower with source_gen than other methods.
  2. If you think there is some way (whether by switching away from source_gen, or something else) to speed up compilation.

Thank you.

davidmorgan commented 6 months ago

Thanks for filing! I'm definitely interested in performance issues.

The first thing I would try, if it happens to be easy, is splitting up your code into more packages. If you can arrange that the built_value types depend on less code then that should help.

But of course that may not be possible.

It would be interesting if I can see a repo of your setup, I don't suppose your code is on GitHub? :) if not then maybe you could describe the structure and I could go from that, e.g. how many files+class built_value and non-built_value in how many packages, how you're running the codegen. Then maybe I can make something comparable to experiment with.

Thanks!

dave-doty commented 6 months ago

Thanks for filing! I'm definitely interested in performance issues.

The first thing I would try, if it happens to be easy, is splitting up your code into more packages. If you can arrange that the built_value types depend on less code then that should help.

But of course that may not be possible.

Thank you! I'm not sure exactly what you mean about splitting up the code into more packages. It's an single-page application.

It would be interesting if I can see a repo of your setup, I don't suppose your code is on GitHub? :) if not then maybe you could describe the structure and I could go from that, e.g. how many files+class built_value and non-built_value in how many packages, how you're running the codegen. Then maybe I can make something comparable to experiment with.

It's here: https://github.com/UC-Davis-molecular-computing/scadnano. Most of the built_value types are in lib/src/state and lib/src/actions/actions.dart.

The overall architecture is using React and Redux to implement the Model-View-Update architecture. Redux requires an immutable model, hence the need for built_value. It's also the case that built_value's automatic JSON serialization is helpful, because Redux devtools uses that to display the model (a.k.a. state) and actions that changed it (which is why I also make the actions immutable; I don't think Redux needs them immutable, but Redux devtools needs them JSON serializable).

Unfortunately it's a bit annoying to set up for development if you want to try to see for yourself how it compiles. There are instructions in CONTRIBUTING.md. But we're in a bit of a dependency hell, where we have to use Dart 2.13 or earlier.

davidmorgan commented 6 months ago

Thanks for the code! I tried profiling as built_value generates for your project to understand what's going on.

I think there's a possibility for improving performance 30-40% by caching parsing when it reads the AST, but it's not trivial to land, I'd have to look into it and talk to the analyzer team about the right way to do it. (In case you hadn't noticed! I'm actually on the Dart team and currently working on macros, so I am working in this area.)

The code structure is definitely worth thinking about.

The way built_value analyzes the code, it has to analyze all the transitive imports of a file in order to generate. So wherever you have a built_value class, see what is imported in that file: the list matters, a bigger list of imports (and imports from those imports!) does two things: it makes the generation slower and it means it will trigger more often.

I think there are some patterns in your app that make it very common to import the whole app: for example an app global variable that probably depends on everything and is used by everything. This causes generation to have to analyze the whole app and to have to rerun when anything changes across the whole app.

The reason I mention multiple packages is that splitting to multiple packages can help to improve dependencies. You can split to multiple packages by creating more folders with pubspec.yaml files and using path dependencies between them. This works perfectly well for a single app, it's how I structure my own projects if they are not tiny. Then you can say "dependencies are allowed from package foo onto package bar but not the other way around" and this helps you keep the dependencies low for "bar".

But you could also do this within a single package. Using folders or naming conventions could help instead, for example you could say "dependencies are allowed from subfolder ui onto model but not the other way around".

You can break dependency cycles by introducing interfaces. For example if A in its implementation depends on B depends on C depends on A, you can split A into A and AImpl, and have A depend on nothing, breaking the cycle.

One practical note, the fact that you're not on Dart 3 yet means I probably won't be able to land anything to help you. You might consider upgrading? I understand that it's quite a lot of work to migrate to null safety, of course, and you mentioned dependency problems :(

But restructuring should give you better performance whatever the version of Dart :)

I hope that helps! Thanks again for filing.

jakemac53 commented 6 months ago

The reason I mention multiple packages is that splitting to multiple packages can help to improve dependencies.

Note that nowadays, the resolver is also optimized when you split across multiple packages, it won't actually read all transitive imports, we compute and output a digest of the transitive imports of a given file next to it, and if available will only read that instead. This can be an extremely dramatic improvement. But, it only works for package dependencies.

dave-doty commented 6 months ago

Thank you for the suggestions, hopefully some of these will work for us.

OverReact only updated to Dart 3 with null safety a few days ago, we've been waiting for that to try to tackle upgrading ourselves, but as you said, it's got to be a big and tedious project.

davidmorgan commented 6 months ago

As it happens I have a lot of experience with null safety migrations :) ... the migration tool is very helpful, but large packages like yours can be tough.

Splitting into smaller pieces that can be migrated separately can help with that, too :)

One positive note though, use of built_value likely makes your code easier to migrate--the migration tool understands the nullable annotation and will automatically migrate those classes correctly, giving a solid base for other decisions it makes :) ... it does not understand built_collection, so you can help it by removing ? anywhere it suggests one, e.g. BuiltMap<String, int?> is wrong, as pre-migration BuiltMap would reject any null values at runtime.