jpeach opened 3 years ago
Seems like there's a test failure -- can you take a look?
This is also somehow causing OOMs for my LLVM build (again using remote preprocessing); I haven't figured out why yet.
> Seems like there's a test failure -- can you take a look?
Yeah, I broke some internal contract ... will take a look next weekend :)
> This is also somehow causing OOMs for my LLVM build (again using remote preprocessing); I haven't figured out why yet.
That's weird! For me with (partial) local preprocessing this makes memory usage nice and stable.
Hm, I think I understand the OOM now. With remote preprocessing we see the same header files many, many times; with this change, we compress each one before we hash it and check the upload cache, so we generate far more garbage than we did previously, and I think we end up in a similar situation where the GC fails to keep up.
I'm also not sure if zstd is deterministic – is there a risk that we end up uploading multiple versions of the same file if they get compressed differently?
Looking a bit more, it seems zstd has a massive per-encoder memory footprint -- at least a few MiB. With remote preprocessing, early builds upload hundreds of header files in one go, which means we try to create a new encoder for each of them concurrently. It might make sense to rate-limit compression to one job per core anyway, which would help with that…
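Roughly what I'm imagining, as a sketch (assuming the klauspost/compress zstd bindings; the names here are made up):

```go
package upload

import (
	"runtime"

	"github.com/klauspost/compress/zstd"
)

// compressSem is a counting semaphore bounding concurrent compression
// jobs to one per core, so at most NumCPU encoders are alive at once.
var compressSem = make(chan struct{}, runtime.NumCPU())

// compress zstd-compresses data, blocking until a compression slot is free.
func compress(data []byte) ([]byte, error) {
	compressSem <- struct{}{}        // acquire a slot
	defer func() { <-compressSem }() // release it

	enc, err := zstd.NewWriter(nil)
	if err != nil {
		return nil, err
	}
	defer enc.Close()
	// EncodeAll compresses src in one call, appending to dst (nil here).
	return enc.EncodeAll(data, nil), nil
}
```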
> Looking a bit more, it seems zstd has a massive per-encoder memory footprint -- at least a few MiB. With remote preprocessing, early builds upload hundreds of header files in one go, which means we try to create a new encoder for each of them concurrently. It might make sense to rate-limit compression to one job per core anyway, which would help with that…
Maybe using a sync.Pool would help with that, but based on your comments about the remote compile use case, I probably want to revisit this PR. It's starting to feel like it causes more problems than it solves :)
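Something like this is what I have in mind for the pool (again assuming the klauspost/compress bindings; names are hypothetical):

```go
package upload

import (
	"sync"

	"github.com/klauspost/compress/zstd"
)

// encoderPool reuses zstd encoders across files so we don't pay the
// multi-MiB encoder setup (and the resulting GC garbage) per upload.
var encoderPool = sync.Pool{
	New: func() interface{} {
		enc, err := zstd.NewWriter(nil)
		if err != nil {
			panic(err) // NewWriter only fails on bad options
		}
		return enc
	},
}

func compressPooled(data []byte) []byte {
	enc := encoderPool.Get().(*zstd.Encoder)
	defer encoderPool.Put(enc)
	return enc.EncodeAll(data, nil)
}
```

(If the bindings allow concurrent EncodeAll calls on one encoder, a single shared encoder might be simpler still.)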
I'm wondering whether just using mmap (for larger files) might be a better way to deal with memory costs of reading the inputs.
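As a rough sketch of the mmap idea (using golang.org/x/exp/mmap and sha256 here as illustrative assumptions, not what llama actually does):

```go
package upload

import (
	"crypto/sha256"
	"io"

	"golang.org/x/exp/mmap"
)

// hashFile hashes a file via a read-only memory mapping, so large
// inputs never need to sit on the Go heap; the kernel pages them in
// as the hash consumes them.
func hashFile(path string) ([sha256.Size]byte, error) {
	var sum [sha256.Size]byte
	r, err := mmap.Open(path)
	if err != nil {
		return sum, err
	}
	defer r.Close()

	h := sha256.New()
	// Adapt the io.ReaderAt mapping to an io.Reader for hashing.
	if _, err := io.Copy(h, io.NewSectionReader(r, 0, int64(r.Len()))); err != nil {
		return sum, err
	}
	copy(sum[:], h.Sum(nil))
	return sum, nil
}
```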
@nelhage Are you able to share your setup for compiling LLVM using llama? I have a few ideas regarding perf improvements that I'd like to try out, and it would be good to have a baseline similar to what you currently see.
Sure! The blog post should have most of it (https://blog.nelhage.com/post/building-llvm-in-90s/); I'm building on an AMD Ryzen 9 3900X (12-core / 24-thread), on Sonic fiber internet (but over wifi on my desktop). The client is Ubuntu 20.04 Focal. What else would be helpful for you?
Rather than reading large files into memory and processing them multiple times (once for hashing, once for compression), use streaming compression so that only the compressed output needs to be fully in memory.
This has the side effect that the object ID is now generated by hashing the compressed text (which is better, since there's less to hash after compression). This means that the function image needs to be regenerated to match.
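A minimal sketch of the shape of this change (the sha256 ID scheme and names are illustrative assumptions, and klauspost/compress is assumed for zstd):

```go
package upload

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

// compressAndHash streams a file through a zstd encoder, keeping only
// the compressed output fully in memory, and derives the object ID
// from the compressed bytes (there is less data to hash after
// compression).
func compressAndHash(path string) (id string, compressed []byte, err error) {
	f, err := os.Open(path)
	if err != nil {
		return "", nil, err
	}
	defer f.Close()

	var buf bytes.Buffer
	enc, err := zstd.NewWriter(&buf)
	if err != nil {
		return "", nil, err
	}
	// Stream the file through the encoder; only buf holds the
	// (compressed) contents in full.
	if _, err := io.Copy(enc, f); err != nil {
		enc.Close()
		return "", nil, err
	}
	if err := enc.Close(); err != nil { // flush the final frame
		return "", nil, err
	}

	sum := sha256.Sum256(buf.Bytes())
	return hex.EncodeToString(sum[:]), buf.Bytes(), nil
}
```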
This updates #45.