At the cost sweet spot it might make sense to launch an EC2 Spot/Batch/Fargate/CodeBuild worker and then smurf more costly Lambda workers onto the highly parallel sections. For this to work you need per-artifact execution times on each architecture, localhost/network latency, and localhost/network bandwidth.
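Back-of-the-envelope, the trade looks something like the toy model below; the prices and task counts are placeholders, not real AWS numbers, but it shows which inputs matter.

```python
# Toy break-even model for "EC2 baseline + Lambda burst on parallel sections".
# All prices and task numbers below are placeholders, not quotes from AWS.
LAMBDA_GB_SECOND = 0.0000167   # $/GB-s, placeholder
EC2_SPOT_PER_HOUR = 0.05       # $/hr for the baseline box, placeholder

def lambda_cost(tasks, secs_per_task, mem_gb):
    return tasks * secs_per_task * mem_gb * LAMBDA_GB_SECOND

def ec2_cost(tasks, secs_per_task, cores):
    wall = tasks * secs_per_task / cores          # serialize onto local cores
    return wall / 3600 * EC2_SPOT_PER_HOUR, wall

if __name__ == "__main__":
    tasks, secs, mem = 2000, 2.0, 1.5             # one "highly parallel section"
    ec2_dollars, ec2_wall = ec2_cost(tasks, secs, cores=8)
    print(f"EC2 only : ${ec2_dollars:.3f}, ~{ec2_wall:.0f}s wall clock")
    print(f"Lambda   : ${lambda_cost(tasks, secs, mem):.3f}, ~{secs:.0f}s wall clock")
```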
Mind dump of questions:
How do you get latency/bandwidth profiles for localhost and the network?
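One cheap way is to measure from the worker itself. A sketch, assuming a scratch bucket you control (bucket and object names are placeholders): a tiny GET is latency-dominated, a big GET is bandwidth-dominated.

```python
# Rough sketch: estimate S3 latency vs bandwidth by timing a tiny GET and a
# large GET. Bucket/key names are placeholders for objects you uploaded.
import time
import boto3

s3 = boto3.client("s3")

def timed_get(bucket, key):
    start = time.perf_counter()
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return time.perf_counter() - start, len(body)

lat_s, _ = timed_get("my-build-cache", "probe/1-byte-object")
bw_s, n = timed_get("my-build-cache", "probe/64MiB-object")
print(f"~latency: {lat_s * 1000:.1f} ms, ~bandwidth: {n / bw_s / 2**20:.1f} MiB/s")
```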
Can the EC2 instances use SQLite, /dev/zram, or /dev/shm to get faster than localhost SSD IO? Does it make sense to blast all the code into a RAM filesystem before compiling on EC2 instances? Do the EC2 boxen even need attached storage?
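The /dev/shm half of that is easy to answer empirically. A sketch, assuming Linux and a couple hundred MiB of scratch space (paths are placeholders):

```python
# Sketch: compare bulk write throughput on tmpfs (/dev/shm) vs the instance's
# attached storage. Paths and sizes are placeholders; run on Linux.
import os, time

def write_throughput(path, total=256 * 2**20, chunk=4 * 2**20):
    buf = os.urandom(chunk)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total // chunk):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    os.unlink(path)
    return total / elapsed / 2**20

print("tmpfs:", round(write_throughput("/dev/shm/probe.bin")), "MiB/s")
print("disk :", round(write_throughput("/mnt/scratch/probe.bin")), "MiB/s")  # the EBS/NVMe mount
```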
Can you hack Clang/LLVM to memdump files for later use so they don't have to be serialized/deserialized? Could you also hack Clang/LLVM to work in a batch mode like SMT solvers, with pushes and pops, to avoid paying process startup overhead on every invocation?
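The overhead a push/pop batch mode would amortize is measurable today without touching Clang. A sketch, assuming clang is on PATH:

```python
# Sketch: measure the fixed cost clang pays on every invocation by compiling
# an empty C file N times. Assumes `clang` is on PATH (Linux).
import os, subprocess, tempfile, time

with tempfile.NamedTemporaryFile(suffix=".c", delete=False) as f:
    f.write(b"int main(void){return 0;}\n")
    src = f.name

runs = 20
start = time.perf_counter()
for _ in range(runs):
    subprocess.run(["clang", "-c", src, "-o", os.devnull], check=True)
total = time.perf_counter() - start
os.unlink(src)
print(f"~{total / runs * 1000:.1f} ms per clang invocation on this box")
```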
What is the boot cost of EC2 Spot/Batch/Fargate/CodeBuild? Seconds wasted plus the IO overhead of transferring the AMI/container.
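The launch-to-running part is measurable with boto3 waiters. A sketch for the Spot case; the AMI id and instance type are placeholders, and SSH/agent readiness and image pull time come on top of this:

```python
# Sketch: time RunInstances -> "running" for a spot launch.
# AMI id, instance type, and region are placeholders.
import time
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
start = time.perf_counter()
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx", InstanceType="c5.large", MinCount=1, MaxCount=1,
    InstanceMarketOptions={"MarketType": "spot"},
)
iid = resp["Instances"][0]["InstanceId"]
ec2.get_waiter("instance_running").wait(InstanceIds=[iid])
print(f"launch -> running: {time.perf_counter() - start:.1f}s")
ec2.terminate_instances(InstanceIds=[iid])
```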
When is there an S3/latency cost win from using zstd to dictionary-compress a file at rest on S3? Could it help to train several dictionaries, or is one dictionary for the whole codebase best? https://engineering.fb.com/2018/12/19/core-data/zstandard/
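The dictionary question can be answered offline with the python zstandard bindings. A sketch, assuming a source tree at ./src (path, dictionary size, and sample count are placeholders); rerun it with per-directory dictionaries to compare against one global one:

```python
# Sketch: train a zstd dictionary on a sample of source files and compare
# per-file compressed size with and without it. Paths/sizes are placeholders.
import pathlib
import zstandard

samples = [p.read_bytes() for p in pathlib.Path("src").rglob("*.c")][:200]
dictionary = zstandard.train_dictionary(16 * 1024, samples)  # 16 KiB dict

plain = zstandard.ZstdCompressor(level=19)
with_dict = zstandard.ZstdCompressor(level=19, dict_data=dictionary)

raw = sum(len(s) for s in samples)
print("no dict  :", sum(len(plain.compress(s)) for s in samples), "bytes of", raw)
print("with dict:", sum(len(with_dict.compress(s)) for s in samples), "bytes of", raw)
```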
For container-based EC2 runs, how much do you save by stripping the AMI/container image down to a minimal size? Are AMIs or containers more cost-effective?
How much do you save doing PGO/BOLT/LTO on the llamacc binaries? Can you lazy-load from S3 with HTTP range queries so the instance only fetches the clang/llvm/linker portions it needs for a task?
Does it pay to be evil and use the AMI/container/lambda_layer to store the code being compiled so it no-ops the S3 read? Even as evil as storing binary artifacts from the previous run and no-op'ing the parts of the task graph that are untainted by the change?
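The non-evil core of this is plain content addressing, the same idea as ccache or a remote action cache: hash the inputs plus the command, and if that hash already has an artifact, skip both the work and the S3 read. A minimal sketch, with a local directory standing in for storage baked into the AMI/layer:

```python
# Sketch of content-addressed task skipping: if every listed input and the
# command line are unchanged, reuse the previous artifact instead of rebuilding.
# A real build would also hash transitive headers/deps from the task graph.
import hashlib, pathlib, shutil, subprocess

CACHE = pathlib.Path("/var/tmp/build-cache")  # placeholder for baked-in storage

def task_key(cmd, inputs):
    h = hashlib.sha256(" ".join(cmd).encode())
    for p in sorted(inputs):
        h.update(pathlib.Path(p).read_bytes())
    return h.hexdigest()

def run_cached(cmd, inputs, output):
    CACHE.mkdir(parents=True, exist_ok=True)
    hit = CACHE / task_key(cmd, inputs)
    if hit.exists():
        shutil.copyfile(hit, output)      # no-op: reuse the previous artifact
        return
    subprocess.run(cmd, check=True)
    shutil.copyfile(output, hit)

# e.g. run_cached(["clang", "-c", "foo.c", "-o", "foo.o"], ["foo.c"], "foo.o")
```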
For spot pricing, is there a good tool to get costs across AZs/Regions?
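boto3 can at least produce the raw table. A sketch; the instance type and region list are placeholders:

```python
# Sketch: dump the most recent spot price per AZ for one instance type across
# a few regions. Instance type and region list are placeholders.
import boto3

for region in ["us-east-1", "us-west-2"]:
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_spot_price_history(
        InstanceTypes=["c5.xlarge"],
        ProductDescriptions=["Linux/UNIX"],
        MaxResults=50,
    )
    latest = {}
    for p in sorted(resp["SpotPriceHistory"], key=lambda x: x["Timestamp"], reverse=True):
        latest.setdefault(p["AvailabilityZone"], p["SpotPrice"])  # keep newest per AZ
    for az, price in sorted(latest.items()):
        print(f"{az:15} ${price}/hr")
```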
Would Monte Carlo simulation help to figure out where you get value from speculatively executing more than one worker at once on the same task?
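A tiny Monte Carlo sketch of that question; the lognormal task-time distribution is invented, so only the shape of the answer is meaningful:

```python
# Sketch: how much tail latency does running k speculative copies of each task
# remove, and at what compute cost? The lognormal parameters are made up.
import random

random.seed(0)
TASKS, TRIALS = 500, 200

def section_wall_clock(k):
    # wall clock for one parallel section = slowest task; each task finishes
    # when the fastest of its k speculative copies finishes
    return max(min(random.lognormvariate(0.0, 1.0) for _ in range(k))
               for _ in range(TASKS))

for k in (1, 2, 3):
    wall = sum(section_wall_clock(k) for _ in range(TRIALS)) / TRIALS
    print(f"k={k}: avg slowest task {wall:.2f}s, compute cost x{k}")
```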
Is there a good way to take the ninja/Makefile build and emit the task graph for processing?
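ninja already exposes this: `ninja -t graph` emits the build graph as Graphviz dot, and `ninja -t deps` / `ninja -t commands` give the discovered header deps and command lines. A sketch that just sanity-checks the dot output, assuming a configured build/ directory:

```python
# Sketch: pull the task graph out of an existing ninja build via `ninja -t graph`
# and count nodes/edges before doing anything smarter with it.
import subprocess

dot = subprocess.run(["ninja", "-C", "build", "-t", "graph"],
                     check=True, capture_output=True, text=True).stdout
nodes = sum(1 for line in dot.splitlines() if "[label=" in line)
edges = sum(1 for line in dot.splitlines() if " -> " in line)
print(f"{nodes} nodes, {edges} edges in the ninja graph")
```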
How much work would it be to get task-graph dumps of Rust Cargo builds?
How much can you get out of kernel tuning on EC2? Again, see https://talawah.io/blog/extreme-http-performance-tuning-one-point-two-million/ . He left a PGO/BOLT build of the kernel itself on the table. https://linuxplumbersconf.org/event/7/contributions/798/
Can writing larger files and then reading them sparsely with HTTP range queries help? https://github.com/dacort/athena-sqlite/blob/master/lambda-function/vfs.py
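S3 supports this directly via the Range header on GetObject, which is what that SQLite VFS leans on. A sketch with placeholder bucket/key names:

```python
# Sketch: sparse reads from one large S3 object using HTTP range requests,
# the same mechanism the athena-sqlite VFS uses. Bucket/key are placeholders.
import boto3

s3 = boto3.client("s3")

def read_range(bucket, key, offset, length):
    resp = s3.get_object(Bucket=bucket, Key=key,
                         Range=f"bytes={offset}-{offset + length - 1}")
    return resp["Body"].read()

page = read_range("my-build-cache", "toolchain/clang.bin", 4 * 65536, 65536)
print(len(page), "bytes fetched without downloading the whole object")
```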
How much of a win is there from using single-AZ S3 storage (One Zone-IA)? https://aws.amazon.com/s3/faqs/
When does EFS beat S3 on cost/performance?