unity-sds / unity-project-management

Container repo for project management (projects, epics, etc)
Apache License 2.0

Airflow Scaling improvements + Performance Testing #146

Open mike-gangl opened 8 months ago

mike-gangl commented 8 months ago

Airflow Scaling improvements

Two major use cases to achieve here:

Essentially, an idle system should have no workers running.
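A minimal sketch of the scale-to-zero rule stated above, not the project's actual autoscaler. The function name `desired_workers`, the `tasks_per_worker` value of 16, and the `max_workers` cap are all assumptions made for illustration; the only behavior taken from this epic is that an idle system runs no workers.

```python
import math


def desired_workers(queued: int, running: int,
                    tasks_per_worker: int = 16,
                    max_workers: int = 10) -> int:
    """Return how many workers should be running for the current task load.

    Hypothetical helper: the per-worker capacity and the cap are
    illustrative assumptions, not values from this epic.
    """
    if tasks_per_worker <= 0 or max_workers < 0:
        raise ValueError("tasks_per_worker must be > 0 and max_workers >= 0")
    active = queued + running
    if active == 0:
        # Idle system: no workers should be running.
        return 0
    # Enough workers to hold every active task, bounded by the cap.
    return min(max_workers, math.ceil(active / tasks_per_worker))
```

With these assumed defaults, an empty queue yields zero workers and a single queued task yields one, so the worker count only rises above zero when there is work to do.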

Acceptance Criteria

Acceptance criteria required to implement the epic

Work Tickets

Link to work tickets required to implement the epic

Dependencies

Other epics or outside tickets required for this to work

Associated Risks

Links to risk issues associated with this epic

mike-gangl commented 5 months ago

The testing numbers we're looking at are as follows (from @hookhua):

SBG TIR

  • forward processing
      • ~6.2K PGE jobs per day
      • ~44.6TB per day of L0 to L4 products
      • ~148 parallel compute instances
  • bulk processing
      • ~18.7K PGE jobs per day
      • ~134TB per day of L0 to L4 products
      • ~444 parallel compute instances
  • low-latency processing
      • ~2.9K PGE jobs per day
      • ~35.4TB per day of L0 to L4 products
      • ~80 parallel compute instances

SBG VSWIR

  • forward processing
      • ~21.9K PGE jobs per day
      • ~56.5TB per day of L0 to L2 products
      • ~577 parallel compute instances
  • bulk processing
      • ~65.6K PGE jobs per day
      • ~169.5TB per day of L0 to L4 products
      • ~1,731 parallel compute instances

For a single SBG workflow as built in our system (L0 -> L2A):

5 + 1 PGEs. Data per PGE:

  1. L1B-preprocess = ~4.2GB
  2. ISOFIT (RFL) = 8.2GB
  3. Resampled Reflectance (RSFL) = 6GB
  4. Corrected Reflectance = 3GB
  5. Fractional cover = 25MB

If we include the 'optional' veg-biochem workflow as well, we'd add one more PGE:

  1. VEGBIOCHEM = 45MB

So for any product, we could run 5 PGEs and move ~19GB through the system. This equates to:

TIR: 1,240 workflows, 23,560GB of data

VSWIR: 4,380 workflows, 83,220GB of data
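The workflow counts above are consistent with dividing the forward-processing PGE job counts (~6.2K for TIR, ~21.9K for VSWIR) by 5 PGEs per workflow, then multiplying by ~19GB per workflow. That reading is an inference from the numbers, not something stated explicitly; the sketch below just reproduces the arithmetic under that assumption. The constant and function names are made up for illustration.

```python
# Assumed inputs, taken from the figures quoted in this thread.
PGES_PER_WORKFLOW = 5    # PGEs run per product (optional veg-biochem excluded)
GB_PER_WORKFLOW = 19     # approximate data moved per workflow, as stated above

# Forward-processing PGE jobs per day (assumed to be the basis of the totals).
forward_jobs_per_day = {"TIR": 6_200, "VSWIR": 21_900}


def workflow_load(jobs_per_day: int) -> tuple[int, int]:
    """Return (workflows, GB of data) implied by a PGE job count."""
    workflows = jobs_per_day // PGES_PER_WORKFLOW
    return workflows, workflows * GB_PER_WORKFLOW


loads = {name: workflow_load(jobs) for name, jobs in forward_jobs_per_day.items()}

for name, (workflows, gigabytes) in loads.items():
    print(f"{name}: {workflows:,} workflows, {gigabytes:,}GB of data")
```

Swapping in the bulk or low-latency job counts gives the corresponding workflow and data totals for those scenarios under the same assumptions.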