hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

Document procedure for pipeline cost estimation on Hail Batch #14711

Open kasittig opened 4 days ago

kasittig commented 4 days ago

A potential research collaborator is evaluating data platforms for running analysis pipelines on their upcoming very large dataset. They're interested in estimating the cost of running an existing pipeline using the Hail Query framework.

I think that getting one number here is likely very difficult. I do also think that this is a completely reasonable question for them to ask and that it would greatly benefit us to have some kind of documentation on cost estimation. Other collaborators might also have ideas.

chrisvittal commented 20 hours ago

Some notes from discussion:

  1. Maybe add a pricing page with up-to-date pricing for resources.
  2. It is difficult to determine all the work that will run just from a Hail pipeline.
  3. Teach users how to inspect the work that Hail actually does?
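
To make the notes above a bit more concrete, here is a minimal sketch of the arithmetic a pricing page would enable once the jobs a pipeline actually runs are known. Everything in it is an assumption for illustration: the rates are made-up placeholders rather than real Hail Batch prices, and `JobEstimate`, `job_cost`, and `pipeline_cost` are hypothetical names, not part of any Hail API. It also assumes cost is linear in reserved cores, memory, storage, and wall-clock time, which may not match how resources are actually billed.

```python
from dataclasses import dataclass

# Hypothetical placeholder rates in USD. These are NOT real Hail Batch prices;
# substitute the figures from whatever pricing source applies to you.
CPU_RATE_PER_CORE_HOUR = 0.01
MEMORY_RATE_PER_GIB_HOUR = 0.001
STORAGE_RATE_PER_GIB_HOUR = 0.0001


@dataclass
class JobEstimate:
    """Resources and duration for one job (hypothetical structure)."""
    name: str
    cores: float
    memory_gib: float
    storage_gib: float
    hours: float


def job_cost(job: JobEstimate) -> float:
    """Estimate one job's cost, assuming linear per-resource hourly billing."""
    hourly = (
        job.cores * CPU_RATE_PER_CORE_HOUR
        + job.memory_gib * MEMORY_RATE_PER_GIB_HOUR
        + job.storage_gib * STORAGE_RATE_PER_GIB_HOUR
    )
    return hourly * job.hours


def pipeline_cost(jobs: list[JobEstimate]) -> float:
    """Sum the per-job estimates for a whole pipeline."""
    return sum(job_cost(job) for job in jobs)


if __name__ == "__main__":
    # Illustrative job list; in practice these numbers would come from
    # inspecting the jobs a trial run of the pipeline actually produced.
    jobs = [
        JobEstimate("import", cores=4, memory_gib=16, storage_gib=100, hours=2.0),
        JobEstimate("analysis", cores=8, memory_gib=32, storage_gib=50, hours=1.5),
    ]
    print(f"Estimated cost: ${pipeline_cost(jobs):.4f}")
```

The hard part remains point 2: the job list has to come from observing what a run actually does, for example on a small subset of the data, since it cannot easily be read off the pipeline code.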