populationgenomics / metamist

Sample level metadata system
MIT License

Include Failed Jobs in cram size proportion calculation. #871

Closed: milo-hyben closed this PR 1 month ago.

milo-hyben commented 1 month ago

I discovered an issue with the way we calculate the CRAM size proportion during the seqr billing aggregation: only successfully completed jobs are included. When a job fails, its cost is ignored.
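A minimal sketch of the discrepancy. It assumes a hypothetical `Job` record and `aggregated_cost` helper, neither of which is taken from cpg-infrastructure. Filtering on the "Success" state drops the cost of failed jobs, even though Hail still reports a cost for them.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record; the real aggregator reads these fields from SQL.
    batch_id: int
    state: str   # Hail Batch job state, e.g. "Success" or "Failed"
    cost: float  # cost in USD


def aggregated_cost(jobs: list[Job], include_failed: bool = True) -> float:
    """Sum job costs.

    With include_failed=False only successful jobs are counted (the
    behaviour described above); failed jobs still incur cost, so they are
    included by default in this sketch.
    """
    return sum(
        job.cost for job in jobs if include_failed or job.state == "Success"
    )


jobs = [
    Job(batch_id=1, state="Success", cost=10.0),
    Job(batch_id=1, state="Failed", cost=2.5),
    Job(batch_id=2, state="Failed", cost=1.5),
]

print(aggregated_cost(jobs, include_failed=False))  # 10.0
print(aggregated_cost(jobs))  # 14.0
```

The gap between the two totals is exactly the cost of the failed jobs.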

I tried to run the seqr.py script to reload some of the old records, and it fails on this line: https://github.com/populationgenomics/cpg-infrastructure/blob/bf3ef24e1d3663442fe237087461b6a42572d48f/cpg_infra/billing_aggregator/aggregate/seqr.py#L623

Investigating the SQL records, I noticed that on 2022-01-19 we only had failed jobs:

https://batch.hail.populationgenomics.org.au/batches/7284
https://batch.hail.populationgenomics.org.au/batches/7285
https://batch.hail.populationgenomics.org.au/batches/7317
https://batch.hail.populationgenomics.org.au/batches/7321

Their total cost is around USD $80, as reported by Hail.
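A day like 2022-01-19 can be spotted by grouping job records by day. The sketch below assumes hypothetical `(day, state)` tuples rather than the real SQL schema, and a hypothetical helper name. Any day with no successful job contributes nothing to a success-only aggregation, so its whole cost disappears.

```python
from collections import defaultdict


def days_without_successful_jobs(records: list[tuple[str, str]]) -> list[str]:
    """Return the sorted days on which no job reached the "Success" state.

    records: hypothetical (day, job_state) tuples, e.g. ("2022-01-19", "Failed").
    """
    states_by_day: dict[str, set[str]] = defaultdict(set)
    for day, state in records:
        states_by_day[day].add(state)
    return sorted(
        day for day, states in states_by_day.items() if "Success" not in states
    )


# Hypothetical records for illustration only.
records = [
    ("2022-01-18", "Success"),
    ("2022-01-18", "Failed"),
    ("2022-01-19", "Failed"),
    ("2022-01-19", "Failed"),
]

print(days_without_successful_jobs(records))  # ['2022-01-19']
```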

This PR aims to fix this discrepancy.

milo-hyben commented 1 month ago

Closing this PR without merging, as this is not the right solution to the problem. PR https://github.com/populationgenomics/metamist/pull/872 solves it instead.