populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License
2 stars 0 forks source link

Enable pre-commit mypy checks to typecheck cpg_utils usage #728

Open jmarshall opened 2 months ago

jmarshall commented 2 months ago

Our pre-commit hooks do not detect the error in

   input_vcf = get_batch().read_input(inputs.as_dict(cohort, FilterGenotypes)['filtered_vcf'])
   reveal_types(get_batch())

because the (pre-commit-local) environment in which the hook runs does not have cpg_utils installed, so the type returned by get_batch() is known only as Any so the first line does not get type-checked.

So the missing read_input(str(…)) is not detected — but see also populationgenomics/cpg-utils#166 which suggests rendering that particular str(…) unnecessary.

Note that there are lots of minor type-checking errors lurking in the code that would have to be fixed before we can merge this PR and enable this checking!

MattWellie commented 2 months ago

Nice catch, I hate the results 😭

Gut feeling is that this change is essential (we're not as type-safe as we think we are), and this mitigates the need for https://github.com/populationgenomics/cpg-utils/pull/166 as we're no longer blind to those failures. This could take a while to work through as some of the now-revealed type failures are chunks of the code currently a work in progress, e.g.

https://github.com/populationgenomics/production-pipelines/pull/724 https://github.com/populationgenomics/production-pipelines/pull/718

jmarshall commented 1 month ago

IMHO populationgenomics/cpg-utils#166 or an equivalent is a good idea regardless: the function expects the parameter to be a filename path, so having to wrap str(…) around your path-like argument is just make-work.

Hence I plan to propose https://github.com/populationgenomics/cpg-utils/pull/166#issuecomment-2097141696 upstream on that basis that it is a generally applicable improvement to this part of the API.