populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Cast somalier_sites reference to path and take the basename #746

EddieLF closed 1 month ago

EddieLF commented 1 month ago

The somalier extract jobs are failing because of a change to the cpg-utils reference_path function. The somalier extract job was changed in this PR.

The reference_path function used to live in cpg_utils.hail_batch and returned a Path object:

def reference_path(key: str) -> Path:
    return to_path(retrieve(['references'] + key.strip('/').split('/')))

Following the update, reference_path (now in cpg_utils.config) returns a string:

def reference_path(key: str) -> str:
    return config_retrieve(['references', *key.strip('/').split('/')])

This change restores the Path-like .name behaviour for the somalier_sites reference in the extract job: the reference is cast to a path and its basename is taken. Without this, the batch job appends the full gs:// path to the batch-mounted storage directory and fails when it tries to read /io/batch/hash/sites/gs://cpg-common-main/references/somalier/sites.hg38.vcf.gz.
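
The difference can be reproduced without cpg-utils. The sketch below is illustrative only: it assumes the job builds the local path by simple string interpolation, uses pathlib.PurePosixPath as a stand-in for the path cast, and treats mount_dir as a hypothetical placeholder for the batch-mounted directory.

```python
from pathlib import PurePosixPath

# The value reference_path now returns is a plain str (URL taken from the failure above).
sites_ref = "gs://cpg-common-main/references/somalier/sites.hg38.vcf.gz"

# Hypothetical placeholder for the batch-mounted storage directory.
mount_dir = "/io/batch/hash/sites"

# Interpolating the raw string appends the whole gs:// URL to the mount directory.
broken = f"{mount_dir}/{sites_ref}"

# Casting to a path first and taking .name yields only the file's basename.
# PurePosixPath is an assumption standing in for the real path cast.
fixed = f"{mount_dir}/{PurePosixPath(sites_ref).name}"

print(broken)
print(fixed)
```

The first printed path matches the one in the failing job, while the second points at the file name inside the mount directory.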