populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Improve Somalier Pedigree checking and Slack notifications #792

Open EddieLF opened 2 weeks ago

EddieLF commented 2 weeks ago

There are some bugs in the SomalierPedigree QC stage and in the way relatedness QC flags are reported in the Slack notifications.

How it works

Somalier Extract

Somalier Extract downloads each sequencing group's CRAM (and index), then runs the somalier extract command with the sites file from references: /references/somalier/sites.hg38.vcf.gz (originally from the somalier repo). This creates the Somalier "fingerprint" file, e.g. CPG12345.cram.somalier.
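For reference, here is a rough sketch of what a single extract call might look like when assembled in Python. It only builds the argument list and does not run anything. The `-d`, `--sites` and `-f` flags are taken from the somalier README. The reference FASTA path and output directory are placeholders, and `somalier_extract_cmd` is a hypothetical helper, not a function from this repo.

```python
def somalier_extract_cmd(cram: str, sites: str, ref_fasta: str, out_dir: str) -> list[str]:
    """Build the argument list for one `somalier extract` call (not executed here)."""
    return [
        "somalier", "extract",
        "-d", out_dir,      # directory the fingerprint is written to
        "--sites", sites,   # sites VCF listing the positions to genotype
        "-f", ref_fasta,    # reference FASTA, needed to decode the CRAM
        cram,
    ]


cmd = somalier_extract_cmd(
    cram="CPG12345.cram",
    sites="/references/somalier/sites.hg38.vcf.gz",
    ref_fasta="reference.fa",  # placeholder path (assumption)
    out_dir="extracted",       # placeholder directory (assumption)
)
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` directly or joined into a shell line for a batch job.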

Somalier Relate

The stage fetches the Somalier fingerprint files for all input sequencing groups, then runs Somalier Relate over all pairwise combinations of input sequencing groups within the same dataset.
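To make the "pairwise within a dataset" behaviour concrete, here is a minimal sketch, assuming fingerprints are first grouped by dataset and one relate command is built per group. The sequencing group IDs, dataset names, pedigree path and helper names are all made up for illustration. `--ped` and `-o` are somalier relate options, and somalier compares every pair among the files it is given, so the explicit pair list is only there to show what gets compared.

```python
from collections import defaultdict
from itertools import combinations


def group_by_dataset(fingerprints: dict[str, tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Map dataset -> sorted list of (sequencing group ID, fingerprint path)."""
    grouped = defaultdict(list)
    for sg_id, (dataset, path) in fingerprints.items():
        grouped[dataset].append((sg_id, path))
    return {dataset: sorted(items) for dataset, items in grouped.items()}


def pairs_within(items: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """All unordered sequencing-group pairs inside one dataset."""
    return list(combinations([sg_id for sg_id, _ in items], 2))


def somalier_relate_cmd(items: list[tuple[str, str]], ped: str, prefix: str) -> list[str]:
    """Argument list for one `somalier relate` call over a dataset's fingerprints."""
    return ["somalier", "relate", "--ped", ped, "-o", prefix, *[path for _, path in items]]


# Hypothetical inputs: sequencing group -> (dataset, fingerprint file)
fingerprints = {
    "CPG1": ("dataset-a", "CPG1.cram.somalier"),
    "CPG2": ("dataset-a", "CPG2.cram.somalier"),
    "CPG3": ("dataset-a", "CPG3.cram.somalier"),
    "CPG4": ("dataset-b", "CPG4.cram.somalier"),
}

grouped = group_by_dataset(fingerprints)
pairs = {dataset: pairs_within(items) for dataset, items in grouped.items()}
relate_a = somalier_relate_cmd(grouped["dataset-a"], ped="dataset-a.ped", prefix="dataset-a")
```

Note that CPG4 is never compared with anything here, because it is alone in its dataset.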

Once all input sequencing groups have been decided (via the FREEMIX check above), the somalier relate command is invoked. This generates several outputs:

Check Pedigree script

These results are used by the check_pedigree.py script in /cpg_workflows/python_scripts/. The script looks through the relatedness information produced by Somalier Relate and formulates a Slack message to post to a private channel.
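As a simplified illustration of this kind of check (not the actual check_pedigree.py logic), the sketch below reads a pairs table, flags pairs whose observed relatedness is far from the pedigree-expected value, and formats a message. The column names are assumed to match a subset of somalier's pairs output, and the tolerance, dataset name and function names are hypothetical.

```python
import csv
import io

# Tiny stand-in for a somalier pairs file (assumed column subset, made-up values)
PAIRS_TSV = (
    "#sample_a\tsample_b\trelatedness\texpected_relatedness\n"
    "CPG1\tCPG2\t0.49\t0.5\n"
    "CPG1\tCPG3\t0.02\t0.5\n"
    "CPG2\tCPG3\t0.51\t0.0\n"
)

TOLERANCE = 0.2  # hypothetical cutoff, not the pipeline's value


def find_mismatches(tsv_text: str, tolerance: float = TOLERANCE) -> list[tuple[str, str, float, float]]:
    """Return pairs whose observed relatedness differs from the expected value by more than tolerance."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mismatches = []
    for row in reader:
        observed = float(row["relatedness"])
        expected = float(row["expected_relatedness"])
        if abs(observed - expected) > tolerance:
            mismatches.append((row["#sample_a"], row["sample_b"], observed, expected))
    return mismatches


def format_slack_message(dataset: str, mismatches: list[tuple[str, str, float, float]]) -> str:
    """Plain-text summary suitable for posting to a channel."""
    if not mismatches:
        return f"{dataset}: no relatedness mismatches found"
    lines = [f"{dataset}: {len(mismatches)} relatedness mismatch(es)"]
    for a, b, observed, expected in mismatches:
        lines.append(f"- {a} / {b}: observed {observed:.2f}, expected {expected:.2f}")
    return "\n".join(lines)


mismatches = find_mismatches(PAIRS_TSV)
message = format_slack_message("dataset-a", mismatches)
print(message)
```

Keeping the flagging step separate from the message formatting makes it easier to test which situations produce a warning without posting anything to Slack.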

What's not working

Unnecessary Slack message warnings of note:

Both of these situations should be valid and not cause errors (I think).

Here is an example Slack post containing these warnings: https://centrepopgen.slack.com/archives/C03BHD3EKH9/p1718751311646699