Open chrisvittal opened 11 months ago
This seems to be the point in the pipeline where the join occurs.
def annotate_with_mu(
    ht: hl.Table,
    mutation_ht: hl.Table,
    mu_annotation: str = "mu_snp",
) -> hl.Table:
    """
    Annotate SNP mutation rate for the input Table.

    .. note::

        Function expects that `ht` includes `mutation_ht`'s key fields. Note that these
        annotations don't need to be the keys of `ht`.

    :param ht: Input Table to annotate.
    :param mutation_ht: Mutation rate Table.
    :param mu_annotation: The name of mutation rate annotation in `mutation_ht`.
        Default is 'mu_snp'.
    :return: Table with mutational rate annotation added.
    """
    mu = mutation_ht.index(*[ht[k] for k in mutation_ht.key])[mu_annotation]
    return ht.annotate(
        **{mu_annotation: hl.case().when(hl.is_defined(mu), mu).or_error("Missing mu")}
    )
Failed log here: constraint_pipeline.log
@chrisvittal, is this replicable or transient?
Waiting until after ASHG to pick this up again. Talk to Kristin to confirm it's replicable.
Another very simple pipeline hitting this error was reported at https://hail.zulipchat.com/#narrow/stream/123010-Hail-Query-0.2E2-support/topic/zip.3A.20length.20mismatch . We can get access to these files via Sam B.
context_mis_freq_ht = hl.read_table("gs://epi25/misc-data/gnomAD_v4/grch38_context_vep_annotated.v105.prefiltered.missense_freq_ensp.ht")
ensp2uniprot_ht = hl.import_table("gs://epi-mis-3d/misc/ensp2uniprot_mart_export.ensp2uniprot.txt")
context_mis_freq_ht = context_mis_freq_ht.key_by("ensp")
ensp2uniprot_ht = ensp2uniprot_ht.key_by("ensp")
context_mis_freq_ht = context_mis_freq_ht.annotate(
    uniprot=ensp2uniprot_ht[context_mis_freq_ht.ensp].uniprot)
Notice that the error goes away if you instead use:
context_mis_freq_ht = hl.read_table("gs://epi25/misc-data/gnomAD_v4/grch38_context_vep_annotated.v105.prefiltered.missense_freq_ensp.ht")
ensp2uniprot_ht = hl.import_table("gs://epi-mis-3d/misc/ensp2uniprot_mart_export.ensp2uniprot.txt")
context_mis_freq_ht = context_mis_freq_ht.key_by("ensp")
ensp2uniprot_ht = ensp2uniprot_ht.key_by("ensp")
context_mis_freq_ht = context_mis_freq_ht.join(ensp2uniprot_ht, 'left')
from: https://discuss.hail.is/t/zip-length-mismatch-error/3548
I've obtained a log from a failed run and do see us zipping the two contexts together without making sure they're the same length.
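For anyone following along, here is a rough plain-Python sketch of the failure class. This is not Hail's actual code: `zip_contexts` and the context values are made up for illustration, and the error text just echoes the "zip: length mismatch" message from the reports above. The idea is that two per-partition context lists only pair up correctly when they have the same length, so that should be checked before zipping.

```python
def zip_contexts(left, right):
    """Pair up two lists of per-partition contexts, refusing mismatched lengths.

    Hypothetical helper for illustration only; not part of Hail.
    """
    if len(left) != len(right):
        raise ValueError(f"zip: length mismatch: {len(left)} vs {len(right)}")
    return list(zip(left, right))


# Hypothetical contexts: one entry per partition on each side of the join.
left_contexts = ["part-0", "part-1", "part-2"]
right_contexts = ["part-0", "part-1"]

# Python's built-in zip silently truncates to the shorter input,
# which would hide the problem instead of reporting it.
print(list(zip(left_contexts, right_contexts)))

# The checked version fails up front with a clear message.
try:
    zip_contexts(left_contexts, right_contexts)
except ValueError as err:
    print(err)
```

If this is the right picture, the fix is presumably either to guarantee both sides are planned with the same partitioning, or to validate the lengths where the contexts are built rather than only failing at runtime.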