popsim-consortium / analysis2

Analysis for the second consortium paper.

Issue with all_sites #115

Closed: silastittes closed this issue 1 month ago

silastittes commented 4 months ago

The production pipeline has this line in it:

"annotation_list": ["all_sites", "Phocoena_sinus.mPhoSin1.pri.110_exons"]

but this fails with:

Annotations 'PhoSin/all_sites' not in catalog (Phocoena_sinus.mPhoSin1.pri.110_exons, Phocoena_sinus.mPhoSin1.pri.110_CDS)

I believe this boils down to this line: https://github.com/popsim-consortium/analysis2/blob/7ebe47a249cca71cd35f8ce96c7d7247b669d139/workflows/masks.py#L47

It seems like we need a conditional there to handle all_sites correctly, but I'm not quite sure what it should look like. Or should we just not use the all_sites option? I'm very open to suggestions on this one.

andrewkern commented 4 months ago

IIRC there is an all_sites conditional in the workflow?

andrewkern commented 4 months ago

https://github.com/popsim-consortium/analysis2/blob/7ebe47a249cca71cd35f8ce96c7d7247b669d139/workflows/simulation.snake#L79C9-L79C44

silastittes commented 4 months ago

Ok cool. This error is coming up on the n_t_gone_prep_inputs rule. Guess we just need to transmit that info across rules somehow.

silastittes commented 4 months ago

Ok, reading a bit closer: if I understand correctly, for the n(t) rules this is about masking out functional regions. In the case of all_sites, we need to skip masking, because otherwise we'd be masking the whole genome, right? Should be a pretty easy fix if so.

andrewkern commented 4 months ago

yep exactly
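
For reference, a minimal sketch of the kind of conditional discussed above. This is not the code in workflows/masks.py: the function name `get_mask_intervals`, the `catalog` dict, and the interval coordinates are all hypothetical stand-ins, and the real pipeline looks annotations up in the species catalog rather than in a dict. The idea is only that all_sites short-circuits to "nothing to mask" before any catalog lookup happens.

```python
# Hypothetical sketch: none of these names come from workflows/masks.py.
ALL_SITES = "all_sites"


def get_mask_intervals(annotation, catalog):
    """Return the (start, end) intervals to mask for one annotation.

    `catalog` is a stand-in dict mapping annotation IDs to interval lists.
    """
    if annotation == ALL_SITES:
        # "all_sites" is not a catalog annotation, so skip the lookup
        # and mask nothing instead of raising.
        return []
    if annotation not in catalog:
        raise ValueError(
            f"Annotations '{annotation}' not in catalog ({', '.join(catalog)})"
        )
    return list(catalog[annotation])


# Made-up coordinates, for illustration only.
catalog = {
    "Phocoena_sinus.mPhoSin1.pri.110_exons": [(100, 200), (500, 650)],
    "Phocoena_sinus.mPhoSin1.pri.110_CDS": [(120, 180)],
}
annotation_list = ["all_sites", "Phocoena_sinus.mPhoSin1.pri.110_exons"]

# One mask list per annotation; all_sites maps to an empty mask.
masks = {a: get_mask_intervals(a, catalog) for a in annotation_list}
```

With this shape, the same annotation_list can be passed to the n(t) rules unchanged, and only real catalog annotations trigger a lookup.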