populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP
MIT License

Parallel processing blocked #18

Closed MattWellie closed 1 year ago

MattWellie commented 2 years ago

The code has been written so that we can parallelize processing at the gene or contig level for speed. All MOI tests are run on variants grouped with all other variants in the same gene, which makes the gene a logical level at which to split processing.
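A minimal sketch of what a gene-level split might look like, not the pipeline's actual code. The variant dictionaries, `group_by_gene`, `apply_moi_tests` and `run_moi_by_gene` are all hypothetical names, and the per-gene "test" is a placeholder that only counts variants.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor


def group_by_gene(variants):
    """Collect variants into lists keyed by their gene (hypothetical layout)."""
    grouped = defaultdict(list)
    for variant in variants:
        grouped[variant["gene"]].append(variant)
    return dict(grouped)


def apply_moi_tests(gene_and_variants):
    """Placeholder for the real MOI logic: returns the variant count per gene."""
    gene, variants = gene_and_variants
    return gene, len(variants)


def run_moi_by_gene(variants, workers=1):
    """Run the per-gene step serially, or across processes when workers > 1."""
    groups = list(group_by_gene(variants).items())
    if workers == 1:
        return dict(map(apply_moi_tests, groups))
    # Every argument and return value crosses a process boundary here,
    # so each one has to be picklable.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(apply_moi_tests, groups))
```

With `workers > 1`, anything handed to the worker function (including a pedigree object, if one were passed along) would need to survive pickling.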

The Peddy ped file representation being used currently cannot be pickled; attempting it raises: _pickle.PicklingError: Can't pickle <class 'peddy.peddy.UNKNOWN'>: it's not the same object as peddy.peddy.UNKNOWN

This probably relates to the handling of unknown members in the Pedigree: https://github.com/brentp/peddy/blob/master/peddy/peddy.py#L102-L104

@lgruen's suggestion on Slack: could overriding __getstate__ and __setstate__ work around this?
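A rough sketch of how the __getstate__ / __setstate__ idea could work, using stand-ins rather than peddy itself. It assumes the error comes from a module-level name being rebound from a class to an instance of that class, which is one common cause of the "it's not the same object as" message. The `UNKNOWN` sentinel below is an imitation written for this example, and `PicklablePedMember` and `_UNKNOWN_MARKER` are hypothetical names.

```python
import pickle


class UNKNOWN:
    """Stand-in sentinel for this example only (not peddy's code)."""

    def __repr__(self):
        return "UNKNOWN"


# Rebinding the class name to an instance: pickle finds the class by name,
# gets the instance back instead, and refuses to serialise it.
UNKNOWN = UNKNOWN()

# Plain string that takes the sentinel's place inside the pickled state.
_UNKNOWN_MARKER = "__UNKNOWN__"


class PicklablePedMember:
    """Hypothetical pedigree member whose parents may be the UNKNOWN sentinel."""

    def __init__(self, sample_id, father=UNKNOWN, mother=UNKNOWN):
        self.sample_id = sample_id
        self.father = father
        self.mother = mother

    def __getstate__(self):
        # Swap the unpicklable sentinel for a plain string before pickling.
        return {
            key: _UNKNOWN_MARKER if value is UNKNOWN else value
            for key, value in self.__dict__.items()
        }

    def __setstate__(self, state):
        # Restore the sentinel so identity checks still work after unpickling.
        self.__dict__.update(
            {
                key: UNKNOWN if value == _UNKNOWN_MARKER else value
                for key, value in state.items()
            }
        )
```

Applying this to the real peddy classes would mean subclassing or wrapping them, and whether that is enough depends on where else the sentinel is stored inside a pedigree.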

With the current test data there is no need to speed up processing further, but this is being logged as a future hurdle.

MattWellie commented 2 years ago

Feedback from Brent Pedersen is that switching to the Somalier representation could solve this, but that seems like a heavy dependency when all I need is a PED representation.
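For comparison, a dependency-free PED representation could be as small as the sketch below. This is an illustration, not something proposed in the thread: `PedMember` and `parse_ped` are hypothetical names, and it assumes the standard six-column PED layout (family, individual, father, mother, sex, phenotype) with `0` marking a missing parent. Plain dataclasses like this pickle without special handling.

```python
import pickle
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PedMember:
    family_id: str
    sample_id: str
    father_id: Optional[str]  # None when the parent is not in the pedigree
    mother_id: Optional[str]
    sex: str                  # "male", "female" or "unknown"
    affected: Optional[bool]  # None when the phenotype is missing


_SEX = {"1": "male", "2": "female"}
_AFFECTED = {"1": False, "2": True}


def parse_ped(lines):
    """Parse PED-format lines into a dict of PedMember keyed by sample ID."""
    members = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fam, sample, father, mother, sex, phenotype = line.split()[:6]
        members[sample] = PedMember(
            family_id=fam,
            sample_id=sample,
            father_id=None if father == "0" else father,
            mother_id=None if mother == "0" else mother,
            sex=_SEX.get(sex, "unknown"),
            affected=_AFFECTED.get(phenotype),
        )
    return members
```

Usage would be along the lines of `parse_ped(open("cohort.ped"))`, where the file name is only an example.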

For now it looks like this will be a blocker for parallel processing, but we're not seeing signs that parallelisation will be required (8s runtime on 150 samples).

MattWellie commented 1 year ago

This is just not very important; I'll revisit it if we ever end up having issues with runtime.