Closed MattWellie closed 1 year ago
Feedback from Brent Pedersen is that switching to the Somalier representation could solve this, but that seems like a heavy dependency when all I need is a PED representation.
For now it looks like this will be a blocker for parallelization, but we're not seeing signs that parallelization will be required (8s runtime on 150 samples).
This is just not very important; I'll revisit it if we ever end up having issues with runtime.
The code has been written so that we can parallelize processing at the gene or contig level for speed. All MOI tests are run on variants grouped with all other vars in the same gene, which would make that a logical level to split processing.
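As a rough illustration of splitting at the gene level, here is a minimal sketch. The variant dictionaries, `group_by_gene`, `run_per_gene`, and the `count_variants` placeholder are all hypothetical names made up for this example and are not taken from the actual codebase. The executor is passed in, so a thread pool can be used for testing; switching to a process pool would require every argument and result to be picklable.

```python
from collections import defaultdict
from concurrent.futures import Executor


def group_by_gene(variants):
    """Collect variants into one list per gene, preserving input order."""
    groups = defaultdict(list)
    for variant in variants:
        groups[variant["gene"]].append(variant)
    return dict(groups)


def count_variants(gene, gene_variants):
    """Placeholder for the per-gene MOI tests; here it only counts variants."""
    return gene, len(gene_variants)


def run_per_gene(variants, worker, executor: Executor):
    """Submit one task per gene and gather the (gene, result) pairs."""
    groups = group_by_gene(variants)
    futures = [
        executor.submit(worker, gene, gene_variants)
        for gene, gene_variants in groups.items()
    ]
    # Each worker returns a (gene, result) tuple, so dict() can consume them
    return dict(future.result() for future in futures)
```

Calling `run_per_gene(variants, count_variants, pool)` inside a `with ThreadPoolExecutor() as pool:` block would run one task per gene. The same call with a `ProcessPoolExecutor` sends each task to a worker process via pickle.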
The Peddy ped file representation being used currently cannot be pickled; attempting it raises:
_pickle.PicklingError: Can't pickle <class 'peddy.peddy.UNKNOWN'>: it's not the same object as peddy.peddy.UNKNOWN
This probably relates to the handling of unknown members in the Pedigree: https://github.com/brentp/peddy/blob/master/peddy/peddy.py#L102-L104
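For context, pickle stores classes by name, so this kind of "not the same object" error generally appears when the module-level name a class was defined under later points at a different object. The sketch below reproduces that general mechanism with a made-up `Sentinel` class. It does not use peddy, and whether peddy hits exactly this path is an assumption rather than something confirmed here.

```python
import pickle


class Sentinel:
    """First definition; an instance of this class is created below."""


first_instance = Sentinel()


class Sentinel:  # noqa: F811 - deliberately rebinds the module-level name
    """Second definition; the name 'Sentinel' now points here."""


def try_pickle(obj):
    """Return None on success, or the PicklingError raised by pickle.dumps."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as error:
        return error
    return None
```

Here `try_pickle(first_instance)` returns a `PicklingError`, because the class of `first_instance` is no longer the object found under the name `Sentinel`, while `try_pickle(Sentinel())` succeeds.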
@lgruen's suggestion on Slack:
overriding __getstate__ and __setstate__ could work around this?
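A minimal sketch of what that could look like, under stated assumptions: `PicklablePedigree`, `_Unknown`, and the tuple-based rows are hypothetical stand-ins written for this example, not peddy classes. The idea is to pickle only the plain PED rows and rebuild anything that holds the unpicklable sentinel after unpickling.

```python
import pickle


class _Unknown:
    """Stand-in for a sentinel object that refuses to be pickled."""

    def __reduce__(self):
        raise pickle.PicklingError("sentinel cannot be pickled")


UNKNOWN = _Unknown()


class PicklablePedigree:
    """Hypothetical wrapper: pickles the raw PED rows, rebuilds the rest."""

    def __init__(self, rows):
        # rows: (family, sample, father, mother, sex, phenotype) string tuples
        self.rows = list(rows)
        self._build()

    def _build(self):
        # Derived lookup; a father ID of "0" is mapped to the UNKNOWN sentinel
        self.father_of = {
            row[1]: (row[2] if row[2] != "0" else UNKNOWN) for row in self.rows
        }

    def __getstate__(self):
        # Keep only plain data and drop everything that holds the sentinel
        return {"rows": self.rows}

    def __setstate__(self, state):
        # Restore the plain data, then recreate the derived lookup
        self.rows = state["rows"]
        self._build()
```

Because the sentinel never enters the pickled state, `pickle.loads(pickle.dumps(pedigree))` round-trips, and the rebuilt lookup refers to the receiving process's own `UNKNOWN` object. The cost is that the derived structures are recomputed on every unpickle.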
With the current test data there is no need to speed up processing further, but this is being logged as a future hurdle.