Closed andrewpatto closed 1 year ago
What are the memory requirements here? Somalier fingerprint files are pretty small
It's possibly much, much more than 50 that we can do on big lambdas now - I just remember reading an issue on the somalier GitHub where someone said they were running it on 10,000(?) samples and it was grinding their EC2 instances into the ground.
@alexiswl can you get a feel for the mem/time requirements from the files you have on your EC2? It would be good to understand something like: 50 pairs = 20MB and 1:00, 1000 pairs = 1GB and 3:35 (all numbers made up, but you get the drift).
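One way to collect those mem/time numbers is a small wrapper that times a command and reads its peak memory afterwards. This is only a rough sketch: `measure` is a hypothetical helper, the command to run (for example a somalier invocation over N fingerprint files) is left to the caller, and it assumes a Unix host where Python's `resource` module is available.

```python
import resource
import subprocess
import sys
import time


def measure(cmd: list[str]) -> tuple[float, int]:
    """Run cmd to completion; return (wall-clock seconds, peak child RSS).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    RUSAGE_CHILDREN is a high-water mark over every child this process
    has waited for, so use a fresh Python process per measurement.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss


if __name__ == "__main__":
    # Placeholder command; swap in the real workload being profiled.
    seconds, peak = measure([sys.executable, "-c", "pass"])
    print(f"wall time: {seconds:.2f}s, peak RSS: {peak}")
```

Running it once per sample count (50, 100, 500, ...) would give the kind of table asked for above.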
There is now a pairs step function that takes a list of BAMs and returns the all-pairs HTML from somalier. I have not tested what number of BAMs will start to hit lambda limits. I am expecting this to be used for investigating false negatives etc. (i.e. < 50 BAMs).
Whilst Holmes is designed to meet the 1:N fingerprint check (index vs. all the other samples), when it comes to investigations it would be useful to be able to get the real somalier all-pairs output.
Somalier all-pairs does not scale well to very large numbers of samples (10,000+), certainly not when running on lambdas, so there will need to be a ceiling on the number of BAMs that can be compared in all-pairs mode (50?).
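For a sense of why a ceiling is needed: an all-pairs comparison over n samples covers n(n-1)/2 pairs, so the work grows quadratically. A minimal sketch of a guard is below. The names `MAX_ALL_PAIRS_BAMS` and `check_all_pairs_request` are hypothetical, and the value 50 is just the number floated above, not a measured limit.

```python
from math import comb

# Hypothetical ceiling taken from the "50?" suggested above; not a measured limit.
MAX_ALL_PAIRS_BAMS = 50


def pair_count(n_samples: int) -> int:
    """Number of unordered sample pairs in an all-pairs comparison."""
    return comb(n_samples, 2)


def check_all_pairs_request(bam_urls: list[str], ceiling: int = MAX_ALL_PAIRS_BAMS) -> int:
    """Validate a list of BAMs and return how many pairs would be compared.

    Duplicates are dropped (order preserved) before counting.
    Raises ValueError if fewer than two or more than `ceiling` BAMs remain.
    """
    unique = list(dict.fromkeys(bam_urls))
    if len(unique) < 2:
        raise ValueError("all-pairs mode needs at least two distinct BAMs")
    if len(unique) > ceiling:
        raise ValueError(
            f"{len(unique)} BAMs requested; all-pairs mode is capped at {ceiling}"
        )
    return pair_count(len(unique))
```

With these definitions, 50 BAMs give 1,225 pairs while 10,000 samples give 49,995,000, which is the gap between an investigation-sized request and the case that reportedly overwhelmed EC2 instances.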