Sage spends a significant portion of its runtime sorting (peptides, fragment ions, preliminary scoring candidates, etc.).
After preliminary scoring of all potential candidates (how many fragments matched?), Sage selects the top k (~50 or more, depending on settings) candidates for hyperscore calculation. Currently, the entire list of candidates (100s to 100,000s) is completely sorted, and then the best k elements are selected.
This PR merges in an implementation of the heapselect algorithm. A size-bounded min heap is constructed in-place to select the top k elements. This speeds up candidate selection, and can improve total runtime by 10-15% depending on the search settings.
quickcheck is used to perform property-based testing of the heapselect algorithm, and one of the integration tests has been changed to use quickcheck as well, to increase testing surface area. In addition, crate internals have been reworked a bit, to provide initial support for compiling for wasm (#77) - there is still some awkwardness here that warrants more clean up. Further restructuring/refactoring should enable support for additional file formats as well.
Of note:
Results may differ between this version of Sage and prior versions, due to a different stable sorting order, and heapselect being applied to results of sub searches (e.g. z=2, z=3, z=4)
Sage spends a significant portion of its runtime sorting (peptides, fragment ions, preliminary scoring candidates, etc.).
After preliminary scoring of all potential candidates (how many fragments matched?), Sage selects the top k (~50 or more, depending on settings) candidates for hyperscore calculation. Currently, the entire list of candidates (100s to 100,000s) is completely sorted, and then the best k elements are selected.
This PR merges in an implementation of the heapselect algorithm. A size-bounded min heap is constructed in-place to select the top k elements. This speeds up candidate selection, and can improve total runtime by 10-15% depending on the search settings.
quickcheck
is used to perform property-based testing of the heapselect algorithm, and one of the integration tests has been changed to usequickcheck
as well, to increase testing surface area. In addition, crate internals have been reworked a bit, to provide initial support for compiling for wasm (#77) - there is still some awkwardness here that warrants more clean up. Further restructuring/refactoring should enable support for additional file formats as well.Of note: