Improve the Partition module performance

@nothingmuch has suggested the following improvements in the pull request #1:

The Filter implementations rely on Bloom filters to skip checking negatives. However, the filters are arguably sized inappropriately, specifying a false positive rate for the size of the set, not its corresponding sumset, which may be substantially larger. This is particularly problematic with Bloom filters, as opposed to say cuckoo or quotient filters which are both resizable, and their FPR does not asymptotically approach 100% when close to capacity.
An LFU cache would handle positive matches, avoiding expensive computations (e.g., is_subset_sum).
If adjusting the bit vector-based powerset enumeration to use a grey code instead of 2's complement (i.e., normal integer incrementing), the sums could be adjusted by subtracting or adding one element at a time, allowing the previous sum value to be reused more easily. This might make an LRU cache a better fit than LFU for positive cases.
Filter.contains is invoked repeatedly on a fixed value (left set of SumFilteredPartitionIterator).
The trait object for Filter seems unnecessary, preventing monomorphisation and any optimizations that would enable, and adding indirection in fairly hot code paths.

payjoin / cja