Change cherry picking algorithm

There is a risk that the number of pairs selected in a bin fluctuates strongly from one inchikey to the next.

If 5 consecutive inchikeys have no partner inchikey with a Tanimoto score between 0.9 and 1.0, and the next inchikey has more than 100 such partners, this implementation will select 100 pairs for that inchikey, even if the inchikey after it also has many matches in this bin.
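To make the fluctuation concrete, here is a minimal sketch of a sequential carry-over for a single Tanimoto bin. It assumes that any shortfall below max_pairs_per_bin is passed on to the next inchikey; the function name and the counts are hypothetical, and the sketch only tracks how many pairs each inchikey gets, not which pairs are chosen. The exact numbers in the real implementation may differ.

```python
def carry_over_selection(available, max_pairs_per_bin):
    """Hypothetical sequential carry-over for one bin.

    available[i] is the number of candidate pairs inchikey i has in this bin.
    Any shortfall below max_pairs_per_bin is added to the budget of the next
    inchikey, so a single inchikey can absorb the deficit of all predecessors.
    """
    selected = []
    deficit = 0
    for n_available in available:
        budget = max_pairs_per_bin + deficit
        n_selected = min(n_available, budget)
        selected.append(n_selected)
        # Whatever could not be filled is carried to the next inchikey.
        deficit = budget - n_selected
    return selected


if __name__ == "__main__":
    # Five inchikeys without partners in the bin, then two with plenty.
    print(carry_over_selection([0, 0, 0, 0, 0, 150, 150], 20))
    # [0, 0, 0, 0, 0, 120, 20]
```

Both of the last two inchikeys have 150 candidates, yet one receives 120 pairs and the other 20, purely because of the processing order.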

The behaviour I would prefer is that such a shortfall increases the maximum number of pairs per bin for all other inchikeys (including the ones already processed). One option would be to first store up to 2 × max_pairs_per_bin pairs per inchikey and bin, where available, and, after all Tanimoto scores have been calculated, determine how high the per-inchikey limit must be for the average to match the configured max_pairs_per_bin.
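A minimal sketch of the proposed two-pass approach for one bin, under stated assumptions: the first pass keeps at most 2 × max_pairs_per_bin pairs per inchikey, and the second pass searches for one shared limit (here called the cap) that is applied to every inchikey, so earlier inchikeys benefit just as much as later ones. The helper names are hypothetical, and because the cap is an integer the resulting average can slightly exceed the target.

```python
def truncate_to_double(available_counts, max_pairs_per_bin):
    # First pass: keep at most 2 * max_pairs_per_bin pairs per inchikey.
    return [min(n, 2 * max_pairs_per_bin) for n in available_counts]


def find_uniform_cap(stored_counts, max_pairs_per_bin):
    """Smallest shared cap whose average selection reaches max_pairs_per_bin.

    stored_counts[i] is the number of pairs kept for inchikey i in this bin
    after the first pass (so never more than 2 * max_pairs_per_bin).
    """
    upper = 2 * max_pairs_per_bin
    target_total = max_pairs_per_bin * len(stored_counts)

    def total_for(cap):
        return sum(min(n, cap) for n in stored_counts)

    # Very extreme distribution: even the largest cap cannot reach the
    # target, so use everything that was stored.
    if total_for(upper) < target_total:
        return upper
    # total_for is non-decreasing in cap, so binary search works.
    low, high = 0, upper
    while low < high:
        mid = (low + high) // 2
        if total_for(mid) >= target_total:
            high = mid
        else:
            low = mid + 1
    return low


if __name__ == "__main__":
    stored = truncate_to_double([0, 0, 5, 60, 90, 150, 45], 20)
    cap = find_uniform_cap(stored, 20)
    print(cap, [min(n, cap) for n in stored])
    # 34 [0, 0, 5, 34, 34, 34, 34]
```

Using a single cap for the whole bin means the selection no longer depends on the order in which inchikeys are processed; only the case where the stored pairs are too scarce falls back to using everything stored.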

This is somewhat more complex to implement and adds some computational overhead and intermediate storage, but I think it is still feasible, and it reduces the risk of introducing other biases, such as oversampling clusters with many similar inchikeys. It would also keep the resulting max_pairs_global at 0 (unless the distribution is very extreme).

matchms / ms2deepscore #146