CAGRA has been observed to yield low recall when filtering is enabled, especially when the ratio of filtered-out values is high. This may be related in part to #208 and #472, but there may also be fundamental reasons for the lower recall.
This feature request tracks progress and suggestions toward high-recall CAGRA search under strong filtering.
As an experiment, I suggest trying the following tweaks, enabled by a boolean search parameter:
Replace the hashmap that tracks visited nodes with a bitset that has one bit per dataset row. A small hashmap suffers hash collisions, which report unvisited nodes as already visited (false positives). An exact bitset eliminates these false positives and should prevent CAGRA from stopping early.
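
To illustrate the difference, here is a minimal Python sketch. It is not CAGRA's actual hashmap: `LossyVisitedTable`, `BitsetVisited`, and `count_skipped` are hypothetical names, and the lossy table is deliberately simplified (one slot per bucket, no probing) so that any collision shows up as a false "already visited". The real behavior depends on table size and probing strategy.

```python
class LossyVisitedTable:
    """Toy fixed-size visited table with no collision resolution (hypothetical)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.occupied = [False] * num_slots

    def test_and_set(self, node_id):
        """Return True if node_id is reported as already visited."""
        slot = node_id % self.num_slots
        seen = self.occupied[slot]
        self.occupied[slot] = True
        return seen


class BitsetVisited:
    """Exact visited set: one bit per dataset row (hypothetical)."""

    def __init__(self, dataset_size):
        self.bits = bytearray((dataset_size + 7) // 8)

    def test_and_set(self, node_id):
        """Return True only if node_id was marked before."""
        byte, mask = node_id >> 3, 1 << (node_id & 7)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


def count_skipped(visited, node_ids):
    """Count nodes reported as visited although they were never seen."""
    truly_seen = set()
    skipped = 0
    for node_id in node_ids:
        reported = visited.test_and_set(node_id)
        if reported and node_id not in truly_seen:
            skipped += 1
        truly_seen.add(node_id)
    return skipped


dataset_size = 1_000_000
# 3, 67 and 131 all map to slot 3 of a 64-slot table; 3 is a genuine revisit.
candidates = [3, 67, 131, 3, 999_999]

lossy_skipped = count_skipped(LossyVisitedTable(64), candidates)
exact_skipped = count_skipped(BitsetVisited(dataset_size), candidates)
print(lossy_skipped, exact_skipped)
```

In this toy setup the lossy table wrongly skips two candidates, while the bitset skips none. The trade-off is memory: one bit per row means roughly 125 KB per query for a dataset of one million vectors.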