Open westonpace opened 9 months ago
Some benchmarks (@50M rows) to justify this:
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_scalar_index_search[clean-equality] 1.6202 (1.0) 2.4431 (1.0) 1.7421 (1.0) 0.1202 (1.0) 1.7206 (1.0) 0.0908 (1.0) 13;9 574.0160 (1.0) 121 1
test_scalar_index_search[clean-in_list_one] 1.6834 (1.04) 11.8010 (4.83) 2.4469 (1.40) 2.2548 (18.76) 1.8335 (1.07) 0.2162 (2.38) 3;6 408.6851 (0.71) 54 1
test_scalar_index_search[with_new_rows-in_list_one] 1.7134 (1.06) 3.3687 (1.38) 2.0622 (1.18) 0.2632 (2.19) 2.0226 (1.18) 0.1898 (2.09) 27;13 484.9220 (0.84) 182 1
test_scalar_index_search[with_new_rows-equality] 1.7388 (1.07) 4.1352 (1.69) 2.0584 (1.18) 0.3217 (2.68) 2.0105 (1.17) 0.2745 (3.02) 5;4 485.8196 (0.85) 93 1
test_scalar_index_search[clean-less_than_selective] 1.9507 (1.20) 3.4980 (1.43) 2.1856 (1.25) 0.1998 (1.66) 2.1458 (1.25) 0.1568 (1.73) 29;16 457.5327 (0.80) 298 1
test_scalar_index_search[with_new_rows-less_than_selective] 2.0266 (1.25) 7.6367 (3.13) 2.4746 (1.42) 0.4396 (3.66) 2.3821 (1.38) 0.3090 (3.40) 17;12 404.1098 (0.70) 349 1
test_scalar_index_search[with_delete_files-equality] 3.0786 (1.90) 6.1823 (2.53) 3.4407 (1.98) 0.3387 (2.82) 3.3746 (1.96) 0.2119 (2.33) 12;8 290.6377 (0.51) 137 1
test_scalar_index_search[with_delete_files-in_list_one] 3.4059 (2.10) 4.5061 (1.84) 3.7989 (2.18) 0.2360 (1.96) 3.7727 (2.19) 0.3145 (3.46) 44;3 263.2332 (0.46) 124 1
test_scalar_index_search[with_delete_files-less_than_selective] 3.6465 (2.25) 9.7798 (4.00) 4.0971 (2.35) 0.5013 (4.17) 4.0490 (2.35) 0.2774 (3.06) 6;6 244.0722 (0.43) 157 1
test_scalar_index_search[clean-none] 696.2231 (429.70) 782.1625 (320.16) 734.0862 (421.38) 38.3136 (318.72) 723.0069 (420.21) 68.9626 (759.59) 1;0 1.3622 (0.00) 5 1
test_scalar_index_search[with_delete_files-none] 739.9145 (456.67) 761.9250 (311.87) 749.1443 (430.02) 8.5249 (70.92) 747.1020 (434.22) 11.9411 (131.53) 2;0 1.3349 (0.00) 5 1
test_scalar_index_search[with_new_rows-none] 742.3391 (458.16) 792.4247 (324.36) 764.7724 (438.99) 19.8576 (165.19) 764.8674 (444.54) 30.7470 (338.66) 2;0 1.3076 (0.00) 5 1
test_scalar_index_search[clean-not_equality] 1,129.8565 (697.34) 1,363.5540 (558.13) 1,227.5580 (704.64) 110.5293 (919.48) 1,159.5230 (673.92) 191.5590 (>1000.0) 1;0 0.8146 (0.00) 5 1
test_scalar_index_search[clean-not_in_list_one] 1,134.1074 (699.96) 1,263.9427 (517.36) 1,173.0443 (673.35) 52.7296 (438.65) 1,150.1651 (668.48) 53.5838 (590.20) 1;0 0.8525 (0.00) 5 1
test_scalar_index_search[clean-not_equality_and_chain] 1,148.5113 (708.85) 1,209.9530 (495.26) 1,191.4977 (683.94) 24.6590 (205.13) 1,197.2188 (695.82) 21.5631 (237.51) 1;1 0.8393 (0.00) 5 1
test_scalar_index_search[clean-in_list_three] 1,158.0684 (714.75) 1,239.1014 (507.19) 1,207.8667 (693.33) 34.6018 (287.85) 1,224.0058 (711.39) 54.9572 (605.33) 1;0 0.8279 (0.00) 5 1
test_scalar_index_search[with_new_rows-not_equality] 1,167.2066 (720.39) 1,359.3954 (556.43) 1,254.0329 (719.84) 89.0691 (740.95) 1,213.8336 (705.48) 160.5098 (>1000.0) 1;0 0.7974 (0.00) 5 1
test_scalar_index_search[with_new_rows-not_in_list_one] 1,173.1351 (724.05) 1,372.3914 (561.75) 1,284.1288 (737.11) 76.9859 (640.43) 1,275.4549 (741.29) 108.1893 (>1000.0) 2;0 0.7787 (0.00) 5 1
test_scalar_index_search[with_new_rows-in_list_three] 1,183.4061 (730.39) 1,386.9354 (567.70) 1,251.1625 (718.19) 78.5276 (653.26) 1,231.9633 (716.02) 60.4255 (665.56) 1;1 0.7993 (0.00) 5 1
test_scalar_index_search[with_new_rows-not_equality_and_chain] 1,193.9693 (736.91) 1,357.6298 (555.71) 1,266.2167 (726.83) 71.5113 (594.89) 1,227.2801 (713.30) 117.1642 (>1000.0) 2;0 0.7898 (0.00) 5 1
test_scalar_index_search[with_delete_files-not_equality] 2,876.2944 (>1000.0) 3,071.6768 (>1000.0) 2,968.8758 (>1000.0) 86.7243 (721.45) 2,970.9656 (>1000.0) 159.4035 (>1000.0) 2;0 0.3368 (0.00) 5 1
test_scalar_index_search[with_delete_files-not_in_list_one] 2,895.8059 (>1000.0) 3,083.3901 (>1000.0) 3,011.1086 (>1000.0) 69.3674 (577.06) 3,024.4056 (>1000.0) 57.9921 (638.76) 2;1 0.3321 (0.00) 5 1
test_scalar_index_search[with_delete_files-not_equality_and_chain] 2,910.2556 (>1000.0) 3,158.7741 (>1000.0) 3,035.7476 (>1000.0) 95.1737 (791.73) 3,062.0933 (>1000.0) 132.2388 (>1000.0) 2;0 0.3294 (0.00) 5 1
test_scalar_index_search[with_delete_files-in_list_three] 2,914.8616 (>1000.0) 3,195.7641 (>1000.0) 3,038.2834 (>1000.0) 109.5526 (911.35) 3,001.0621 (>1000.0) 156.2868 (>1000.0) 2;0 0.3291 (0.00) 5 1
test_scalar_index_search[with_new_rows-greater_than_not_selective] 2,953.6200 (>1000.0) 3,191.9999 (>1000.0) 3,023.4207 (>1000.0) 96.0136 (798.72) 2,997.4851 (>1000.0) 76.6214 (843.95) 1;1 0.3308 (0.00) 5 1
test_scalar_index_search[clean-greater_than_not_selective] 2,962.4572 (>1000.0) 3,107.2330 (>1000.0) 3,019.5229 (>1000.0) 63.1493 (525.33) 2,996.0779 (>1000.0) 106.7595 (>1000.0) 1;0 0.3312 (0.00) 5 1
test_scalar_index_search[with_delete_files-greater_than_not_selective] 4,597.3313 (>1000.0) 4,869.1265 (>1000.0) 4,762.7918 (>1000.0) 108.6766 (904.06) 4,784.1855 (>1000.0) 159.1199 (>1000.0) 1;0 0.2100 (0.00) 5 1
Note that clean-not_equality
is 2x slower than a full scan.
Given the query
SELECT * FROM ... WHERE id = 7
, assuming a scalar index onid
, we generate a small allow_list, materialize it, and feed it totake
. This works well.Given the query
SELECT * FROM ... WHERE id != 7
, we do the same thing, this does not work well.Instead, we should do a full scan, then apply a filter to remove the rows that are in the block list (or just don't use the index at all)
Note: this only applies to queries WITHOUT a vector search. When there is a vector index we don't need to materialize the scalar index results and so the rationale is completely different.