rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] explore using KMP for string matching like operations #15760

Open revans2 opened 1 month ago

revans2 commented 1 month ago

Is your feature request related to a problem? Please describe.

https://dl.acm.org/doi/pdf/10.1007/s00778-015-0409-y shows some really great performance numbers for string matching on GPUs. It would be great if we could look into using KMP to speed up some string operations, such as contains or, more generally, the LIKE command with a literal pattern.

GregoryKimball commented 1 month ago

Hello @res-life, thank you for studying this item. Would you please first try implementing KMP for the strings "LIKE" operation? We think that KMP could be useful for strings columns where the string length varies a lot (from 10 to 10K characters).
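
For readers unfamiliar with KMP (Knuth-Morris-Pratt), here is a minimal CPU-side sketch in Python of a literal "contains" check built on it. This is illustrative only and is not cuDF code: the names `build_failure_table`, `kmp_contains`, and `contains_column` are hypothetical, and a GPU version would need a different structure (for example, one thread or warp per string). The sketch shows that the failure table depends only on the pattern, so it can be built once and reused for every row, and that each row is scanned once without backing up in the text.

```python
def build_failure_table(pattern: str) -> list:
    """For each prefix of pattern, length of its longest proper prefix
    that is also a suffix. Depends only on the pattern."""
    table = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_contains(text: str, pattern: str, table: list) -> bool:
    """True if pattern occurs in text; table must come from
    build_failure_table(pattern)."""
    if not pattern:
        return True  # assumption: an empty pattern matches every string
    k = 0  # number of pattern characters matched so far
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # reuse the partial match instead of rescanning
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


def contains_column(strings: list, pattern: str) -> list:
    """Hypothetical column-level helper: build the table once, apply per row."""
    table = build_failure_table(pattern)
    return [kmp_contains(s, pattern, table) for s in strings]
```

Under these assumptions, a LIKE pattern of the form `%literal%` corresponds to `contains_column(rows, "literal")`. The per-row cost is linear in the row length regardless of the pattern, which is the property that may help when row lengths vary widely.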