Open mingzian opened 3 years ago
@mingzian Can you elaborate on the applications you have in mind?
@lemire absolutely! I have a few in mind, actually. One relates to extracting or locating words within sentences. Often it is wasteful to scan the entire sentence, because one knows either which word index is needed (word boundaries nearly always coincide with whitespace positions, plus index 0) or at least some heuristic for it (the word is in the first or second half of the sentence, etc.). Also, in the algorithms I deal with at work, knowing the char index of each whitespace helps us search for words within sentences much faster. Another is sentence comparison: if you can efficiently find the locations of the whitespace, you get a very fast first-pass rejection of non-matches.
I think that in any of those tasks, finding the whitespace locations faster than a standard scalar loop over each char would be a significant improvement.
Thanks.
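To make the two use cases above concrete, here is a minimal sketch in C of how a precomputed, sorted array of whitespace indices might be used. The names `word_span`, `kth_word`, and `may_match` are hypothetical and are not part of despacer or simdprune. The comparison helper reflects one reading of the "first-pass rejection" idea: sentences whose whitespace positions differ cannot be byte-for-byte equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a word inside a sentence: bytes [start, start + len). */
typedef struct {
  size_t start;
  size_t len;
} word_span;

/* Returns the k-th (0-based) whitespace-delimited field of a sentence of
 * length len, given the sorted whitespace indices ws[0..nws).
 * Assumption: consecutive whitespace yields empty fields. */
word_span kth_word(size_t len, const uint32_t *ws, size_t nws, size_t k) {
  assert(k <= nws); /* a sentence with nws separators has nws + 1 fields */
  size_t start = (k == 0) ? 0 : (size_t)ws[k - 1] + 1;
  size_t end = (k == nws) ? len : (size_t)ws[k];
  word_span w = {start, end - start};
  return w;
}

/* Cheap pre-filter: returns 0 when the whitespace layouts differ, so the
 * sentences cannot be identical; returns 1 when a full comparison is still
 * needed. */
int may_match(const uint32_t *a, size_t na, const uint32_t *b, size_t nb) {
  if (na != nb) return 0;
  for (size_t i = 0; i < na; i++) {
    if (a[i] != b[i]) return 0;
  }
  return 1;
}
```

For the sentence `"hello world foo"` the whitespace indices are 5 and 11, so `kth_word(15, ws, 2, 1)` gives the span starting at 6 with length 5, without touching the sentence bytes at all.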
Amazing, amazing work!
Just thought I'd mention here that I would love to see this tweaked a bit so that, with the help of your simdprune library, one could find the indices of the whitespace chars within a given char string instead of removing them. That would be of tremendous help in numerous string cleaning tasks.
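As a rough illustration of the request, the sketch below collects whitespace indices instead of removing the whitespace. It does not use simdprune or any despacer code: it relies only on standard SSE2 intrinsics (byte compare plus `_mm_movemask_epi8`) and falls back to a scalar loop when SSE2 is unavailable. The function name `find_whitespace_indices` and the choice of whitespace set (space, tab, LF, CR) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Assumed whitespace set for this sketch: space, tab, line feed, carriage return. */
static int is_ws(unsigned char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Writes the index of every whitespace byte in s[0..len) to out, in
 * increasing order, and returns how many were written.
 * out must have room for len entries. */
size_t find_whitespace_indices(const char *s, size_t len, uint32_t *out) {
  assert(s != NULL && out != NULL);
  size_t count = 0;
  size_t i = 0;
#if defined(__SSE2__)
  const __m128i sp = _mm_set1_epi8(' ');
  const __m128i tab = _mm_set1_epi8('\t');
  const __m128i lf = _mm_set1_epi8('\n');
  const __m128i cr = _mm_set1_epi8('\r');
  for (; i + 16 <= len; i += 16) {
    __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
    /* Each byte of m is 0xFF where v holds one of the four characters. */
    __m128i m = _mm_or_si128(
        _mm_or_si128(_mm_cmpeq_epi8(v, sp), _mm_cmpeq_epi8(v, tab)),
        _mm_or_si128(_mm_cmpeq_epi8(v, lf), _mm_cmpeq_epi8(v, cr)));
    /* One bit per byte: bit j is set when s[i + j] is whitespace. */
    unsigned mask = (unsigned)_mm_movemask_epi8(m);
    while (mask != 0) {
      unsigned bit = 0;
      while (((mask >> bit) & 1u) == 0) bit++; /* portable count-trailing-zeros */
      out[count++] = (uint32_t)(i + bit);
      mask &= mask - 1; /* clear the lowest set bit */
    }
  }
#endif
  /* Scalar tail, or the whole input when SSE2 is not available. */
  for (; i < len; i++) {
    if (is_ws((unsigned char)s[i])) out[count++] = (uint32_t)i;
  }
  return count;
}
```

The inner bit-scan loop is written portably; on GCC or Clang it could be replaced with `__builtin_ctz(mask)`. A table-driven shuffle in the style of simdprune might write the indices with fewer branches, but that is beyond what this sketch attempts.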