ByteSubstring: add an option to used an owned needle (not borrowed)

tsnoam commented 2 years ago

This PR is a suggestion. I have done a minimal implementation, just so we have something to discuss.

The problem I was facing was that I couldn't use a borrowed ByteSubstring. So, I've modified jetscii's code to accept a Cow object.

Let me know what you think and if there're are any more modifications you would like to see as part of this PR.

tsnoam commented 2 years ago

@shepmaster kind ping on that.

i would be happy to rebase on current main but I would like to know if you'd be interested in this change

shepmaster commented 2 years ago

Oh, wow, I'm sorry that this totally dropped off my radar!

I think the idea has merit, but I think I would suggest a different implementation.

Right now, we have crate::ByteSubstring which delegates to crate::fallback::ByteSubstring and crate::simd::ByteSubstring. In order to support your goal, I'd suggest:

refactoring commit
- Stop storing the needle in {fallback,simd}::ByteSubstring, instead passing it into find.
- Start storing the needle in crate::ByteSubstring instead.
feature commit
- Add a new crate::OwnedByteSubstring that holds a Vec<u8>
- This calls into {fallback,simd}::ByteSubstring::find.

This way, you can choose if you need ownership or just a reference.

Does that make sense and seem plausible?

shepmaster commented 2 years ago

After musing on it for a while I wonder if we could change ByteSubstring to be generic with T: AsRef<[u8]>.

tsnoam commented 2 years ago

Hi @shepmaster Sorry for the late response. Yes, I like your suggestion. I can contribute these changes and obviously base it on the latest "main" branch.

shepmaster commented 2 years ago

Sorry for the late response

i think you’ve got a few months longer before you need to start apologizing for anything like that 😅

tsnoam commented 2 years ago

@shepmaster I've updated the PR in accordance with our discussion:

ByteSubstring takes T: AsRef<[u8]
I also added what I thought was an appropriate way to do the same thing for Substring, without breaking ergonomics or backward compat

let me know what you think.

tsnoam commented 2 years ago

I just realized that I only compiled it on my macos m1. So only the fallback code was tested 🤦‍♂️

I will fix and update here...

tsnoam commented 2 years ago

Ok. Fixed.

@shepmaster would really love to see your comments.

tsnoam commented 2 years ago

@shepmaster kind ping on that

tsnoam commented 1 year ago

@shepmaster kind ping :)

shepmaster commented 1 year ago

I did one force-push to tidy up some formatting things, and a second force-push to rebase on the current main branch (which has some soundness fixes).

Overall, things seem reasonable, but I'm a bit hesitant about the dispatch macro vs enum solution. The new code has to check the enum discriminant each call (e.g. to find). The old code would use the feature test, which I believe would do some trickery to perform the test just once and then effectively set a global value that could be checked in the future. I'll need to do some performance tests.

Have you tried this code locally with a String? Does it work for your original case?

tsnoam commented 1 year ago

Hi @shepmaster

Thanks for the review.

ACK for the formatting and rebasing over master.
Basically, I wanted to hear your initial feedback about the code before working harder to make things perfect ;)
However, I did not realize that unitests should be executed with all features turned on (I did that and fixed the Substring code).
I did notice that Substring might have backward-incompatible type definition. I'm not 100% sure that this is the way to go, but I think it is quite clean and obvious, will be glad to know what you think of it.
Regarding the dispatch macro: there is no indication in rust documentation that this is cached in any way ( https://doc.rust-lang.org/std/macro.is_x86_feature_detected.html ), so I decided to check it out - first by benchmarks (I couldn't find any significant difference in either direction) but then I disassembled the compiled executable (using objdump -d). The new approach was shorter in about 50% for the ByteSubstring::find code.

Bottom line, I fixed the code a little more (now for all features), benchmarks are running properly - and obviously the new code was forced pushed into the branch.

shepmaster commented 1 year ago

I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!

 name                                         before ns/iter         after ns/iter          diff ns/iter  diff %  speedup 
 bench::space_ascii_chars                     356,001 (14727 MB/s)   362,121 (14478 MB/s)          6,120   1.72%   x 0.98 
 bench::space_stdlib_find_char                279,225 (18776 MB/s)   278,723 (18810 MB/s)           -502  -0.18%   x 1.00 
 bench::space_stdlib_find_char_set            2,608,215 (2010 MB/s)  2,626,160 (1996 MB/s)        17,945   0.69%   x 0.99 
 bench::space_stdlib_find_closure             2,318,368 (2261 MB/s)  2,390,488 (2193 MB/s)        72,120   3.11%   x 0.97 
 bench::space_stdlib_find_string              2,153,673 (2434 MB/s)  2,268,436 (2311 MB/s)       114,763   5.33%   x 0.95 
 bench::space_stdlib_iterator_position        1,160,877 (4516 MB/s)  1,198,881 (4373 MB/s)        38,004   3.27%   x 0.97 
 bench::substring_stdlib_find                 463,832 (11303 MB/s)   469,920 (11156 MB/s)          6,088   1.31%   x 0.99 
 bench::substring_with_created_searcher       358,074 (14641 MB/s)   361,645 (14497 MB/s)          3,571   1.00%   x 0.99 
 bench::xml_delim_3_ascii_chars               361,004 (14523 MB/s)   362,989 (14443 MB/s)          1,985   0.55%   x 0.99 
 bench::xml_delim_3_stdlib_find_char_closure  3,516,929 (1490 MB/s)  3,459,551 (1515 MB/s)       -57,378  -1.63%   x 1.02 
 bench::xml_delim_3_stdlib_find_char_set      2,612,870 (2006 MB/s)  3,456,608 (1516 MB/s)       843,738  32.29%   x 0.76 
 bench::xml_delim_3_stdlib_iterator_position  1,131,135 (4635 MB/s)  1,155,312 (4538 MB/s)        24,177   2.14%   x 0.98 
 bench::xml_delim_5_ascii_chars               360,298 (14551 MB/s)   360,735 (14533 MB/s)            437   0.12%   x 1.00 
 bench::xml_delim_5_stdlib_find_char_closure  2,715,305 (1930 MB/s)  3,458,322 (1516 MB/s)       743,017  27.36%   x 0.79 
 bench::xml_delim_5_stdlib_find_char_set      3,449,795 (1519 MB/s)  3,473,528 (1509 MB/s)        23,733   0.69%   x 0.99 
 bench::xml_delim_5_stdlib_iterator_position  1,138,696 (4604 MB/s)  1,158,930 (4523 MB/s)        20,234   1.78%   x 0.98

shepmaster commented 1 year ago

might have backward-incompatible type definition

Yeah, I think that this change will warrant a new semver-incompatible version to be released; that's not a big deal IMO.

there is no indication in rust documentation that this is cached in any way

Yeah, I read through that before posting my comment (that's what led to the "I believe"). It must have been made up in my head.

tsnoam commented 1 year ago

I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!

That is to be expected. Benchmarks will usually show differences of +-3% depending on their nature and whatever the benchmarking server is doing at the moment.

What I usually did when I was running benchmarks using nightly-rust was to externally run them again and again and look at the avg/max/min of each individual benchmark.

If you would like, it is possible to migrate the benchmarks to use criterion, which saves you the hassle of what described above. There is also a nice tool called critcmp which helps in comparing the results of two different criterion executions.

tsnoam commented 1 year ago

oh, and criterion also runs with stable rust, which is a bonus...

tsnoam commented 1 year ago

sorry, i'm looking again at the stdlib difference, and it is weird. too high of a diff.

let me look into it a little more.

shepmaster / jetscii

ByteSubstring: add an option to used an owned needle (not borrowed) #50