nu11ptr / flexstr

A flexible, simple to use, immutable, clone-efficient String replacement for Rust
Apache License 2.0
148 stars 5 forks source link

Offer a boxed string by default? #3

Open epage opened 2 years ago

epage commented 2 years ago

It'd be interesting to see under what situations box might be useful.

nu11ptr commented 2 years ago

I played with a box but couldn't find a reason to add it. Allocation seemed about the same (I may need to revisit and check again in light of your benchmarks), but clone is slow given the allocation/copy. Also, this is kinda the antithesis to FlexStr which is about fast clones and not needlessly copying data.

nu11ptr commented 2 years ago

I also should add you can do this on your own. This was my example in the "make your own string" section. If you can give me a use case I'm not against adding necessarily either, but need to see a use case where it outperforms Arc

epage commented 2 years ago

I'd recommend reconsidering. The cost of a Box option is fairly low for maintenance / documentation. A big benefit is it'll help the user selecting the option that works for their benchmarks. I'm thinking of this from the perspective of re-evaluate what strings to use in toml_edit. Profiling is what led me to use small-string optimization. I'm going to be profiling what crate and crate settings work best for my application. I'll want to evaluate this option but having to make my own gets in the way of doing so and sours me a bit on the crate.

nu11ptr commented 2 years ago

So your use case is that you've found Box to allocate faster than Arc? Just want to understand the driver.

nu11ptr commented 2 years ago

The biggest issue I'm having is one of identity. Right now LocalStr and SharedStr are a pair and I plan to release optimizations to go back and forth. If I add a 3rd it would be "on the side" and not integrated into the overall scheme. Plus I'm trying to find the "why" - none of my benchmarks showed it made sense, so I'm looking for what Box does better. I want to reproduce your benchmarks because mine didn't show an advantage.

epage commented 2 years ago

I'm thinking back to when I first tried FlexStr in toml_edit and it was slower. That might have been other factors but I'll want to double check Box vs Arc to be sure.

Keep in mind, this is giving users the opportunity to evaluate against their own applications. I can't do that to show its faster without implementing it myself which I have no incentive to do, making this circular.

nu11ptr commented 2 years ago

FlexStr 0.8.0 had alignment issues due to the enum. That made it's performance inconsistent. Mostly it seems in inline strings but I can't seem to nail it down. 0.9.x doesn't have that issue due to the union - it is very fast for many ops (and has consistent micro bms)

epage commented 2 years ago

Yes, that could be it but it introduced enough doubt. In part, I filed all of these issues thinking about Josh's comment about what I would suggest. I suspect FlexStr is getting close to one I might generally suggest but I have various doubts or use cases so I filed these.

nu11ptr commented 2 years ago

I appreciate that. I'm interested in getting use and willing to do the work, so let me know what you need. I argue because I want to understand the 'why'. I usually get there even if not right away. :-)

epage commented 2 years ago

I definitely agree about seeking the "why"

Another one I just remembered on top of the previous ones I gave, when I pulled kstring into toml_edit, I benchmarked the various feature flags to see what would get the best performance. In my test cases, Arc was slower and I didn't enable it.

See https://github.com/ordian/toml_edit/blob/master/Cargo.toml#L44

nu11ptr commented 2 years ago

I would be interested to hear how you used it. In my benchmarks Arc and Rc are slightly slower than String on allocation (like 5-10% tops) and are much faster on clone. The only thing I didn't test yet (and it is important) is contended cloning of Arc in multiple threads. Real world might be different of course and sounds like it might be.

epage commented 2 years ago

toml_edits easy API is a fairly low-clone situation. You basically allocate the memory to then turn around and give it back to the user in their preferred data type via serde (we care more about the performance of this API than the editing API).

Yes, the focus of flexstr is on the cloning case but adding box support would make it more universally usable. There would be less need to have a user pick and choose which crate to use in which situation.

nu11ptr commented 2 years ago

Fair points - really it is as easy as me adding this:

type BoxStr = FlexStrBase<Box<str>>;