Open thedmm opened 10 months ago
Thanks for reaching out!
This crate focuses (in this order) on: correctness, user experience, generality, performance. So SIMD implementations if they only target specific encodings would favor performance over generality, which I'd like to avoid. The way I see there would be a few options:
data-encoding-simd
proc-macro (or it could be part of data-encoding-macro
too) to create SIMD-optimized encodings given their specification. There's 2 sub-options: the generated type is the same regardless of the specification (which may have a performance cost) or is unique per specification. I'm not sure I would have time to look into this, so that would probably not be out quickly. But that would be my preferred option (in particular if I can also provide constant-time implementations at the same time).simd
module in data-encoding
with specific implementation for specific but common specifications. This would be a good way to get something out while the first option above is being worked on. That's my preferred practical solution because I don't expect the first solution to see the light of day any time soon (unless you're willing to attempt it).vb64
for base64, although that one is new and apparently not meant to be maintained) which could be referenced from data-encoding documentation for those looking for performance, assuming that crate would be maintained.What do you think? What's the time frame you would like this to be out? And how much effort are you willing to invest?
Unfortunately, I think the goals I have in mind don't align with the goals for data-encoding
.
I don't think a proc-macro can deliver the kind of efficiency that I'm hoping to achieve with a manual implementation. For example, the index-to-character translation for standard Base32 is far easier than a table lookup - just translate 0 .. 26
to A .. Z
and 26 .. 32
to 2 .. 8
. For something like AVX2, where there's no efficient wide byte permutation available, this is much easier to deal with. Since data-encoding
defines specifications using full tables, deriving good translation functions is not going to be easy.
While I think that a simd
module with format-specific implementations could achieve the efficiency gains I want, I think it would end up outweighing the rest of the library in terms of complexity and maintenance cost. Besides, I'd like to experiment with a different API which is more amenable to my specific use-case with respect to streaming.
I think I'm going to write my own crate which doesn't try to offer the same kind of flexibility that data-encoding
can. I'll be able to only work on the formats I need, and provide gradual optimization where I can freely take advantage of SIMD features and more recent Rust features.
I love doing these kinds of performance optimizations, so I'm willing to sink a fair chunk of time into this. But given how specific my goals are, I think writing my own crate makes the most sense.
Thanks for creating and maintaining this crate!
I don't think a proc-macro can deliver the kind of efficiency that I'm hoping to achieve with a manual implementation.
I don't see any theoretical reason it couldn't, but I can see how it seems uncertain (and may need some work to actually get it to work). I'll look into it if I ever get time. I think it's interesting.
I think writing my own crate makes the most sense.
Sounds good! Please send me a link to your crate once it's out. I'm definitely going to take a look at it for both:
Just an update here. I have a basic design that I think works, but that's too much work for me to finish in the medium term. Maybe I'll pick this back up next year. Keeping open and renaming the issue.
This crate is currently used by
uutils/coreutils
for programs likebasenc
,base32
, andbase64
, and I'm looking for ways to optimize them. I've experimented with an AVX2 implementation of simple Base32 encoding and the results were promising - to this end, I'm looking to write optimized SIMD implementations of encoding and decoding for individual format specifications. Is this somethingdata-encoding
can support, given its current API? Would such implementation (and possibly API) changes be welcome? Or should I write my own crate instead?