Closed. djc closed this 1 year ago.
First off, thanks for filing feedback. I released the 0.20 alpha precisely to gather feedback, but that didn't produce any... oh well. If 0.20's role is to be the lightning rod, that's ok!
It is a bit baffling, though, to read what appears to be complaining about changing 12 callsites in a codebase. That is not a heavy burden. I don't really understand complaints about module paths, either. You can use them with whatever amount of prefix you want, and if your editor doesn't automatically do that for you, get one that does...? I do regret shipping with the FastPortable name, though -- I intended to change that before release but it slipped my mind.
I hadn't moved things to be methods on Engine because I wanted to avoid perturbing the API too much, but I'm not opposed to it.
I went back and forth during development about whether FastPortable's config should include an Alphabet, actually, and I think it's viable either way. However, if the goal is to have one thing that joins a Config with an Alphabet, isn't Engine that thing? You create it ahead of time and re-use it. It encapsulates all the behavior in one type.
My intent with the jump from 0.13 to 0.20 was to signal to the user that it's a major change in the API, which it obviously is, given that you're filing this issue. ;)
It is a bit baffling, though, to read what appears to be complaining about changing 12 callsites in a codebase. That is not a heavy burden.
Right now in most of my usage sites I don't need to import anything, which is an aspect of the previous API I appreciated. The names FastPortable, Config and Alphabet are opaque enough that I'd probably not want to import them as is, but would rename them (since they're too deeply nested in the crate's public API to use prefixed). Compared to the previous API, you've just added a bunch of boilerplate required to make calls to your API.
You create it ahead of time and re-use it.
My 12 uses are in 7 files across 4 different crates, so reusing it would need a bunch of different infrastructure.
My intent with the jump from 0.13 to 0.20 was to signal to the user that it's a major change in the API, which it obviously is, given that you're filing this issue. ;)
A one number minor version bump is used by every other 0.x crate out there, so it's a well-established standard. I don't recall any other crate having done a more-than-one minor bump.
If you want to get early feedback on new API, I'd recommend instead adding it in a compatible release and deprecating the old API.
added a bunch of boilerplate required to make calls to your API
If "not needing to import anything" is how you judge your libraries, I think we are not going to see eye to eye. The language gives you the ability to rename types at use-time, so while I'm open to suggestions for better names, in the meantime I encourage you to make use of that ability.
reusing it would need a bunch of different infrastructure.
How is that any different than sharing a custom 0.13 Config across crates?
A one number minor version bump is used by every other 0.x crate out there, so it's a well-established standard. I don't recall any other crate having done a more-than-one minor bump.
Complaining about incrementing a vaguely-meaningful number by 7 instead of 1 is making it hard to take you seriously.
How is that any different than sharing a custom 0.13 Config across crates?
With 0.13 I didn't need a custom Config, I just used the pre-defined configs that the API exposed.
If "not needing to import anything" is how you judge your libraries, I think we are not going to see eye to eye. The language gives you the ability to rename types at use-time, so while I'm open to suggestions for better names, in the meantime I encourage you to make use of that ability.
I think what @djc is trying to say is that going from base64::decode_config(input, base64::STANDARD_NO_PAD) to a multi line setup is not an improvement.
reusing it would need a bunch of different infrastructure.
How is that any different than sharing a custom 0.13 Config across crates?
Because the Config came with v0.13.
A one number minor version bump is used by every other 0.x crate out there, so it's a well-established standard. I don't recall any other crate having done a more-than-one minor bump.
Complaining about incrementing a vaguely-meaningful number by 7 instead of 1 is making it hard to take you seriously.
The problem isn't really the 7 vs 1 jump, it's the breaking of the semver convention. When I saw the jump I thought I had missed a large number of releases.
With 0.13 I didn't need a custom Config, I just used the pre-defined configs that the API exposed.
That's not really viable now that there is an additional axis of choice. I want users to understand specifically what their API calls are doing.
That's not really viable now that there is an additional axis of choice. I want users to understand specifically what their API calls are doing.
I don't necessarily want to understand the inner workings of this crate if all I want is a base64 url-encoded unpadded string.
I think what @djc is trying to say is that going from base64::decode_config(input, base64::STANDARD_NO_PAD) to a multi line setup is not an improvement.
Lines of code is a poor metric for quality. I agree that all other things being equal, shorter is better, but all other things are not equal.
I don't necessarily want to understand the inner workings of this crate if all I want is a base64 url-encoded unpadded string.
Hyperbole much? You're welcome to provide concrete suggestions for a different API.
That's not really viable now that there is an additional axis of choice. I want users to understand specifically what their API calls are doing.
What specifically do you think is so important that users understand? I would like to do url-safe base64 encoding/decoding without padding. Why do you want me to understand what Engine I am using, when the only option beside the default engine is one that I supply myself? Even if you offer a SIMD-enabled engine in the future, making your users select one or the other (as presumably there will be trade-offs) is mostly an anti-pattern in terms of API design; ideally I'd be able to pick a config (and potentially input) and you decide for me which engine is best.
Hyperbole much? You're welcome to provide concrete suggestions for a different API.
Given that FastPortable has a const new() anyway, why not provide some of the popular default options as a const DEFAULT: FastPortable in the root of the crate, similar to what you had in 0.13?
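For example, something roughly like this (a sketch only; exact 0.20 signatures may differ slightly from what's written here):

// hypothetical crate-root consts, mirroring the 0.13 ones
pub const STANDARD: FastPortable = FastPortable::from(&alphabet::STANDARD, fast_portable::PAD);
pub const URL_SAFE_NO_PAD: FastPortable = FastPortable::from(&alphabet::URL_SAFE, fast_portable::NO_PAD);

// call sites could then stay one-liners:
let decoded = base64::decode_engine(input, &base64::URL_SAFE_NO_PAD)?;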
Lines of code is a poor metric for quality. I agree that all other things being equal, shorter is better, but all other things are not equal.
I'm honestly not being adversarial, but I fail to see how things are not equal right now.
Hyperbole much?
I didn't mean to say this in a way that upsets you, and generally I appreciate you making this crate as versatile as possible, allowing different engines and whatnot.
However, the way I've been using this crate is as a simple utility to create standard base64 encoded strings and url-safe base64 encoded strings. This was easy, as all I had to do was call encode, or call encode_config with the url-safe config provided by this crate.
With the new API, this gets a lot more complex. I now need to create my own engine with the alphabet that I want, and a custom config that specifies whether I want padding or not. This makes the API more complex, harder to learn (as I now need to learn about the differences between engines, alphabets and configs, though I'm not criticizing their existence), and more verbose on my side.
You're welcome to provide concrete suggestions for a different API.
I'd appreciate an API that lets me choose the alphabet and padding without creating my own engine. For example:
fn decode_config(input: &str, alphabet: &Alphabet, padding: &Padding) -> Result<Vec<u8>>;
Alternatively, you could also add more pre-defined engines with popular combinations, like url-safe or unpadded.
Additionally, I suggest adding an example for url-safe base64 en/decoding to the documentation.
I totally agree with the discussion above. Most users don't actually need to choose which Engine the library uses to perform encode/decode; what they want is an API that takes a string in and gives a string out, with the library choosing the most effective way.
I understand the necessity of the Engine abstraction, but this is really not what most users want. I don't believe there are many users who want to customize:
Most users just want to use the standardized formats:
I'd prefer an API like the one @msrd0 proposed.
Even if you offer a SIMD-enabled engine in the future, making your users select one or the other (as presumably there will be trade-offs) is mostly an anti-pattern in terms of API design; ideally I'd be able to pick a config (and potentially input) and you decide for me which engine is best.
I don't think this is true. Implicit defaults have been harmful in a number of circumstances because users don't realize there is a choice they have to make, and are implicitly making. Consider the case of default file charsets, a "convenience" in a number of language ecosystems. It's now clear that that was an error: reading a file should always result in the same code points regardless of OS, and having two ways of reading a file (one with the default, one with a specified charset) increases the cognitive load of learning the API and the chance of mistakenly using the "default" one. Time zones are another such case. Convenient, right up until they become a constant trickle of bugs as each developer who touches a codebase has to learn not to do the "easy" way with the default time zone.
When we have a constant-time engine (which I'll get to, unless someone beats me to it), it will likely not have as good performance as the current one, but for anyone touching cryptographic material, it's the engine they should use. In such a context, the error of accidentally using the convenient, but incorrect, method of decoding because the IDE happened to autocomplete it, etc is actually a nontrivial problem -- something I hope resonates, given the link to an issue in rustls/pemfile!
why not provide some of the popular default options as a const DEFAULT: FastPortable in the root of the crate, similar to what you had in 0.13?
I'm not necessarily opposed to it, but with the proliferation of wrinkles that base64 has, it's difficult to guess what users would want, especially when it's so easy for them to create their own tailored consts.
Additionally, I suggest adding an example for url-safe base64 en/decoding to the documentation.
Good idea, I will.
I don't believe there are many users who want to customize [...] character set
Look at all the alphabets I've added over the years, all due to user requests, and the desire for custom alphabets as well.
I'd appreciate an API that lets me choose the alphabet and padding without creating my own engine: [...]
fn decode_config(input: &str, alphabet: &Alphabet, padding: &Padding) -> Result<Vec<u8>>;
If that's the API you want for your circumstance, isn't that literally a one-line function you could write in your own project? I'm not saying it's a bad API for you, but I'm not convinced it's a good API for everyone. There's nothing wrong with tailoring an API to suit your purposes.
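For illustration, such a wrapper might look roughly like this (a sketch using the 0.20 names discussed in this thread, with a bool standing in for the proposed Padding type; exact signatures may differ):

use base64::{
    alphabet::Alphabet,
    engine::fast_portable::{FastPortable, NO_PAD, PAD},
};

// decode `input` with the given alphabet, padded or not
fn decode_config(input: &str, alphabet: &Alphabet, padded: bool) -> Result<Vec<u8>, base64::DecodeError> {
    let config = if padded { PAD } else { NO_PAD };
    base64::decode_engine(input, &FastPortable::from(alphabet, config))
}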
So far, what I'm planning on doing:
- A better name for FastPortable. GeneralPurpose? GenPurpose? Default? Default clashes with the `Default` trait, so that's unfortunate, and I already am not a big fan of "defaults", as described above.
- A pub use for it and its config in base64::engine, to enable use as engine::Foo.
- Encode/decode as methods on Engine.
Look at all the alphabets I've added over the years, all due to user requests, and the desire for custom alphabets as well.
Yes, I agree there must be users who want customized alphabets, and the current 0.20 API can fulfill their requests. But for most use cases, we only need to encode/decode standardized base64 formats! We don't need any customization!
The standard formats: base64, base64_url with padding, base64_url without padding.
The current API design brings complexity to all users just because a relatively small group of them wants customization.
As for the implementation detail of Engine, I still don't think most users want to "choose" a specific Engine; they only care about:
So on those popular platforms, the library could set a SIMD Engine as the default Engine, but for the others, FastPortable is OK.
I don't think this is true. Implicit defaults have been harmful in a number of circumstances because users don't realize there is a choice they have to make, and are implicitly making. Consider the case of default file charsets, a "convenience" in a number of language ecosystems. It's now clear that that was an error: reading a file should always result in the same code points regardless of OS, and having two ways of reading a file (one with the default, one with a specified charset) increases the cognitive load of learning the API and the chance of mistakenly using the "default" one. Time zones are another such case. Convenient, right up until they become a constant trickle of bugs as each developer who touches a codebase has to learn not to do the "easy" way with the default time zone.
Yes, implicit defaults can be harmful. But I don't think this applies to this particular case, because the choices here so far don't seem to be about correctness. As long as the default choice is correct, it won't be harmful. Rust doesn't make you pick the encoding for every string explicitly, right? It sets UTF-8 as a reasonable default and builds its API around it. That feels more analogous to what the current API is doing.
I don't think constant-time implementations are a great comparison, because by and large developers who touch constant-time sensitive code know that what they're doing is a special case and that they need to be careful to get it right. It would be great to expose a ConstantTime Engine for them!
In my experience good API design will make the easy things simple and the hard things possible. The 0.20 API focuses very hard on the latter to the point that it basically doesn't do the former, at all.
Look at all the alphabets I've added over the years, all due to user requests, and the desire for custom alphabets as well.
That there are a number of different users asking for different alphabets does not mean users using a different alphabet constitute a majority of your users. I'm pretty confident that 80% or more of base64 users stick to the default and URL-safe alphabets.
something I hope resonates, given the link to an issue in rustls/pemfile!
rustls-pemfile doesn't actually need constant-time base64 -- it's only used in an input configuration path.
That there are a number of different users asking for different alphabets does not mean users using a different alphabet constitute a majority of your users. I'm pretty confident that 80% or more of base64 users stick to the default and URL-safe alphabets.
Yes! And I don't think 20% of users want base64 with customized alphabets; I believe it's no more than 1%. Convince me.
A quick solution: Just add some constants in every engine module.
pub const STANDARD: FastPortable = FastPortable::from(alphabet::STANDARD, PAD);
base64::decode_engine("aGVsbG8gd29ybGR+Cg==", &base64::engine::fast_portable::STANDARD)
Different base64 crates in the Rust ecosystem:
Yes! And I don't think 20% of users want base64 with customized alphabets; I believe it's no more than 1%. Convince me.
While I agree that probably >90% of users don't need alphabets other than standard and url safe, I don't think you are the one designing the API, so I think you should convince others, not ask to be convinced yourself.
I marked that as off-topic. Let's come back to discussing the API.
A quick solution: Just add some constants in every engine module.
I think this is a pretty good idea. Perhaps even the Engine trait could host them, though I would want to be more confident that all of the desired engine impls could support all of the desired configurations (standard, url safe, url safe without padding).
I think part of the disconnect here is that our goals aren't 100% aligned. You all want quick solutions to the problems in front of you, understandably. I want to create a structure that's going to age gracefully and not turn the API into a confusing grab bag of different shortcuts that individually make certain use cases easier but together are an incoherent mess. I think there is a general theme here of mistaking "easy" for "simple". They are not the same. It is quite possible for a multi-step process to be simple and for an easy process to conceal or otherwise incur a complexity cost. If I included a shortcut method or function or constant for every individual's use case request, this library would have long since devolved into a heinous cross product of complexity. That would be a poor user experience, and much more work to maintain.
Take a look at https://github.com/marshallpierce/rust-base64/pull/207. I've also released that as 0.21.0-beta.1 to make it easy to try out.
Thanks for updating the API, it looks much better now! There is however one thing that I think could be improved:
In your new example, you use engine::STANDARD_NO_PAD.encode(orig), which does not mention base64 anywhere. In the context of this library's documentation, the example is clear and straightforward. However, if I were to encounter this statement in some code on the internet, it'd take me more time than necessary to figure out what encoding it is using. I suggest you either re-export the general-purpose constants from the root of the crate, so one can write base64::STANDARD_NO_PAD, or rename them to be prefixed, e.g. BASE64_NO_PAD or BASE64_URL_SAFE. This way, code that uses this library can be immediately identified as doing base64 de/encoding.
The problem with having them at the root is that it can be difficult for, say, STANDARD to tell if it's referring to the alphabet or the engine. Leaving the engine prefix in removes the ambiguity.
The problem with having them at the root is that it can be difficult for, say, STANDARD to tell if it's referring to the alphabet or the engine. Leaving the engine prefix in removes the ambiguity.
Good point.
Another idea: Maybe you could add a prelude module that re-exports the standard engine(s) in a way that makes it clear what you are doing? Like
pub mod prelude {
pub use crate::engine::{
Engine as Base64Engine,
STANDARD as BASE64,
STANDARD_NO_PAD as BASE64_NO_PAD,
// ...
};
}
That way, users could choose which naming scheme they prefer.
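For instance, assuming the prelude sketched above (with encode/decode available as methods on the engine), a call site might read:

use base64::prelude::*;

let encoded = BASE64_NO_PAD.encode(b"data");
let decoded = BASE64.decode(&encoded)?;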
A prelude is an interesting idea, but it also creates the problem of "dialects" that C++ suffers from (on a much grander scale): different projects (or worse yet, the same project in two places) could accomplish the same base64 task in different ways, adding to users' cognitive burden. Given that a project that might want different names already has the ability to use whatever names they need, or make their own consts, I'd want pretty compelling evidence before introducing a "dialect"-prone solution like a prelude.
The problem with having them at the root is that it can be difficult for, say, STANDARD to tell if it's referring to the alphabet or the engine. Leaving the engine prefix in removes the ambiguity.
I think your issue with exporting the constants at the root of the crate also applies to the re-exports in engine. In my opinion, a consumer of the crate could easily interpret engine::STANDARD to mean "the standard engine" or "the standard alphabet" instead of "the standard engine with the standard alphabet".
Personally, I think the best way to solve this currently is removing the general_purpose re-exports in engine and renaming the constants to have BASE64 as their prefix, like @msrd0 suggested.
A BASE64_ prefix for constants makes it locally obvious that the code is doing base64 encoding or decoding, without requiring the use of a long module path.
I'm not sure I can do much to help a user who thinks engine::STANDARD refers to the standard alphabet. Besides that, the argument also applies to a hypothetical BASE64_STANDARD as well: why is that unambiguous to someone to whom engine::STANDARD is ambiguous? A really confused person could still think they both referred to the alphabet. I also don't see how BASE64_ implies anything at all about what engine is used.
I think engine::STANDARD reads well, but users can have any module path they like with a custom use if confusion over whether or not base64 is happening is a problem in practice. I'm ok with removing the re-exports in engine, though then you'd have either a very long callsite, or just STANDARD, which I don't see as being as informative to the reader. I personally tend to leave a bit of the module path attached for any types that aren't native to the crate I'm working in, and engine::STANDARD suits that aesthetic.
@marshallpierce I might have worded my ideas badly, sorry about that.
What I think could be confusing for users is, when seeing engine::STANDARD, figuring out that this refers to the STANDARD alphabet used in conjunction with the general_purpose engine. If I only saw engine::STANDARD, I might think that this only referred to "the standard/default engine" and that I still had to somehow specify the alphabet. Or, someone might understand engine::STANDARD to mean "some engine using the standard alphabet" and be unsure which engine is used.
Concerning your last paragraph, I personally still think that my suggestion reads a bit better, even with your preference for keeping the module path attached.
Here are two examples for how I imagine the base64 crate would be used in different contexts.
use base64::{engine::general_purpose::BASE64_URL, Engine as _};
let orig = b"data";
let encoded: String = BASE64_URL.encode(orig);
Here, the user, when importing, has to make the choice to use the general_purpose engine, but after that no longer has to think about it in detail. However, any casual reader can immediately tell that base64 url encoding is happening without having to look for any non-local information.
use base64::{engine::{general_purpose, constant_time}, Engine as _};
let secret = b"c2VjcmV0";
let decoded: Vec<u8> = constant_time::BASE64_URL.decode(secret)?;
let orig = b"data";
let encoded: String = general_purpose::BASE64_URL.encode(orig);
Here it is also really important to always be able to tell which engine is used. By keeping the module prefix included, this is possible at a glance.
I personally tend to leave a bit of the module path attached for any types that aren't native to the crate I'm working in, and engine::STANDARD suits that aesthetic.
I generally never leave more than one module in the path, and always prefer to just use the type name if it is expressive enough (which STANDARD is absolutely not, but neither is engine::STANDARD; neither mentions base64 anywhere). Also, I use rust-analyzer to auto-complete the type name for me, so using a renaming import in my own code is always extremely annoying.
While I'd still prefer an API like base64::decode_url_safe, the API suggested by @leotaku in their last comment looks pretty good.
What I think could be confusing for users is, when seeing engine::STANDARD, figuring out that this refers to the STANDARD alphabet used in conjunction with the general_purpose engine.
I'm not sure I follow what user population it is that is both so knowledgeable about implementation details of this crate that there's a concern about which implementation of Engine is used, yet also so unfamiliar with it that it doesn't know what engine::STANDARD is.
I also don't think a BASE64 prefix is appropriate. The crate name is already there in the module path, and if people want BASE64 or any other prefix, they can do that with a custom use, const, etc. I don't understand the hesitation to adapt a library's API to tailor a particular use case. The goal is to express to readers of the API docs how to do what they need to do. It's up to them to come up with names, module paths, etc that make sense in their codebase at that point.
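For example, a single renaming use is enough to get a base64-prefixed name locally (assuming the engine::STANDARD re-export discussed above):

use base64::{engine::STANDARD as BASE64_STANDARD, Engine as _};

let encoded = BASE64_STANDARD.encode(b"data");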
Also, I use rust-analyzer to auto-complete the type name for me, so using a renaming import in my own code is always extremely annoying.
If rust-analyzer doesn't support use Foo as Bar and then autocompleting Bar, that sounds like a rust-analyzer bug, not something to design a library's API around.
If rust-analyzer doesn't support use Foo as Bar and then autocompleting Bar, that sounds like a rust-analyzer bug, not something to design a library's API around.
When I manually scroll to the top of the file, manually type use Foo as Bar, then manually scroll down to the position I was editing before, rust-analyzer will auto-complete Bar.
However, I cannot type BASE64_STANDARD in a file I haven't used it in before and have rust-analyzer automatically add a renaming use statement. If I were to use the original name, I could start typing STANDARD and have rust-analyzer both auto-complete STANDARD and add the use statement.
It's up to them to come up with names, module paths, etc that make sense in their codebase at that point.
Frankly, I don't think that STANDARD in and of itself makes sense in any codebase that is not specific to base64. Worse, I think there might be codebases that already have a constant with that name that has a completely different purpose. And yes, you can rename them, but this is a tedious and manual process which I'd prefer to avoid.
But you can do const ANY_NAME_YOU_LIKE = base64::... and be done. You've spent enough words in this issue to have a lifetime supply of consts several times over. You clearly have opinions about naming. That's ok! Apply them in your own projects how you see fit.
I am not optimizing for the convenience of rust-analyzer users working in their editor with its specific limitations. I am optimizing for the overall comprehensibility of the library. While I may well not have the optimal arrangement for that stated goal, "my editor makes this hard" is not a persuasive argument to change.
I would design the API like this.
use base64::Engine; // auto imported by rust-analyzer
let b1 = base64::gp::STANDARD.decode(s1)?;
let b2 = base64::ct::STANDARD.decode(s2)?;
let b3 = base64::simd::STANDARD.decode(s3)?;
The equivalent in the upcoming base64-simd v0.8 will be
let b1 = base64_simd::STANDARD.decode_to_vec(s1)?;
It does read well, but I don't think it's necessarily easy to understand. A reader who isn't in the know about engine implementations isn't going to get much from gp or ct. It also flattens the hierarchy, which could make navigation more troublesome for anyone who isn't simply copying and pasting examples.
After working on other things for a while and coming back to this with fresh-ish eyes, I think on balance @msrd0's prelude idea is the one I "dislike the least". ;) I'll publish that as beta 2 momentarily.
Thanks for updating the API. I believe you should also re-export Engine in the prelude module, as that import is required to use any of the encode/decode functions.
Ok. I'll add that and call it rc1.
There's been no new feedback for a while so I'm closing this and releasing 0.21.0.
@marshallpierce the 0.21 API is substantially better in this regard than the 0.20 one, thanks for the iteration here.
I'm unhappy about the API changes in 0.20. In my code base at work we have 12 use cases of base64 0.13 encoding and decoding, 9 of which pass a config (mostly URL_SAFE_NO_PAD, some URL_SAFE). Unless I misunderstand the new API (I don't see any examples of this use case in the README or the top-level documentation), in order to obtain the same result with 0.20 I would have to instantiate a custom FastPortable (which, despite being the only engine shipped with 0.20, I have to import from base64::engine::fast_portable::FastPortable) and initialize it with base64::alphabet::URL_SAFE and base64::engine::fast_portable::NO_PAD.
Instead of offering a simpler way to do this through a top-level function, 6 of the 9 top-level functions now take an Engine; I would argue that, in case you already have an engine, these could have been methods attached to it.
I understand that the Engine abstraction is useful if you want to introduce a SIMD-backed implementation. I even like the introduction of a separate Alphabet type. I think it doesn't really make sense to have a Config type that doesn't include Alphabet, keeping it separate instead -- the whole point of having a Config type is to make it easy to wrap all the contextual/usually stable input for your main API calls into a single type. But to not offer a top-level API that makes it easy to use different alphabets/configs seems to me like a lack of empathy for how this crate is used.
(FWIW, I also think jumping from 0.13 to 0.20 doesn't have much merit.)
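For concreteness, here is roughly how one of those call sites changes, using the names above (a sketch; exact 0.20 signatures may differ slightly):

// 0.13
let bytes = base64::decode_config(input, base64::URL_SAFE_NO_PAD)?;

// 0.20
use base64::{alphabet, engine::fast_portable::{FastPortable, NO_PAD}};
const URL_SAFE_NO_PAD_ENGINE: FastPortable = FastPortable::from(&alphabet::URL_SAFE, NO_PAD);
let bytes = base64::decode_engine(input, &URL_SAFE_NO_PAD_ENGINE)?;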