feat!: introduce `borsh::io` with either items of `std:io` or private `borsh::nostd_io` module reexported (`std` or `no_std`)

near / borsh-rs

Rust implementation of Binary Object Representation Serializer for Hashing

https://borsh.io/

Apache License 2.0

299 stars 65 forks source link

feat!: introduce `borsh::io` with either items of `std:io` or private `borsh::nostd_io` module reexported (`std` or `no_std`) #212

Closed mina86 closed 12 months ago

mina86 commented 12 months ago

When built with std feature, borsh uses a handful of std::io types in its public API (e.g. std::io::Write is used in BorshSerialize trait). When std feature is disabled, borsh switches to its own types which mimics behaviour of standard types offered through borsh::nostd_io.

Problem is that this approach results in no consistent way to name Write, Read etc. symbols used in the public API. This creates a problem for authors and users of no_std libraries. A no_std library might have code such as:

impl borsh::BorshSerialize for Foo {
    fn serialize<W: borsh::nostd_io::Write>(
        &self, writer: &mut W,
    ) -> borsh::maybestd::io::Result<()> {
        todo!()
    }
}

So long as borsh is built with default features disabled it will work correctly. However, if author of a std application enables borsh’s std feature, the aforementioned example crate will fail to build since borsh::nostd_io will no longer be offered.

Address this by introducing borsh::io module which exports Error, ErrorKind, Read, Result and Write symbols, i.e. the types which are used in borsh’s public API.

With this change, author of a no_std library may write:

impl borsh::BorshSerialize for Foo {
    fn serialize<W: borsh::io::Write>(
        &self, writer: &mut W,
    ) -> borsh::maybestd::io::Result<()> {
        todo!()
    }
}

and their code will work without problems when used in std applications.

mina86 commented 12 months ago

The crux of the change is in borsh/src/lib.rs. The rest is just s/__private::maybestd/__private/.

mina86 commented 12 months ago

We don't want to turn borsh-rs into maybestd provider.

I mean, borsh devs should have thought about that when they decided to use io::Read, io::Write and io::Error as part of borsh’s public API. For comparison, serde does offer StdError which switches between std::io::Error and its own trait depending on Cargo features. If you don’t want to offer maybestd you have an option of switching the interface to not use std.

Not offering maybestd is not only hostile to users of borsh but also to all users of no_std-compatible libraries which use borsh.

consider using use borsh::__private::maybestd; - it is still public, but from a borsh crate point of view, it is a private implementation that we don't want to provide backward compatibility guaranties.

But you are providing compatibility guarantees to borsh::nostd_io, correct? So you’re already paying 99% of the cost of supporting borsh::maybestd. Adding borsh::maybestd::io as re-export of borsh::nostd_io or parts of std::io is a rather tame requirement.

frol commented 12 months ago

But you are providing compatibility guarantees to borsh::nostd_io, correct?

That scope is small and self-contained in a single file, and that is indeed part of the public interface, as you stated. maybestd does not need to be part of the public interface, hence the difference.

Not offering maybestd is not only hostile to users of borsh but also to all users of no_std-compatible libraries which use borsh.

no_std-compatible libraries have two valid options:

Treat themselves as "friend" modules (in C++ terms), and use borsh::__private::maybestd wherever applicable (borsh-derive does that), knowing that they need to pay attention to __private changes
Write a minimal maybestd implementation themselves, as you said, it will be just a re-export

We optimize borsh for application developers, and we don't want them to abuse borsh::maybestd as they will later come and request more re-exports there.

mina86 commented 12 months ago

That scope is small and self-contained in a single file,

Just like what this PR proposes:

pub mod maybestd {
    pub mod io {
        #[cfg(feature = "std")]
        pub use std::io::{Error, ErrorKind, Read, Result, Write};

        #[cfg(not(feature = "std"))]
        pub use crate::nostd_io::*;
    }
}

Literally five symbols which are part of borsh’s public API. If you don’t like the name maybestd I’d also be happy with:

#[cfg(feature = "std")]
pub mod io {
    pub use std::io::{Error, ErrorKind, Read, Result, Write};
}

#[cfg(not(feature = "std"))]
pub use crate::nostd_io as io;

That would offer types borsh::io::Read etc.

Treat themselves as "friend" modules (in C++ terms), and use borsh::private::maybestd wherever applicable (borsh-derive does that), knowing that they need to pay attention to private changes

What’s the point of borsh::nostd_io then? At the moment it’s a foot-gun breaking people’s builds. Consider the following pascal-str crate:

[package]
name = "pascal-str"
version = "0.1.0"
edition = "2021"

[dependencies]
borsh = { version = "1.0.0-alpha.4", default-features = false }

#![no_std]
extern crate alloc;

struct PascalString(alloc::vec::Vec<u8>);

impl borsh::BorshSerialize for PascalString {
    fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
        writer.write_all(&[self.0.len() as u8])?;
        writer.write_all(&self.0)
    }
}

It compiles and works fine. But let’s now say someone uses it in std application like so:

[package]
name = "foo"
version = "0.1.0"
edition = "2021"

[dependencies]
borsh = "1.0.0-alpha.4"
pascal-str = { path = "../pascal-str" }

And suddenly the code breaks:

error[E0433]: failed to resolve: could not find `nostd_io` in `borsh`
 --> /tmp/b/pascal-str/src/lib.rs:7:28
  |
7 |     fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
  |                            ^^^^^^^^ could not find `nostd_io` in `borsh`

error[E0433]: failed to resolve: could not find `nostd_io` in `borsh`
 --> /tmp/b/pascal-str/src/lib.rs:7:78
  |
7 |     fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
  |                                                                              ^^^^^^^^ could not find `nostd_io` in `borsh`

Enabling a cargo feature shouldn’t break builds.

We optimize borsh for application developer

Except you don’t. Because now author of pascal-str has to introduce a std feature to their crate and application developer has to enable it. This is an objectively worse experience for application developer than what they had with borsh 0.10.

Write a minimal maybestd implementation themselves, as you said, it will be just a re-export

But as shown, it’s not just a matter of a simple maybestd module with re-exports. I have no way to refer to Write, Read, Error and ErrorKind symbols used by borsh. I have to have user of my crate tell me how borsh was compiled.

and we don't want them to abuse borsh::maybestd as they will later come and request more re-exports there.

If they do, tell them to GTFO. However, if something is part of borsh’s public API that should be provided by borsh in a way which doesn’t depend on whether std feature is enabled or not.

frol commented 12 months ago

Because now author of pascal-str has to introduce a std feature to their crate and application developer has to enable it. This is an objectively worse experience for application developer than what they had with borsh 0.10.

Well, pascal-str might want to have std feature anyway to enable it in borsh, but I see your point now.

But as shown, it’s not just a matter of a simple maybestd module with re-exports. I have no way to refer to Write, Read, Error and ErrorKind symbols used by borsh. I have to have user of my crate tell me how borsh was compiled.

This is a fair point. @mina86 @dj8yfo @matklad I would like to consider having borsh::maybestd_io or borsh::io module to be part of the public API. Still, I don't want to expose the whole maybestd. What do you think?

dj8yfo commented 12 months ago

Well, pascal-str might want to have std feature anyway to enable it in borsh...

Agree.

I have no way to refer to Write, Read, Error and ErrorKind symbols used by borsh.

Using this recipe https://github.com/near/borsh-rs/blob/09c9295000ced1933e067d094a3cd8fe5507044d/borsh/tests/test_ser_de_with.rs#L22-L26 should help. Doing the dispatch on the client lib side is more explicit about what's happening and i like it more.

like to consider having borsh::maybestd_io or borsh::io module to be part of the public API

This might result in people mindlessly using borsh::io instead of std::io even in the spots completely unrelated to borsh (IDE completion somehow adversely affects minds of folks).

Here's an example https://github.com/near/nearcore/pull/9432/files#diff-5610719ce9d54675a9153a04627f8988705312f7dec97375fa702283ed796c1aL4 and there're a few more.

This pr has most likely protected from cases when items unrelated to borsh::io might be used by only reexporting the relevant ones https://github.com/mina86/borsh-rs/blob/2e942c8c88611804db699237d6b0d976da40d3c4/borsh/src/lib.rs#L127-L130 . But it doesn't protect from cases when borsh::io is used in wrong context, where std::io was meant instead.

Supposedly, the recipe above addresses this too. We've already adjusted the plan of #51 with making schema -> unstable__schema due to additions by @mina86. Hopefully, he can sustain some backpressure in client libraries too.

mina86 commented 12 months ago

Build breaks when std feature is toggled. This is a bug.
There’s no way to name Read, Write, Result, Error and ErrorKind types used by borsh thus making life of library developers harder. This is bad API design and user-hostile.
Library developers are forced to introduce std feature which makes life of application developers harder since they now have to enable that feature on libraries which otherwise wouldn’t have it. This is user-hostile and contradicts frol’s statement that borsh is optimised for application developers.

The fact that pascal-str might want a std feature anyway is not an argument. First of all, there are many possible crates which won’t have std feature. Specifically the hypothetical pascal-str crate would probably not have that feature since it would be a wrapper around alloc::string::String. Second of all, there’s a bug in borsh. I’ve presented steps to reproduce it. And your answer to it: well, just don’t do it. That’s not a valid response to a bug report.

Unless there’s some new convincing argument, I must conclude that borsh 1.x is in state of perpetual brokenness poorly suited for no_std environments and that I should stick to borsh 0.10.

dj8yfo commented 12 months ago

Well, i've found an example pascal_string crate/repo. It doesn't have no_std at all.

Can you post a link to the problematic repo?

Your example

#![no_std]
extern crate alloc;

struct PascalString(alloc::vec::Vec<u8>);

impl borsh::BorshSerialize for PascalString {
    fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
        writer.write_all(&[self.0.len() as u8])?;
        writer.write_all(&self.0)
    }
}

attempts to assert, that the pascal-str doesn't use anything out of std , that's why it can work in both std and no_std.

But let’s now say someone uses it in std application like so

Is it supposed to still be using borsh::nostd_io in an std application?

If so, it can be helped by removing just this line https://github.com/near/borsh-rs/blob/09c9295000ced1933e067d094a3cd8fe5507044d/borsh/src/lib.rs#L100 , then borsh will still use either std::io or self::nostd_io depending on flag, but provide borsh::nostd_io outside in both situations. But then, what is the reason to support this, if pascal-str won't be compatible with other types from borsh in an std application?

matklad commented 12 months ago

Imo, this is what we need (haven't checked this specific code, might need some adjustments):

#[cfg(feature = "std")]      use std::io as io_impl;
#[cfg(not(feature = "std"))] mod nostd_io;
#[cfg(not(feature = "std"))] use nostd_io as io_impl;

pub mod io {
    pub use crate::io_impl::{Error, ErrorKind, Read, Result, Write};
}

That is:

given that borsh is io-polymorphic, clearly it should provide users with the ability to write io-polymorphic code as well.
borsh::io feels like the right name here. This is not the general "maybestd", io::Error is the only thing we need here (and we should commit to not needing anything else from the std)
we should enforce that the interface between two io modules is compatible, using a single pub use feels like a good way to do that.
it's unclear whether borsh::io should be absolutely minimal and strictly restricted to what's needed by borsh, or whether it should be a duck-type replacement for std::io. Eg, it unclear whether borsh::io should contain a version of std::io::copy -- some user code might want to call borsh::io::copy and have that work in std and non-std. I think it makes sense to expose what's already exposed. That is, neither to be absolutely minimal, nor to try to expose everything.

core::io feels like a common problem which should have solutions in the ecosystem, but quick googling didn't reveal any existing patterns. I think it is plausible that in the distant future, std::io::Error is moved to core, or some canonical core_io crate emerges. We should be able to take advantage of those by adding a new feature flag, so there's no compatibility hazard here.

In terms of diff for the current PR, rename&flatten borsh::maybestd::io to just borsh::io.

dj8yfo commented 12 months ago

The only downside of using borsh::io is that inattentive folks can randomly use items from this completion instead of std::io.

@mina86 has so far ignored the question to the pascal-str example, which was the cornerstone reason, backing the push for the change done by this pr:

tmp

mina86 commented 12 months ago

@mina86 has so far ignored the https://github.com/near/borsh-rs/pull/212#issuecomment-1712513408 to the pascal-str example, which was the cornerstone reason, backing the push for the change done by this pr:

Because it wasn’t a cornerstone reason. It was a quick example I came up with on the spot. The cornerstone reason was that there was no consistent way to name types used in the API which in turn resulted in build breaking when std feature was enabled.

I don’t care if it’s borsh::io, borsh::maybestd::io or even borsh::byćmożestd::io. So long as the same path works regardless of what features are enabled. And once that name is provided, pascal-str is supposed to use that.

dj8yfo commented 12 months ago

@mina86 well, alternatively, u can use 4 lines to dispatch on an std feature, or byćmożestd feature or whatever.

 #[cfg(feature = "std")] 
 use std::io; 

 #[cfg(not(feature = "std"))] 
 use borsh::nostd_io as io;

It's explicit and prevents you from importing borsh::io all over the place, whether you need it or not. std is important enough to be worth a line in Cargo.toml, and, as far as i understood, the pascal-str did intend to use two different providers of io, while using borsh, depending on whether it's std or not.

5 lines in client with explicit stating of what's used when and what's not vs 500 lines change in the subj repo is a good tradeoff imo.

mina86 commented 12 months ago

std is important enough to be worth a line in Cargo.toml

No, it’s not. I’m writing a no_std library. I don’t care about std. I wrote code which works with borsh with default features disabled. And then, when someone uses my library, the code breaks because borsh’s std feature was enabled. That a bug in borsh.

as far as i understood, the pascal-str did intend to use two different providers of io, while using borsh, depending on whether it's std or not.

Yes, depending on whether borsh is std or not. And since borsh changes types depending on whether it’s std or not, it’s borsh’ responsibility to provide a stable API for types it uses.

Imagine borsh had a uk feature and when it’s enabled rather than BorshSerialise and BorshDeserialise it would define BorshSerialize and BorshDeserialize. That would be extremely bad API. Situation with Read, Write etc. is the same.

5 lines in client with explicit stating of what's used when and what's not vs 500 lines change in the subj repo is a good tradeoff imo.

I can make this a ‘5 line’ change in borsh if you prefer:

+#[cfg(feature = "std")]
+pub mod io {
+    pub use std::io::{Read, Write, Error, ErrorKind, Result};
+}
+
+#[cfg(not(feature = "std"))]
+pub use nostd_io as io;

I made it 500 to simplify other things in the code. This isn’t a valid argument.

Not to mention that even if this is a 5 lines in client library, it’s 5 lines in every library and additional changes in every crate that transitively depends on it.