rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking issue for libcore + no_std stabilization #27701

Closed alexcrichton closed 8 years ago

alexcrichton commented 9 years ago

This issue is intended to represent the outstanding issues for stabilizing libcore and allowing its usage on stable Rust. There are a number of features currently associated with libcore:

(note that core_float will be handled in a separate issue)

The design of libcore largely mirrors that of the standard library (good) but there are a few deviations:

Overall there are a number of tasks that probably need to be done before stabilizing these items:

alexcrichton commented 9 years ago

I think that all the major issues here have been resolved, so I'm nominating this for stabilization in 1.5

bluss commented 9 years ago

Is it possible to support multiple impl primitive blocks, so that core doesn't have to use as many extension traits?

alexcrichton commented 9 years ago

Not currently that I'm aware of at least, and that would definitely be part of the stabilization aspect of this issue.

aturon commented 8 years ago

Idea: mark the methods on the extension traits as stable, but leave the traits themselves unstable. Works because they're in the prelude. (We did this for SliceConcatExt.)

Not a great permanent story, but it's a place to start.

brson commented 8 years ago

I want to understand the story for undefined symbols before making this stable. There's a real possibility of people defining these themselves in ways that aren't forward compatible.

alexcrichton commented 8 years ago

This issue/feature is now entering its cycle-long FCP for stabilization in 1.5


@brson, we currently have very few undefined symbols actually:

$ nm -u libcore*

core-10cbabc2.0.o:
                 U fmod
                 U fmodf
                 U memcmp
                 U memcpy
                 U memset
                 U __powidf2
                 U __powisf2
                 U rust_begin_unwind
                 U rust_eh_personality

Of these, rust_begin_unwind and rust_eh_personality are verified by the compiler to exist in the form of a lang item (e.g. it's unstable to define them) whenever you're producing a staticlib, executable, or dylib, so we're covered on that front. The memcmp, memcpy, and memset functions are lowered to by LLVM and I think it's safe to say that our reliance on their meaning will never cause breakage. The __powi{d,s}f2 function dependencies may disappear but otherwise they're provided by compiler-rt so I don't think we have to worry about them.

Otherwise, the only symbols we rely on are fmod and fmodf and that in turn is only because Div and Rem must be implemented in libcore instead of being able to define them in libstd for floats. I'm personally comfortable with this because --gc-sections makes the dependency go away and the symbols also have very well known meanings.

Do you have further concerns in light of this information?
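
To make the fmod/fmodf dependency above concrete, here is a minimal sketch of the kind of code that pulls it in. The function names are hypothetical, and whether the remainder really becomes a call to fmod or fmodf depends on the target and on how LLVM lowers the operation, so treat that part as an assumption rather than a guarantee.

```rust
/// Remainder of two `f64` values, using only what `core` provides.
/// On many targets this is assumed to be lowered to a call to `fmod`.
pub fn float_rem64(a: f64, b: f64) -> f64 {
    a % b
}

/// Same operation for `f32`; the assumed libm symbol here is `fmodf`.
pub fn float_rem32(a: f32, b: f32) -> f32 {
    a % b
}
```

If nothing in the final binary ends up calling either function, section garbage collection can drop them, which is the --gc-sections point made above.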

alexcrichton commented 8 years ago

Oh and as a further clarification that I forgot to point out, this would also involve stabilizing #![no_std]. As a result I'm tagging this T-lang and renominating to make sure it comes up in triage over there. Triage in all the teams!

alexcrichton commented 8 years ago

And to further refine my comment, I remember now that we have more undefined symbols on 32-bit because some 64-bit integer operations frequently require support from compiler-rt. Taking a look at the undefined symbols from i686-unknown-linux-gnu:

         U __divdi3
         U fmod
         U fmodf
         U _GLOBAL_OFFSET_TABLE_
         U memcmp
         U memcpy
         U memset
         U __moddi3
         U __mulodi4
         U __powidf2
         U __powisf2
         U rust_begin_unwind
         U rust_eh_personality
         U __udivdi3
         U __umoddi3

It's basically the same story here, though, nothing unexpected and I don't think we should worry about people relying on their own incorrect versions of compiler-rt intrinsics.
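
As a rough illustration of where the extra 32-bit symbols come from, the sketch below lists 64-bit integer operations next to the compiler-rt routine each one is conventionally associated with. The function names are hypothetical, and the mapping in the comments is an assumption about typical lowering on a target like i686-unknown-linux-gnu; on 64-bit targets these usually compile to plain instructions.

```rust
/// Signed 64-bit division (conventionally `__divdi3` on 32-bit targets).
pub fn div_i64(a: i64, b: i64) -> i64 {
    a / b
}

/// Signed 64-bit remainder (conventionally `__moddi3`).
pub fn rem_i64(a: i64, b: i64) -> i64 {
    a % b
}

/// Unsigned 64-bit division (conventionally `__udivdi3`).
pub fn div_u64(a: u64, b: u64) -> u64 {
    a / b
}

/// Unsigned 64-bit remainder (conventionally `__umoddi3`).
pub fn rem_u64(a: u64, b: u64) -> u64 {
    a % b
}

/// Signed 64-bit multiplication with overflow detection
/// (conventionally `__mulodi4`).
pub fn mul_checked_i64(a: i64, b: i64) -> Option<i64> {
    a.checked_mul(b)
}
```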

briansmith commented 8 years ago

Some things I noticed when I worked on adapting ring to work with #![no_std] and libcore:

I don't get why I have to use core::foo instead of std::foo. I ended up having to write things like this:

// Re-export `{mem, ptr}` from either `std` or `core` as appropriate. This
// keeps the conditional logic in one place.
#[cfg(feature = "lib_no_std")]
pub use core::{mem, ptr};
#[cfg(not(feature = "lib_no_std"))]
pub use std::{mem, ptr};

Then I refer to, e.g. ring::ffi::mem::uninitialized() and ring::ffi::ptr::null() from all my other modules. But, it doesn't really make sense. I feel like it would be much better to just be able to use std::ffi::ptr::null() in #![no_std].

Also, it is confusing to me about whether #![no_std] applies to the entire crate or only to individual modules. Especially when building using Cargo, it isn't clear to me how it is useful to have a mix of #![no_std] modules within a single crate.

Finally, the documentation should be clearer about how to write a library that automatically works in both #![no_std] mode and non-#![no_std] mode depending on whether the executable is built with or without #![no_std]. Right now I have my library crate define a feature that controls whether or not I enable #![no_std] in the library, but it would be nice if it were automatic.

SimonSapin commented 8 years ago

@briansmith Why not only use no_std rather than have two modes? If you have optional features that require std, can’t you still unconditionally use use core::… for most of your imports?

alexcrichton commented 8 years ago

@briansmith I totally agree that we can bolster our docs in this respect! As @SimonSapin mentioned, there's actually no need to have a "dual mode" where sometimes you use std and the rest of the time you use core, the intention is for the crate to permanently mention #![no_std] and then everything "just works". In that sense, it sounds like a number of concerns you have would be addressed?

Part of the pain of using #![no_std] today is that if you want to work on both stable and nightly Rust you've got this whole import and duality problem, but that's why I'd like to stabilize #![no_std] :).

Additionally, #![no_std] is only a crate level attribute (which the docs should explain), so there's actually no notion of a "no_std module" vs another module as a whole crate should behave the same way.

Does that help explain things? I can try to bolster up our docs on this subject before we push this over the finish line!

briansmith commented 8 years ago

@briansmith I totally agree that we can bolster our docs in this respect! As @SimonSapin mentioned, there's actually no need to have a "dual mode" where sometimes you use std and the rest of the time you use core, the intention is for the crate to permanently mention #![no_std] and then everything "just works". In that sense, it sounds like a number of concerns you have would be addressed?

When I add this to my lib.rs:

#![feature(no_std)]
#![no_std]

I get:

src\digest.rs:28:5: 28:8 error: unresolved import `std::mem`. Maybe a missing `extern crate std`? [E0432]
src\digest.rs:28 use std::mem;

src\aead.rs:27:5: 27:8 error: unresolved import `std`. There is no `std` in `???` [E0432]
src\aead.rs:27 use std;

What am I doing wrong? The repo is at https://github.com/briansmith/ring, branch "no_std".

alexcrichton commented 8 years ago

Wow we really do need some documentation on this badly... The detailed design of #![no_std] can currently be found in an RFC, but the gist of it is that #![no_std] doesn't import std at the crate root or the std prelude in every module, but rather core at the crate root and the core prelude in every module.

If you're core-compatible you can basically rewrite all imports to be from core instead of std and you should be good to go!
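
A minimal sketch of what that rewrite looks like in practice. The helper functions are hypothetical and exist only to exercise the imports; in a real library the #![no_std] attribute would sit at the top of src/lib.rs, which is left as a comment here so the snippet also compiles on its own.

```rust
// In a real library crate, `#![no_std]` would be the first line of src/lib.rs.

// Before: `use std::mem;` and `use std::ptr;`
// After: the same modules, imported from `core` instead.
use core::mem;
use core::ptr;

/// Size of a pointer-sized integer on the current target.
pub fn word_size() -> usize {
    mem::size_of::<usize>()
}

/// A null pointer, obtained through `core::ptr` rather than `std::ptr`.
pub fn empty_handle() -> *const u8 {
    ptr::null()
}

/// Swaps the two halves of a pair in place using `core::mem::swap`.
pub fn swap_pair(pair: &mut (u32, u32)) {
    mem::swap(&mut pair.0, &mut pair.1);
}
```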

briansmith commented 8 years ago

OK, if I use core:: instead of std:: it works. But if I remove #![no_std] then it stops working:

src\aead.rs:27:5: 27:9 error: unresolved import `core`. There is no `core` in `???` [E0432]
src\aead.rs:27 use core;
                   ^~~~
src\aead.rs:27:5: 27:9 help: run `rustc --explain E0432` to see a detailed explanation
src\digest.rs:28:5: 28:9 error: unresolved import `core::mem`. Maybe a missing `extern crate core`? [E0432]
src\digest.rs:28 use core::mem;

Why is core only implicitly imported when we use #![no_std]? Why not always implicitly import core like std is implicitly imported?

Also, all the documentation uses std::foo instead of core::foo. Are you planning to update the documentation to use core:: ubiquitously? It is confusing to have two sets of names for the same things.

briansmith commented 8 years ago

Also, let's say my module uses core:: everywhere, and it is imported into a program that uses std:: everywhere. Do I have to worry that there will be code duplication between the core:: and std:: variants of features?

alexcrichton commented 8 years ago

Why is core only implicitly imported when we use #![no_std]?

At 1.0 we didn't inject the core library, and it would be a breaking change to do so now. Additionally I would personally not want it injected by default as std is "The Standard Library" which should be ubiquitously used instead of a mixture of core/std in my opinion. I would see it as a failure mode if a style guideline existed where in normal Rust code one should use core::foo for imports from core and std::foo for std-only imports.

Are you planning to update the documentation to use core:: ubiquitously?

Not currently, the documentation for #![no_std] would indicate that core is simply a subset of the standard library, so if a code example will work with only libcore it'll work by just rewriting std to core imports.

It is confusing to have two sets of names for the same things.

I think this confusion primarily stems from a lack of documentation. I think it's crucial to maintain the distinction of whether code is in a "core only" or "std only" context. It's important to know what abilities you have access to (e.g. can you do I/O?). Beyond that the structures are exactly the same, so once this piece of information is known I don't think it'll be too confusing (but it definitely needs to be explained!)

Do I have to worry that there will be code duplication between the core:: and std:: variants of features?

That's actually the great part! Because the standard library simply reexports everything from core, a #![no_std] crate can still be used with any other crate seamlessly, there won't be any duplication of anything.
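
A small sketch of what "no duplication" means at the type level. The two functions are hypothetical; the first is written the way a core-only library might spell its types, the second the way a std-using caller might. They interoperate because both paths name the same item.

```rust
use core::any::TypeId;

/// Written as a core-only library would write it.
pub fn double_or_zero(value: core::option::Option<u32>) -> u32 {
    match value {
        core::option::Option::Some(v) => v * 2,
        core::option::Option::None => 0,
    }
}

/// Written as a std-using caller would write it; the value is passed
/// straight through without any conversion.
pub fn call_from_std_code() -> u32 {
    let value: std::option::Option<u32> = std::option::Option::Some(21);
    double_or_zero(value)
}

/// Returns true when the `core` path and the `std` path name the same type.
pub fn paths_name_same_type() -> bool {
    TypeId::of::<core::option::Option<u32>>() == TypeId::of::<std::option::Option<u32>>()
}
```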

briansmith commented 8 years ago

I am surprised that it is so difficult to convey how confusing this is.

IMO, it would make a lot more sense to NOT have #![no_std] and instead have #![feature(no_std)] that disables everything in std:: that requires runtime support, and then deprecate core::. This would be more consistent with how other sets of features within std:: are enabled/disabled and would allow every library feature to have one, canonical, ubiquitous, name.

briansmith commented 8 years ago

Here is the development mode that I see for libcore compatible libraries on crates.io:

  1. Somebody makes a library that isn't libcore-compatible.
  2. Somebody else that needs the library to be libcore-compatible submits pull requests to make it libcore-compatible so they can use it in their no_std projects.

Step #2 will likely involve making some features of the third-party library conditional on whether std is being used or whether core is being used. Note that it isn't clear how this conditional enablement of features should work. In particular, which #![cfg(feature = "???")] should be used? In my suggestion above, this would be #![cfg(feature = "no_std")].

Step #2 will also likely involve changing some std:: references to core:: references. According to the Drawbacks section of the RFC, this is likely to be a source-compatibility-breaking change in the library, for example if the library exports macros that reference std:: and those references are changed to core::. This would put the third-party library developers in a Catch-22: they have to choose between backward compatibility and libcore compatibility. In my suggestion above, this would not be an issue because there would be no separate core::.

SimonSapin commented 8 years ago

core and std are crates that can be used from other crates with extern crate core; or extern crate std;, just like ring is a crate that can be used with extern crate ring. The only thing that makes them special is that, by default, the compiler implicitly injects extern crate std; at the top of every crate and use std::prelude::v1::*; at the top of every module and every crate. #![no_std] inhibits this injection. The plan is that extern crate core; and use core::prelude::v1::*; are injected instead, but I don’t know if that part is implemented yet.
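
The injection described above can be spelled out by hand. The sketch below writes the prelude imports explicitly in two hypothetical modules; it compiles in an ordinary crate, so it only illustrates the shape of what gets injected and does not by itself restrict the second module to core.

```rust
mod with_std_prelude {
    // Roughly what is injected into every module by default.
    #[allow(unused_imports)]
    use std::prelude::v1::*;

    /// `Vec` and `Option` are both reachable through the std prelude.
    pub fn first(items: Vec<u32>) -> Option<u32> {
        items.into_iter().next()
    }
}

mod with_core_prelude {
    // What would be injected instead under `#![no_std]`.
    #[allow(unused_imports)]
    use core::prelude::v1::*;

    /// Only core items are needed here: slices, `Option`, iterators.
    pub fn first(items: &[u32]) -> Option<u32> {
        items.iter().copied().next()
    }
}
```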

So #![no_std] only influences which crates are used. It can not influence what’s in these crates, that would require re-compiling these crates. We could have two different crates that are both called std, one of which only includes the core functionality, and use rustc --extern std=some/thing.rlib to disambiguate… I don’t have an opinion on whether this would be a good idea.

There is no such thing as a #![no_std] executable, only #![no_std] crates. Each crate has a number of dependencies, which may or may not (implicitly) include std. An executable links the union of all recursive dependencies. It can mix crates that use #![no_std] and crates that don’t. In that case std will be linked.

std has a number of pub use statements that “reexport” items defined in its own dependencies. These items are not duplicated, just given an (additional) other name. An executable that links std also links std’s own dependencies.

SimonSapin commented 8 years ago

It’s true that our story is not great for crates that want to optionally use #![no_std], but they can still use extern crate core; even if they also depend on std.

aturon commented 8 years ago

@briansmith

These are all good questions.

I think a lot of the friction here is due to the std facade, which proposed making the standard library largely a shim over a bunch of smaller crates. (Over time, we've scaled this design back significantly.)

The idea with the facade is that for the common case -- when you're using Rust in user space -- you get a simple, monolithic API surface. In particular, some of the modules that std exports are not straight re-exports of core, but include more functionality.

An alternative would have been to not have std at all, and instead ship with a set of crates (libcore, libcollections, libio, etc) some of which work in kernel space. That would effectively make things "no_std by default". Personally, I regret not pushing harder on that alternative, but doing that now would be a massive breaking change.

But given that std is stable, what can we do today?

We could have two different crates that are both called std, one of which only includes the core functionality, and use rustc --extern std=some/thing.rlib to disambiguate… I don’t have an opinion on whether this would be a good idea.

I actually think we should strongly consider such a direction. Part of the pre-stabilization work was to ensure that the module structure in core matched std. We'd probably need to do a bit more work to make sure they are in proper alignment, but the idea that, with a flip of a switch, you get something just like std but with no features that depend on system services... seems rather appealing to me. Certainly it would make migrating to "no_std" (which we'd want to rename...) trivial, modulo not actually depending on extra std features.

perlun commented 8 years ago

I actually think we should strongly consider such a direction. Part of the pre-stabilization work was to ensure that the module structure in core matched std. We'd probably need to do a bit more work to make sure they are in proper alignment, but the idea that, with a flip of a switch, you get something just like std but with no features that depend on system services... seems rather appealing to me. Certainly it would make migrating to "no_std" (which we'd want to rename...) trivial, modulo not actually depending on extra std features.

Not implying the current state of affairs is optimal in any way - wouldn't there be a great risk that such an approach would be confusing to our beloved users? I mean, if std sometimes is the current std and sometimes core... hmm. :smile: It doesn't sound entirely optimal; it sounds like it could look confusing.

But I do agree that it will probably have massive advantages to people developing crates. The risk that someone makes stuff that only works with std and not with core even though it doesn't depend on any std-only features would be much smaller; making a crate that works in both "standalone" and "regular" modes is merely a matter of "not depending on extra std features" like you put it.

aturon commented 8 years ago

@perlun

Definitely a legitimate concern! I think if we went this direction, we'd want to take steps to guarantee that the "core version" of std is always a subset of the "full version".

That said, this isn't so different from the situation in the std::os::* modules, for example -- where even the existence of a given module in that part of the tree depends on what platform you're on. This is basically saying that std is conditionally compiled in two ways, with one API a subset of the other. This kind of thing also happens in the Cargo universe due to features.

I also think that the majority of users wouldn't need to ever think about this -- that's part of the goal here, after all. And I think the reason is exactly as you spelled out; it's just a simple toggle.

petrochenkov commented 8 years ago

I may be saying nonsense, but can't #![no_std] inject the libcore import with renaming?

extern crate core as std;

If someone needs the choice between core and std based on a feature, he can do the same thing:

#[cfg(my_feature)]
extern crate core as std;
#[cfg(not(my_feature))]
extern crate std;

// The rest of the source code uses name std
// ...

aturon commented 8 years ago

@petrochenkov That seems viable, yes. I think the bigger question here is whether this is the desired behavior :)

alexcrichton commented 8 years ago

@briansmith

I am surprised that it is so difficult to convey how confusing this is.

I'm sorry you feel this way, but the current design of libcore/libstd has been around for quite some time now and I'm trying to tease apart concerns that exist from a lack of documentation (which there is a sore need for in this area!) and those which are architectural.

@SimonSapin has an excellent explanation as to the current state of affairs in terms of technical details, and the plan he mentioned is indeed implemented. The key aspect of this is that there is zero duplication among libcore and libstd. Large parts of libstd are simply reexports of libcore, so libraries using one can easily interoperate with another.


@aturon, @SimonSapin, @perlun

It's certainly been considered in the past that std just has a different meaning in a #![no_std] context, but I am personally not a fan of this behavior because I fear that it ends up being confusing when you're looking at a module to know whether it's a "std module" or a "no_std module". By having two separate names for this concept, std and core, it's clear what's what and where any snippet of code is expected to work.

There are also many technical details which simply naturally entail us having two libraries. For example the compiler doesn't auto-inject any --extern flags, it'd be difficult having two binaries on the filesystem which are both "libstd-like", and it'd be interesting to see how the standard library itself would be interested in terms of linking to a "lower libstd". These sorts of things would want to be ironed out, but it seems better to me to stay within the same world we have today of linking, building, and naming libstd.


@briansmith

I think your example may not quite be as dire of a situation as it may seem, for example @petrochenkov has what I think is the right idea to have features which bring in std + more functionality. For example:

// src/lib.rs
#![no_std]
#[cfg(feature = "foo")]
extern crate std;

use core::{ ... };

#[cfg(feature = "foo")]
mod foo;

// src/foo.rs

use std::prelude::v1::*;

// .. code using libstd

In a world like this, once a library is #![no_std] it's always no_std (e.g. nothing conditional). When features are added which require the standard library they can selectively link to and import the standard library in the modules that are implementing that functionality. There's certainly still a problem with macros, but that's a known deficiency of macros which I don't personally believe justifies a redesign of libcore/libstd to have the two named the same.

Note that this is of course only one method of encoding this sort of pattern, I expect more to arise! I think, however, that having a no_std-but-still-with-some-std-features library may not incur quite the annotation burden overhead you may be thinking.

aturon commented 8 years ago

@alexcrichton

It's certainly been considered in the past that std just has a different meaning in a #![no_std] context, but I am personally not a fan of this behavior because I fear that it ends up being confusing when you're looking at a module to know whether it's a "std module" or a "no_std module". By having two separate names for this concept, std and core, it's clear what's what and where any snippet of code is expected to work.

I'd like to understand a bit better the precise confusion you're worried about here.

If we take this approach, I imagine we'd ensure that core is a perfect subset of std, so that everything you can access and use in core is present and has the same meaning as with std. The only difference is that std contains a larger API surface.

Can you spell out in more detail what precise confusion you're worried about, given the above?

Put differently, I think the whole point that @briansmith is trying to get at is that it seems desirable to be able to write a program that begins by targeting std and, if it doesn't use non-core features, can be trivially made to run in a no_std setting.

I think that's, perhaps, the core disconnect here: how important do you consider it that it be easy -- even trivial -- to "port" libraries to no_std when they don't use non-core features?

There are also many technical details which simply naturally entail us having two libraries. For example the compiler doesn't auto-inject any --extern flags, it'd be difficult having two binaries on the filesystem which are both "libstd-like", and it'd be interesting to see how the standard library itself would be interested in terms of linking to a "lower libstd". These sorts of things would want to be ironed out, but it seems better to me to stay within the same world we have today of linking, building, and naming libstd.

That indeed sounds complicated, but is there some deep reason we can't take @petrochenkov's simple suggestion?

SimonSapin commented 8 years ago

The only difference is that std contains a larger API surface.

I’m worried about things that exist in both with the same name but with different behavior. The only example I can think of right now is the try! macro from std that uses std::convert::From in the error case, while try! from core does not. I don’t know if there are other cases (not necessarily macros).
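
For readers unfamiliar with the conversion being discussed, here is a minimal sketch of error-path conversion through From. It uses the ? operator, which in current Rust performs this conversion via From::from, rather than the try! macro of the time. The ConfigError type and parse_port function are hypothetical names chosen for illustration.

```rust
use core::num::ParseIntError;

/// Hypothetical library error type.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    BadNumber,
}

impl From<ParseIntError> for ConfigError {
    fn from(_: ParseIntError) -> Self {
        ConfigError::BadNumber
    }
}

/// Parses a port number. The error from `parse` is a `ParseIntError`;
/// `?` converts it into `ConfigError` through the `From` impl above.
pub fn parse_port(text: &str) -> Result<u16, ConfigError> {
    let port = text.parse::<u16>()?;
    Ok(port)
}
```

Without the From-based conversion, the function would have to map the error by hand, which is the behavioral gap between the two macros described above.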

briansmith commented 8 years ago

It's certainly been considered in the past that std just has a different meaning in a #![no_std] context, but I am personally not a fan of this behavior because I fear that it ends up being confusing when you're looking at a module to know whether it's a "std module" or a "no_std module". By having two separate names for this concept, std and core, it's clear what's what and where any snippet of code is expected to work.

As somebody building stuff for no_std, in particular building libraries and services in Rust for microcontrollers with and without operating systems, I would say that the thing you are optimizing for isn't very useful. I can easily just build the code for a target that is missing some features and find out what features that code really needs, just like I do in C and C++. And, the cost of making no_std mode so different from std mode is very painful. Note that C and C++ don't force the programmer to learn two same-but-different APIs when targeting subsets. In general, the POSIX model of subsetting works well and IMO that is what programmers are expecting.

Anyway, while working on some stuff this weekend, I came across another problem that the current RFC does not address, AFAICT: Although I want my library to be #![no_std], I don't want the tests sub-modules of my library to be #![no_std]. In particular, my tests require Vec and HashMap. So, there needs to be some way of doing this:

#![no_std]
....
#[cfg(test)]
mod tests {
    #![no_std + liballoc]
    ...
}

I would also like to point out that many applications and libraries for embedded will require #![no_std] + liballoc, some will require #![no_std] + liballoc + threads, some will require #![no_std] + a clock that can return the current time, etc.

Also, AFAICT, there needs to be some way to tell the compiler to avoid generating the unwinding code to reduce code size when panic! is implemented to just reset the device instead of unwind.

Regarding injecting use core as std, that seems like a big improvement to the proposal. However, either way, the documentation at https://doc.rust-lang.org/std/ should be updated to indicate which features are in core and which aren't. In particular, there shouldn't be separate documentation at https://doc.rust-lang.org/core. In general, anything that exposes libcore as a separate entity to developers, except the developers of the standard library, seems like an indication of something going wrong.

briansmith commented 8 years ago

The only difference is that std contains a larger API surface. I’m worried about things that exist in both with the same name but with different behavior. The only example I can think of right now is the try! macro from std that uses std::convert::From in the error case, while try! from core does not. I don’t know if there are other cases (not necessarily macros).

Isn't that a bug in libcore? Regardless of whether libcore is exposed with a separate name or not, developers are going to expect same-named items of core to work like same-named items of std.

alexcrichton commented 8 years ago

@aturon

Can you spell out in more detail what precise confusion you're worried about, given the above?

Certainly! To be clear, I'm also always under the assumption that libcore is simply a subset of libstd, I don't think there's any core-specific functionality we'll want to have and certainly no core-specific structure.

Here's some failure scenarios of "no_std == just a subset of libstd" that I'm worried about:

Does that help clarify what I'm thinking?

To expand a little more perhaps without examples, the code which I envision targeting libcore feels different enough than code targeting libstd that I don't think it makes sense to unify the two. I certainly want to make porting code to libcore easy, but I don't think std vs core in terms of naming is ever going to be the most painful part of doing this. I think that effort will always be required to make existing code core-compatible (e.g. structurally or API-wise) and that will always dwarf any amount of sed required to rename imports.

I certainly sympathize with trying to make everything "Just Work" as much as possible all the time, but this to me feels like a situation where the libcore/libstd distinction is worth it. If you want to port and truly are using "just the right subset" of libstd, then it's trivial to port with some renamings. The trick @petrochenkov suggested could indeed be used to have a "minimal patch", but for the reasons above I wouldn't want to do it by default.


I came across another problem that the current RFC does not address, AFAICT: Although I want my library to be #![no_std], I don't want the tests sub-modules of my library to be #![no_std].

You may want to explore the Rust source tree for examples of how to manage crates with #![no_std] (as all the facade crates have to deal with this). I'd also encourage taking a look at rustc --pretty expanded even for the standard library, because it turns out that "using std" is actually quite low overhead and very simple to enable. These may help show how to compose examples together and existing tools to work with situations like this. And of course, these are definitely the sorts of areas that we likely need better documentation on!

For example your library could be structured like:

#![no_std]
#[cfg(test)]
extern crate std;

#[cfg(test)]
mod tests {
    use std::prelude::v1::*;

    // all tests have access to libstd via the prelude + specific imports
}

When using libtest you're already required to have the standard library anyway (as libtest links to it) so it should be fine to just go ahead and use all the features of libstd.

Also, AFAICT, there needs to be some way to tell the compiler to avoid generating the unwinding code to reduce code size when panic! is implemented to just reset the device instead of unwind.

I agree! If possible, though, I'd prefer to keep this issue focused on just libcore. For now, however, you can get away with -Z no-landing-pads which should disable generation of unwinding code.

aturon commented 8 years ago

@alexcrichton

Thanks, that was a helpful clarification! TLDR: I think you raise legitimate points, but think they are outweighed by the benefit of simple porting and familiarity.

Let me briefly go point-by-point.

When reading code, I no longer know whether it's intended to be no_std or not.

Very true, and in some sense, that's part of the point here. I think it's fair to say that we're trading off local clarity (as you're talking about here) with ease of porting code that fits in core's footprint.

Also, note that unless you've memorized what libcore provides, you're going to be leaning on the compiler to tell you what's available anyway. And I suspect that once you realize a crate is trying to be no_std compatible, you'll stop reaching for anything that allocates quickly.

It'd be easy to signal no_std more loudly as well, e.g. via a crates.io badge.

Given an arbitrary snippet of code, the likelihood that it is copy/pastable into my no_std crate (even if core is aliased as std) I believe to be very small.

This is absolutely true! However, that's even more true with the status quo, as most code will be using std directly. Today, to figure out if I can copy the code in, I have to go through the effort of renaming to core etc before I can try it.

I think what you might be getting at here, though, is a different version of your first point: if the code is using core then I immediately know I can copy it in. But by the same token, if I'm bringing in code from a no_std crate I know I'm good to go.

If we name libcore/libstd the same, then I share @perlun's worry that this is the same name for two different concepts.

As I wrote up-thread, this is a legitimate worry, but I think it's largely ameliorated by ensuring "core std" is a true subset of "std". I think it's a pretty easy concept to get your head around, and I also think a large part of the ecosystem won't even have to (allocation's pretty common, after all...)

To be honest, I think it's easier to understand this design than to grasp that std is actually a facade.

If both core and std are named the same, it's hard to contain the "std features" of a crate. ... Without the ability to name both in source code it may be difficult to have optional features in crates.

Presumably in the case you're imagining, the optional feature also selects which version of std you're linking to? ISTM that the compiler would then help you ensure you're staying in the right footprint? But I could be missing something.

I'm not personally convinced that there's much of an advantage to be had of no_std just giving you a subset of std. ... I would much rather you explicitly add support than just stumble into it and then receive bug reports after the fact when you accidentally use a real-std-only feature.

I was imagining the workflow here would be that you would opt in to no_std if/when a downstream customer wanted to rely on it, so there's no chance of bug reports as long as you successfully compile.

To expand a little more perhaps without examples, the code which I envision targeting libcore feels different enough from code targeting libstd that I don't think it makes sense to unify the two. I think that effort will always be required to make existing code core-compatible (e.g. structurally or API-wise) and that will always dwarf any amount of sed required to rename imports.

I agree that no_std is a different world, but (your points above aside) see no reason to add friction to living in that world. Ideally, we would foster a thriving no_std ecosystem, and it's not unthinkable for such crates to be used in crates that otherwise use std. Being able to dive into that world and see familiar std imports seems to reduce, even if in a tiny way, the friction involved.

nastevens commented 8 years ago

I feel that I need to comment here, not to downplay any of @briansmith's concerns, but to provide an opposing viewpoint with my own experience. I have been using no_std extensively, including one library that has been made available on crates.io: https://crates.io/crates/fixedvec. I have found the core vs std split to be ideal, as it clearly lays out what functionality I have available. I search the rustdoc for 'core::foo' and can get the libcore API for foo. I have never had a problem running code I've written to use no_std in a "full std" environment.

Testing has not been a problem either. I simply add:

#[cfg(test)]
#[macro_use]
extern crate std;

at the top of the file and then

use std::prelude::v1::*;

to my test module.
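Put together, the testing setup described above might look roughly like the sketch below. The function `checked_sum` is a made-up example, and the crate-level lines are shown as comments so the snippet also compiles as a plain file; in a real crate they would be live at the top of `lib.rs`.

```rust
// Sketch of the crate root of a hypothetical no_std library.
// A real crate would start with `#![no_std]`, followed by:
//
//     #[cfg(test)]
//     #[macro_use]
//     extern crate std;
//
// Those lines are kept as comments here so the snippet builds on its own.

/// Adds up a slice using only APIs that libcore provides.
/// Returns `None` if the sum overflows `u32`.
pub fn checked_sum(values: &[u32]) -> Option<u32> {
    values.iter().try_fold(0u32, |acc, &v| acc.checked_add(v))
}

#[cfg(test)]
mod tests {
    // Under `#![no_std]` this import brings `Vec` and the rest of the std
    // prelude into the test module only.
    use std::prelude::v1::*;
    use super::checked_sum;

    #[test]
    fn sums_a_heap_allocated_vec() {
        let values: Vec<u32> = vec![1, 2, 3];
        assert_eq!(checked_sum(&values), Some(6));
    }
}
```

The library code itself never names `std`; only the test module does, which is why the allocation-free guarantee of the crate is not affected by the tests.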

My biggest difficulties have been around the Cargo support for no_std, not libcore itself. Right now there is not a way to search/filter libraries that are allocation-free. I've added a variety of keywords to my own crate to try and indicate that it uses only libcore, but an official marker of some sort would be fantastic.

SimonSapin commented 8 years ago

Isn't that a bug in libcore?

I had assumed this was because std::convert did not exist in libcore, but it does. Let’s fix this: https://github.com/rust-lang/rust/pull/28722.

posborne commented 8 years ago

I agree with @nastevens and @alexcrichton that depending on core explicitly is probably the better option in this case. IMO, the minor loss in convenience when porting a crate to no_std without any kind of std facade is worth it for the clarity of explicit imports from core.

I'm personally looking at restructuring some of my existing crates providing device access (sysfs_gpio, i2cdev, spidev) under Linux to provide no_std traits so that device drivers can be written to be portable to projects where std will never be available like zinc or other embedded platforms.
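As a rough illustration of what a no_std-friendly device trait could look like, here is a minimal sketch. The names `I2cBus`, `TempSensor`, `MockBus` and the register number `0x00` are all hypothetical and are not taken from sysfs_gpio, i2cdev or spidev; the point is only that nothing in the signatures needs more than libcore.

```rust
/// Hypothetical bus abstraction. Only core types appear in its signature,
/// so it could live in a `#![no_std]` crate.
pub trait I2cBus {
    type Error;

    /// Writes `out` to the device at `address`, then reads into `input`.
    fn write_read(&mut self, address: u8, out: &[u8], input: &mut [u8])
        -> Result<(), Self::Error>;
}

/// Hypothetical driver that is generic over any bus implementation.
pub struct TempSensor<B> {
    bus: B,
    address: u8,
}

impl<B: I2cBus> TempSensor<B> {
    pub fn new(bus: B, address: u8) -> Self {
        TempSensor { bus: bus, address: address }
    }

    /// Reads a big-endian 16-bit value from (assumed) register 0x00.
    pub fn read_raw(&mut self) -> Result<u16, B::Error> {
        let mut buf = [0u8; 2];
        self.bus.write_read(self.address, &[0x00], &mut buf)?;
        Ok(((buf[0] as u16) << 8) | (buf[1] as u16))
    }
}

/// Stand-in bus that always answers with a fixed reply; a Linux crate or an
/// embedded HAL would provide the real implementations.
pub struct MockBus {
    pub reply: [u8; 2],
}

impl I2cBus for MockBus {
    type Error = ();

    fn write_read(&mut self, _address: u8, _out: &[u8], input: &mut [u8])
        -> Result<(), ()> {
        for (dst, src) in input.iter_mut().zip(self.reply.iter()) {
            *dst = *src;
        }
        Ok(())
    }
}
```

A driver written this way can be exercised on a desktop with the mock and reused unchanged on a target where std will never exist.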

nikomatsakis commented 8 years ago

TL;DR: Wherein Niko begins as unsure, but gradually convinces himself that subsets of libstd is a promising model, and may perhaps solve some other concerns as well.

So I've just been catching up on this thread. With respect to what is more confusing, it is always hard to separate out the effects of lack of documentation, of course, but I also think it's interesting that both of these designs wind up (in a way) with two names for the same thing:

Writing it out that way, I have to admit that I agree that the "subset std" approach feels overall less confusing. After all, I must always check whether the suggested functions belong to my subset, but at least this way I can do that by just copy-and-pasting the code, and without the need to switch from core to std. Moreover, while it is true that one will colloquially have to clarify which subset of std you are using, I feel like that is already the case. That is, you won't say "I'm using 'core std'" or "'real std'", you will say "I'm using the core subset" or "I'm using the alloc subset" (presumably we'll have named subsets roughly corresponding to the existing facade).

Some things get a lot nicer in this model (imo). For example, the inherent methods on primitive types can now be cleanly shared. Some other things seem to get more complicated. For example, I do not know how this interacts with @alexcrichton's "swappable allocator crate" design -- I guess that liballoc wants (at least) to remain a distinct crate. This is probably OK because I don't think it shares much surface area with libstd (if any).

However, what I find interesting here is that this really sounds to me like a kind of lint, or perhaps an additional feature of some kind. That is, you want a way to declare what subset of libstd you are intending to use, and you want to get warnings (or errors) when you stray from that subset even if your current target happens to support something larger. Having end-users employ #[cfg] switches doesn't feel like quite the right solution to me, because it is likely to lead to pull requests that accidentally add inappropriate dependencies. These errors would not be detected until the tests are run on the full suite of configurations.

Interestingly, this starts to sound a lot like the desire which many have to straddle Rust versions. That is, I might like to restrict myself to the subset of std that was available in Rust 1.5, even while maintaining forwards compatibility with Rust 1.6, 1.7, and so forth. Currently we don't really have a good way to do this, except for building with all three versions of Rust (and of course there's a question about how to manage deprecation, but that's an orthogonal concern, I think).

Finally, I think that this idea of wanting to limit oneself to subsets is not limited to libstd, of course. It applies to every library. After all, if I'm building on @briansmith's library, which offers limited functionality for more limited targets, I may also want to be able to ensure that I don't lean on functionality outside of that subset.

So I feel torn. On the one hand, I think introducing some notion of named subsets of std is a promising way of thinking about things. On the other hand, it sounds like something that will take some design work to get right, so it would clearly be some time before we could stabilize no_std (or whatever its equivalent becomes).

nikomatsakis commented 8 years ago

@alexcrichton

There are also many technical details which simply naturally entail us having two libraries. For example the compiler doesn't auto-inject any --extern flags, it'd be difficult having two binaries on the filesystem which are both "libstd-like", and it'd be interesting to see how the standard library itself would be interested in terms of linking to a "lower libstd". These sorts of things would want to be ironed out, but it seems better to me to stay within the same world we have today of linking, building, and naming libstd.

I don't know if things have to be this complicated. The way I see it, any given system would only have one libstd, but libraries would use lints to restrict themselves from using features that may not be available on all platforms of interest. So there would not be a need for two binaries that are "libstd-like", just one.

steveklabnik commented 8 years ago

So when you see a stackoverflow question talking about core::foo (posted, perhaps, by some new user that works in a shop where they write no_std crates)

I mean, even our error messages today show core::, since they're re-exports, and this has caused confusion in the past.
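One way to see the re-export effect for yourself is to ask the compiler for the name of a type that was spelled with a `std::` path. This is a small sketch using `std::any::type_name`; the exact string it returns is documented as best-effort and not guaranteed, but it typically names the defining crate rather than the path you wrote.

```rust
/// Returns the compiler's name for a type that was written as
/// `std::option::Option<u8>` in the source.
pub fn reported_name_of_std_option() -> &'static str {
    std::any::type_name::<std::option::Option<u8>>()
}
```

Printing the result usually shows a path rooted in `core`, which is the same mismatch users run into when reading diagnostics.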

nikomatsakis commented 8 years ago

Let me clarify one thing I wrote:

Having end-users employ #[cfg] switches doesn't feel like quite the right solution to me, because it is likely to lead to pull requests that accidentally add inappropriate dependencies. These errors would not be detected until the tests are run on the full suite of configurations.

Re-reading this, I think it is unclearly phrased. Let me spell out my thinking a bit more. I am basically proposing that nobody ever "selects" a subset to use in terms of setting cfg features. Rather, the target that they choose implies a particular subset. If you try to use something that is not in that subset, you will get name resolution errors. In principle, this is sufficient for our purposes, but it doesn't give a good user experience -- you'd like to be able to build for one target and have some reasonable confidence that this code will work on all targets (obviously this cannot be guaranteed in the limit, but it's nice for it to be "as true as possible"). This is why having some sort of lint that lets you say "in this module, I intend to stick to the core subset" would be helpful. Does that make sense?

alexcrichton commented 8 years ago

Thanks for the comments everyone! I had a long discussion with @aturon last night as well, and I wanted to explain one of the key things I remembered as part of that.

One of the hard design constraints to me here is that crates built for no_std must be able to interoperate with crates that use "regular std". This prevents splits in the ecosystem and enables a great deal of reuse which I think is quite important. This also has the nice property of enabling things like custom allocators written in Rust where the allocator is no_std (because std needs an allocator and circular deps are bad), but then another crate can explicitly depend on that allocator and std and everything will "just work" because all the types match up.
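The "types match up" property can be sketched in a single file. Below, `largest` is written purely against `core::` paths, as a no_std crate would be, while `largest_of_vec` plays the role of a downstream crate that uses std. Both function names are made up for illustration.

```rust
/// Written only against `core` paths, the way a no_std crate would be.
pub fn largest<T: core::cmp::Ord + Copy>(items: &[T]) -> core::option::Option<T> {
    let mut best = *items.first()?;
    for &item in items {
        best = core::cmp::max(best, item);
    }
    core::option::Option::Some(best)
}

/// Plays the role of a std-using crate. `std::option::Option` is a re-export
/// of `core::option::Option`, so the value passes through with no conversion.
pub fn largest_of_vec(values: std::vec::Vec<u32>) -> std::option::Option<u32> {
    largest(&values)
}
```

No adapter or wrapper type is involved: the return value of the core-only function is already the type the std-using caller expects.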

So, given that, I think it may be useful to explore some technical aspects of the implementation to see what a "std subset" design might look like. I think this is useful to both see what some limitations might be as well as having a good handle on what it would look like to perhaps find some possible drawbacks.

First, I think that simply compiling std twice with different profiles won't work unfortunately. If the "real std" doesn't link to the "core std" then the types won't be shared and they won't interoperate. Even in terms of ABI the symbols among the two libraries will be quite different (a technical hurdle which may in theory be surmountable, though). Given that, at least in today's world, the way that this would work is that the "real std" and the "core std" would be two separate crates (e.g. like libstd and libcore today).

Now that we've got two crates, it's possible to do what @petrochenkov mentioned which is to have the compiler simply inject extern crate foo as std and then you have some attribute to decide what foo is (and it's "std" by default). We would do our due diligence and ensure that core is a subset of std at all times, and then we'd presumably have two (disconnected?) forms of documentation generated via rustdoc hosted to explore the two libraries.
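To get a feel for what the injected alias would mean for source code, the same renaming can be written by hand inside a module. This is only a sketch: the module name `portable` and both functions are invented, and the hand-written `extern crate core as std;` merely stands in for what the compiler would inject under such a design.

```rust
mod portable {
    // Stand-in for the compiler-injected alias: inside this module, the
    // name `std` refers to libcore.
    extern crate core as std;
    use self::std::cmp::Ordering;

    /// Uses only items that exist in libcore, spelled through the `std` alias.
    pub fn compare_lengths(a: &[u8], b: &[u8]) -> Ordering {
        a.len().cmp(&b.len())
    }
}

/// Code outside the module uses the real std. The `Ordering` values compare
/// directly because std re-exports the type from core.
pub fn shorter_first(a: &[u8], b: &[u8]) -> bool {
    portable::compare_lengths(a, b) == std::cmp::Ordering::Less
}
```

The import inside `portable` reads exactly like ordinary std-based code, which is the ease-of-porting argument; it would stop compiling as soon as it named something that only the real std provides.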

With this setup, let's explore how we might evolve the no_std story over time. Let's say in the future that we want to stabilize more subsets of std, such as "std with only an allocator". The way we'd implement this (given what I mentioned above), is to have a crate in our distribution, say alloc, and structure it in the same way as the rest of std except add on modules and functionality that require allocation (e.g. collections, smart pointers, etc). This seems relatively straightforward to me! Now if we keep going and we want to stabilize a whole lot of subsets then in theory each subset may not be a strict superset of what's beneath it. For example maybe it makes sense to have "core + io" and "core + allocator" be two different subsets of std. If we use crate attributes to control which subset of std you receive, then we'll have to ship a permutation of crates to match each subset you may select. This seems less straightforward than before, but this may also not come up in practice.

The evolution story to me seems a little cleaner where we have explicit crates that represent what's under std (e.g. A + B == "extern crate A; extern crate B;") and may provide us more flexibility with adding "std subsets" in the future.

I think if we stop at "core std" and "real std" and don't go much farther then the concern about evolution isn't that pressing. If we still commit to multiple crates, though, it may not "magically fix" (as I wish it would) problems like inherent impls or duplicated documentation; we'd still have work to do on those fronts.

So given all that, what do others think? Do you agree that a "subset std" approach requires multiple crates? Do you think the evolution story is a non-concern?


@aturon

I think I agree with you that if the only reason for core to exist was to give an explicit name to a subset of std then it wouldn't be worth sacrificing the ease of porting and familiarity. Some of the technical concerns I've had, however, I think may still tip my opinion in favor of sticking with core + std. I'm sorry I'm a bit scatterbrained though! I'm curious to hear your thoughts on what's above, though. For some of the other points you made:

To be honest, I think it's easier to understand this design than to grasp that std is actually a facade.

I think I agree with this, yeah. I unfortunately don't think I've seen anyone who instantly understood that core and std interoperate seamlessly and there always seems to be a fear that one won't work with the other. If everything was named std, however, it may alleviate this concern by only having one library instead of an "apparent two".

ISTM that the compiler would then help you ensure you're staying in the right footprint? But I could be missing something.

Ah I was thinking along the lines of mod foo; has access to core-only features and mod bar; has std features as well (e.g. enabled via a cargo feature flag). The "compiler helping" here would be the foo module not importing anything from std but bar would import std's prelude.
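A sketch of that layout, assuming a Cargo feature named `std` (a common convention, but an assumption here) and made-up function names. The crate-level attributes are shown as comments so the snippet builds on its own.

```rust
// A real crate root would start with:
//
//     #![no_std]
//     #[cfg(feature = "std")]
//     extern crate std;

/// Core-only module: nothing in here names `std`.
pub mod foo {
    /// Clamps a value into the range 0..=100.
    pub fn clamp_percent(x: i32) -> u8 {
        core::cmp::min(core::cmp::max(x, 0), 100) as u8
    }
}

/// Only compiled when the (assumed) `std` feature is enabled, so it is free
/// to allocate.
#[cfg(feature = "std")]
pub mod bar {
    use std::string::{String, ToString};

    /// Formats a clamped percentage as an owned string, e.g. "100%".
    pub fn describe(x: i32) -> String {
        let mut text = super::foo::clamp_percent(x).to_string();
        text.push('%');
        text
    }
}
```

Because `foo` can only see libcore, any accidental use of `String` or `Vec` there fails to compile in the no_std configuration, while `bar` is free to use them when the feature is on.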

I was imagining the workflow here would be that you would opt in to no_std if/when a downstream customer wanted to rely on it, so there's no chance of bug reports as long as you successfully compile.

Ah yeah I'd be more worried in the case where the author didn't actually intend to have no_std support (it just accidentally happened), and then downstream authors started relying on that. If the upstream crate then started using String it would break everyone downstream without the upstream author being aware (in theory). This isn't a super strong concern, of course, just a failure mode that I'd like to avoid.

Ideally, we would foster a thriving no_std ecosystem

To be clear, I totally agree! I think we may just differ on how much we consider "use core" vs "use std" as friction for entering this ecosystem. I expect this ecosystem to naturally have a high amount of friction simply because it's not "the default" (e.g. "real std" is the default). Maintaining crates in this ecosystem will also have some degree of tension to provide both features that work everywhere as well as perhaps-more-ergonomic std-only features.

Put another way, in my mind there's N units of inherent friction to be a no_std crate (which of course has quite the payoff to make it worth it!), and the amount of friction from core vs std I don't see as a large enough portion of N to be too worried about. I could be wrong, however! Maybe there's a scenario where there's 0 friction to be a no_std crate? Perhaps I'm playing "doomsayer" too much?


@nikomatsakis

I don't know if things have to be this complicated. The way I see it, any given system would only have one libstd, but libraries would use lints to restrict themselves from using features that may not be available on all platforms of interest. So there would not be a need for two binaries that are "libstd-like", just one.

I agree that I may have been too extreme in painting the picture as complicated. I'm curious to explore this idea of a lint, though. I think it would answer "no" to "do we require multiple crates" above, so it seems like it may assuage some of my concerns!

So exploring this for a bit, one very nice property that libcore has (which I'd personally like to maintain) is that it can be compiled for basically all targets/platforms/etc (so long as LLVM targets it). If we took the route of "there is only one true std" and then a lint basically constrains you to a particular subset, it may make the compilation and/or linkage story for the standard library difficult for some platforms.

For example, I believe some Rust kernels compile with the x86_64-unknown-linux-gnu target, meaning they'd definitely want a lint to constrain std but they'd have to rely on the linker to strip out everything in std that's "not part of the profile". While I think this is technically feasible, I'd be a little wary of relying on it happening automatically.

On the other hand, if I'm writing a new kernel (say, redox perhaps) then when I compile the "one true std" I in theory don't even want to compile things like I/O or allocation because they don't exist. We can handle this with #[cfg] in the standard library, but that may not be available for a brand-new platform and may be painful to add.

Overall, having a libcore/libstd split does have the benefit of making compilation quite easy. You're basically guaranteed to be able to compile libcore if your platform can run any Rust code whatsoever, and if you want to get fancy you can work your way towards std.

Does that make sense? Perhaps there's a method where a lint-based and/or "one true std" approach wouldn't have this downside?

nikomatsakis commented 8 years ago

So reading @alexcrichton's comment (which I haven't completed yet), I realized that one catch with the way I was thinking about things is that, lacking core, we will not be able to have a distinct allocator crate, because it would (presumably) want to have a dependency on std, but that is a circular condition.

nikomatsakis commented 8 years ago

@alexcrichton

So I feel like I don't quite understand your questions. Perhaps I did not explain what I have in mind very well. Let me take one more shot at explaining it. Let me also point out that I am partially playing devil's advocate here, the current setup seems quite workable. But there are some places where the facade has made things harder for us: error messages are harder because of pub use; rustdoc is harder and more complex than it has to be; coherence violations are more problematic than you might hope; inherent impls on primitives are trickier; etc. All of these things to some extent are problems that we might want to solve anyway, but they are exacerbated by the facade. So I'm interested to know if we can find a nice alternative.

Anyway, this is what I was thinking:

  1. We define various subsets of std and use #[cfg] to select out what belongs to each subset.
    • So, for example, there is a "core" subset that includes only those things that are presently in libcore.
    • And there is another subset "collections" that includes "core" but also the collection types.
  2. When you choose a target from our list of supported targets, that also implies a subset of std. For most "full-featured" targets, it would be the full subset, but we might have targets for more restricted modes of operation. For example, a kernel target of some kind that uses only the "core" subset of std, or something for embedded platforms. Or maybe this is something you can select orthogonally from the target, doesn't matter. Main point is: it's something that you choose at the time you invoke "cargo build" for a specific platform/configuration.
  3. Meanwhile, libraries can declare using an attribute what subset they intend to use. They will receive lint errors if they stray outside this subset. Because this is just a lint, it can be overridden for submodules. For example, I might have submodules that I know are only available on "normal" platforms, and those might opt into the full std, but other submodules I want to be portable more widely, which would restrict themselves to the "core" subset of std.
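No such lint exists today, so the following is purely a toy model of step 3: given a declared subset and the std paths a module uses, report the paths that stray outside it. The `Subset` enum, the module-to-subset table and the function names are all invented for illustration and do not reflect any real or planned compiler feature.

```rust
/// Hypothetical named subsets, ordered from smallest to largest.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Subset {
    Core,
    Collections,
    Full,
}

/// Made-up table: the smallest subset that provides a given std module.
/// "std::vec::Vec" is looked up by its module segment, "vec".
fn required_subset(path: &str) -> Option<Subset> {
    let module = path.split("::").nth(1)?;
    match module {
        "mem" | "cmp" | "option" | "result" | "fmt" => Some(Subset::Core),
        "vec" | "string" | "collections" => Some(Subset::Collections),
        "fs" | "net" | "thread" => Some(Subset::Full),
        _ => None,
    }
}

/// Returns every used path that needs more than the declared subset.
/// Paths missing from the table are reported too, to stay conservative.
pub fn out_of_subset<'a>(declared: Subset, used: &[&'a str]) -> Vec<&'a str> {
    used.iter()
        .cloned()
        .filter(|path| match required_subset(path) {
            Some(required) => required > declared,
            None => true,
        })
        .collect()
}
```

Since the check is independent of the target being built, it would fire even when compiling for a full-featured platform, which is the property the proposal is after.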

Now what I was concerned about was the allocator crate situation. But I was thinking that perhaps this could be resolved via "forward reference" to an allocator crate, much as we do today from core to handle unwinding. This seems to fit with our overall model, where we want libraries to be compiled without reference to any particular allocator, and we only pick an allocator at the last second. But we should discuss.

OK, in light of that explanation, let me try to make sure I follow some of your specific comments:

So exploring this for a bit, one very nice property that libcore has (which I'd personally like to maintain) is that it can be compiled for basically all targets/platforms/etc (so long as LLVM targets it). If we took the route of "there is only one true std" and then a lint basically constrains you to a particular subset, it may make the compilation and/or linkage story for the standard library difficult for some platforms.

I don't understand how anything changes here. libstd would have defined subsets, one of which would be equivalent to today's core. So when bringing up a new target, you can choose what subset your target supports, and core would always be supported.

On the other hand, if I'm writing a new kernel (say, redox perhaps) then when I compile the "one true std" I in theory don't even want to compile things like I/O or allocation because they don't exist. We can handle this with #[cfg] in the standard library, but that may not be available for a brand-new platform and may be painful to add.

I think what you are saying is: "if I am defining a new target, I may have a different subset that I wish to use, and none of the pre-defined subsets may apply". This is no doubt true, but then this is true of crates too -- I may find that I can use libcore and some portion of libcollections, but not all of it (or whatever). Then I just have to petition to split up libcollections and, in the meantime, reproduce what I need. I would think it'd be the same with named subsets: you would say that "target XYZ only supports the core subset of libstd" (the minimal one) and you'd offer additional XYZ libraries to replace what is needed, perhaps eventually tweaking what subset of std is available for XYZ.

alexcrichton commented 8 years ago

Thanks for the elaboration @nikomatsakis! I think what you're thinking definitely makes more sense to me now. We discussed this yesterday as well, but I think it'd be good to write down some of the points which may be pros/cons as a summary as well:

I think that's all I can remember right now at least! There's definitely quite a few tradeoffs here, but overall I feel that a shift in strategy would simply be moving the needle on a lot of these tradeoffs rather than "solving a bunch and making few problems".

Thoughts though?

nikomatsakis commented 8 years ago

@alexcrichton Thanks for writing that up. I agree it's a pretty accurate summary. I also agree that what I'm proposing is pretty different from what's there and so while it may solve or ameliorate some problems, it may well bring a few of its own that aren't obvious.

(Still, I do think the existence of "cargo features" suggests that people in the ecosystem at large don't want to just break things up into lots of microcrates (or can't do so). After all, why aren't they using facades instead of cargo features? Similarly, platform-specific functionality seems to fit more cleanly into the model I'm talking about.)

On Thu, Oct 1, 2015 at 12:44 PM, Alex Crichton notifications@github.com wrote:

Thanks for the elaboration @nikomatsakis! I think what you're thinking definitely makes more sense to me now. We discussed this yesterday as well, but I think it'd be good to write down some of the points which may be pros/cons as a summary as well:

  • This allows us to truly have "one std" as there's literally only one binary with one object file in it. The object file will contain all of std for the target in question. We'd be heavily relying on the linker to GC away unused functions to "appear like" the object file contains only what you used.
  • A new scheme will need to be invented to say "I depend on library X, but I only want features A, B, and C". This will probably be something connected to the extern crate declaration, but the idea is that you need a way of instructing the compiler what subset you want. The compiler would then use the annotated source of X (std in our case) to only expose certain APIs and such as part of the public interface.
  • The "bad error message" problem will be solved with this as everything truly lives in one location. Note though that this error message is not unique to the standard library, reexports are used widely throughout the ecosystem so it'd just decrease the severity of the problem, not make it go away.
  • The "duplicate documentation" problem will also be solved for the standard library, but it doesn't solve the problem of inlining documentation from other crates and reexporting documentation as this is also a common strategy throughout the ecosystem.
  • This would solve the ecosystem problem of "I depended on crate Foo with feature A but I can use feature B accidentally". This is Cargo-specific and I can elaborate more on this if needed, but just pointing out it solves a problem in the ecosystem today!
  • This partly solves the SIMD problem of "compile all SIMD into std but only expose a little when you link against std". There'd still be some details to work out here
  • This would introduce a problem of the "core std" subset needs to statically be known to not access the "real std" subset (e.g. how libcore can't access libstd today).

I think that's all I can remember right now at least! There's definitely quite a few tradeoffs here, but overall I feel that a shift in strategy would simply be moving the needle on a lot of these tradeoffs rather than "solving a bunch and making few problems".

Thoughts though?


ahmedcharles commented 8 years ago

Just curious, what was decided here?

SimonSapin commented 8 years ago

@ahmedcharles, the "final comment period" tag means that a decision is going to be made soon, so now is a good time to speak up if you have comments that you haven’t shared yet. This issue is in the list that https://internals.rust-lang.org/t/library-fcp-issues-for-1-5-closing-soon/2808 says the libs team is going to decide on this week.

briansmith commented 8 years ago

From what I can tell, there seemed to be a lot of agreement on redesigning this feature, but the RFC wasn't updated with a new design. So I don't see how the libs team could make a decision this week.

SimonSapin commented 8 years ago

My understanding is that this week’s decision is "is this stable in 1.5". Answering "no" doesn’t necessarily mean rejecting the proposal, it might just defer until the design is more solid, possibly in 1.6 six weeks later.

briansmith commented 8 years ago

OK, that sounds very reasonable. I am happy to help with the experiments for a new design where everything can be referenced as std::. It's a little unclear how to make progress here, so some guidance would be appreciated.

ahmedcharles commented 8 years ago

I personally think that core should be an implementation detail rather than something users of rust should type when writing source code outside the primary rust repo. Granted, there are many ways of accomplishing this and I don't know enough at the moment to vote either way.