Create some compile-time notion of "runtime mode"

sam0x17 commented 1 year ago

This came out of a conversation with @kianenigma, @ggwpez and other members of the FRAME team so I'll do my best to reproduce the highlights of that conversation here. The short version is that right now in a substrate-based project, we don't really have a well-defined, standardized notion of what the build "intent" or "mode" is at runtime, nor do we have this information in an easily-consumable form at compile-time, and this is something we could improve upon. I'll explain what I mean:

Some background:

Before I worked in Rust I was a long-time rubyist. Ruby on Rails (web framework used at a lot of startups) is known for having a very clear notion of what in the Rails world is called an "environment". When you run a Rails app, there are three default "environments" that come built-in: "test", "development" and "production". An app's environment is determined when it launches and cannot change while it is running, and this information can be accessed via globally accessible methods such as Rails.env.test?, Rails.env.production? etc., and is set by the environment variable RAILS_ENV.

When you run unit and integration tests, the app will automatically be in test mode (RAILS_ENV=test), and all sorts of configuration and things can then load and happen on an environment-specific basis (just like cfg(test) in Rust). It is quite common, for example, to have blocks of code that will do a switch statement that does different things for each of the defined environments.

The "development" environment is the default and is typically used for local development/tinkering. The "production" environment is used on deployed servers, and people also often add an additional "staging" environment that closely matches "production" but with subtle differences like different db credentials, etc. Secret management is also typically set up at the environment level.

Running development mode uses a lazy asset pipeline suitable for local development, while running the app in production mode will result in performance-optimized compiled assets, tighter security settings (CORS, etc), production-specific API keys and secrets, etc..

By standardizing these "environments" at the framework level, all tooling and plugins that integrate with Rails can make certain assumptions based on what environment we are in, and this is healthy for the ecosystem. Crucially, this allows gem (crate) authors to specify that certain features can or cannot be used in certain environments, and since all Rails apps are structured roughly the same way, this will just work out of the box for arbitrary Rails apps.

It would be great if we could rely on something conceptually similar to this when building with substrate!

My Suggestion

So given my background you can understand why when I started working on substrate, I was surprised we don't have some notion similar to this. For example, at build time the only thing we really know is whether we are in test mode or not, which doesn't tell us much. We don't (at least not in a way that is standardized and easily accessible at compile-time anywhere and everywhere in substrate, for example in individual pallets) know if we are polkadot itself, or a parachain, and we don't know whether we are in production or if we are just some random dev playing around with things locally.

As far as I can tell there are at least two notions of "environment" that would be useful for us in substrate:

What this build is (i.e. polkadot, a parachain, etc.)... I would tentatively call this something like RuntimeType. This already exists in a limited fashion in some of the structs and traits generated by construct_runtime!, but right now this information is not easily consumed at compile-time in other contexts like in a pallet.
Why this build is running (i.e. we are running tests, running benchmarks, doing local development/tinkering, some sort of staging server, production, etc)... I would tentatively call this something like BuildIntent or BuildMode or BuildEnvironment

It would make a lot of sense to formulate these as mutually exclusive cfg flags, though cfg flags have their own pitfalls and we could just as easaily implement something like this entirely without cfg flags.

If we did go the cfg flag route, it would be quite easy to gate things for example something like:

#[cfg(not(production))]
// unsecure stuff we don't ever want used in secure contexts here

For our purposes, it also might make sense to replace the word "production" with "secure", so we would make the "test", "development", and "secure" features mutually exclusive and require exactly one of them to be set at all times.

We technically already have the test environment because rust has a built-in test mode and we use cfg(test) in many places in the code.

It doesn't have to be cfg flags, again, but regardless, if we were to formalize automatically generated enums at compile-time that provide information like this, or produce cfg flags that accomplish the same thing, it would be much, much easier to do things like gate certain features from being used in "production" or in certain contexts, for example:

Our "pallet dev mode" (#12536) should not be used other than for local tinkering and possibly in some exotic test situations but right now this is unenforceable
Recently (#13301) we had to rename the randomness-collective-flip pallet to insecure-randomness-collective-flip to discourage its usage in production contexts because it is inherently insecure
There are many other scenarios similar to this where we wish we could raise a compiler error if someone tries to use X in Y scenario, but we have no way of systematically detecting that scenario because we simply don't have an abstraction for it.

While renaming knowingly insecure pallets to be prefixed with "insecure" is an effecitve counter-measure to some of these scenarios, situations like this could be made even safer if we had the granularity at compile-time to actually know whether this is a context where security is required and issue a compile error accordingly if the programmer is trying to use something inherently insecure in a secure context.

Another advantage of the cfg route, by the way, is Rust already has a strong notion of the test cfg, so we'd just be adding more and making them mutually exclusive (which is something I've done before).

Thoughts?

bkchr commented 1 year ago

Another advantage of the cfg route, by the way, is Rust already has a strong notion of the test cfg, so we'd just be adding more and making them mutually exclusive (which is something I've done before).

Features in rust are not mutually exclusive and this leads to problems. Yeah, I know that you can do (all(feature_x, feature_y), compile_error) but I'm not convinced. We already had nondeterminism because of feature flags, you can ask @kianenigma (I know that this was between native vs wasm, which will not be there anymore in the future, but it already shows what could happen).

And how would the env work for when you are having some mock runtime? How do you tell the pallets that you just compile the mock runtime in some random crate?

We already have some production feature flag for Polkadot/Kusama (it disables logging currently). I think this is in general a good idea to have this. This is probably something we should teach people about and how to do it.

What this build is (i.e. polkadot, a parachain, etc.)... I would tentatively call this something like RuntimeType. This already exists in a limited fashion in some of the structs and traits generated by construct_runtime!, but right now this information is not easily consumed at compile-time in other contexts like in a pallet.

I don't get this, what should the pallet do with this information?

sam0x17 commented 1 year ago

An additional thing that might be useful at compile-time would be the "role" like validator, etc if that isn't already available.

Also those are all fair points @bkchr. Off the top of my head I don't know if there would be a use case for the "What" one but I think @kianenigma had something in mind with that one

sam0x17 commented 1 year ago

And yeah, I actually implemented rails-like environments in a personal rust-based web app like this:

use strum::IntoStaticStr;

#[derive(Clone, Debug, IntoStaticStr)]
pub enum Environment {
    Development,
    Test,
    Staging,
    Production,
}

#[cfg(not(any(test, staging, production)))]
pub const CURRENT: Environment = Environment::Development;
#[cfg(test)]
pub const CURRENT: Environment = Environment::Test;
#[cfg(staging)]
pub const CURRENT: Environment = Environment::Staging;
#[cfg(production)]
pub const CURRENT: Environment = Environment::Production;

This coupled with making development a default feature worked for my purposes, but I agree there are a ton of gotchas with how cargo handles feature flags

jasl commented 1 year ago

Rails env is used for control the application behavior, Rust Web framework e.g. Rocket has the same design https://rocket.rs/v0.5-rc/guide/upgrading/#profiles

I think Rust feature gate are not design for the same sceanario, it's just a tool for conditional compiling, adding too much feature gate may lead some compositions not orthogonal then introducing unexpect bug, rencently I already reported feature-gate relates bug that break parachains compiling with runtime-benchmarks feature, so I think feature gate should be use cautiously And, the feature gate are also use for control the compiling process, like enable LTO, a staging testnet runtime should use the same compiling parameters as production.

On the other hands, I like the idea of your example, but I think it should be limited in constants level, in test or staging, we usually hope short governance period, charge some money to testing accounts, etc, use the style of code you showed can organize them better.

One thing is hard to solve is, in non-production env we hope to have Sudo pallet, in this case, I don't find good way for this, so I'm trying separate staging and production to different runtime, and make pallets configurations and other primitives reusable, then the runtime crate only has construct_runtime! {} macro (actully it also has impl_runtime_apis!)

sam0x17 commented 1 year ago

Yeah I agree that we should probably avoid using cfg for this. That said, would lean towards a solution that makes this info available at compile-time, as that is where a good bit of the utility lays

I also haven't gone very deep into construct_runtime! yet so most of my exposure to these issues is purely within pallets. That's why I'd heavily lean on other people's advice here

jasl commented 1 year ago

I haven't try pallet dev mode yet, but I believe it should always generate warning when compiling runtime, maybe we can add a env maybe called ALLOW_IN_DEV_PALLET to control allow dev mode pallet can be complied and by default it will be disallow

for renaming, I don't have idea, for the randomness pallet, the renaming is good, it's breaking change but not hard to adapt.

sam0x17 commented 1 year ago

I haven't try pallet dev mode yet, but I believe it should always generate warning when compiling runtime

And this raises another common issue. I agree with you, but rust still hasn't stabilized the ability to raise arbitrary warnings from macro world. We are limited to the warnings the compiler normally generates. @ggwpez has a slick workaround that involves making an unused item to intentionally have the compiler issue a warning or something like that (I forget), but it isn't perfect.

The env var idea would work well for dev mode, but I guess my point is we're re-creating what could be a global abstraction in numerous places when we should just figure out a global abstraction.

kianenigma commented 1 year ago

I think we need this sooner rather than later, and I would also not use runt features for this.

Option 1: Check in `construct_runtime`

Assuming enum Env { Parachain, Solochain },

Each pallet exposes something like Convert<Env, bool>, and construct_runtime would receive env = Env::Parachain, call into all pallet's convert(env) and make sure they are all true.

Option 2: Check within the pallet.

Each pallet will receive a Get<Env> from the outer world, which will tell it its Env. The pallet itself can decide to prevent this in its integrity_check (which btw we should please please rename to on_construct_runtime()).

linking this to paritytech/polkadot-sdk#264.

xlc commented 1 year ago

Any TLDR on exactly the problem this is trying to solve?

There are a few things to keep in mind:

Features / Flags will introduce complexity to the code and make it harder to test. You can only test in production for code that is guarded with production env.

We want to make sure the wasm code is as small as possible and therefore need to make sure the compiler is able to exclude dev code in prod build.

muharem commented 4 months ago

Should really pallet have different logic depending on argument like Env { Parachain, Solochain }. If it needs to enable something for example, it can receive an argument like EnableSomething: bool. I think this way it's more general and has more applications.

Also if we decide to introduce envs like development, production, etc, we should map them well with features like runtime-benchmarks, try-runtime, macros like prod_or_fast, pallet(dev_mode), cfg(test), etc, and have a guideline for it, where what should be used, otherwise it may complicate things further.

paritytech / polkadot-sdk