rrybarczyk commented 2 months ago

Background

This task is an outcome of the protocols Rust docs issues tracked in #845.

While documenting protocols::v2::codec-sv2 in #1012, areas of potential code debt were identified. This issue servers as a place to list out these items to then be addressed in an organized manner. The initial Rust documentation effort was an immediate action, while a refactoring (which implies breaking API changes) is not so urgent priority, so for now we should leave this in the backlog for an appropriate moment in the future.

Note: This issue will replace the #873 discussion.

Identified Potential Code Debt

Error Variant Naming

Suggestions for codec_sv2::error::Error and ::CError refactor:

Instead of importing at the top of the file:
- Use noise_sv2 crate directly in the Error variants like AeadError(noise_sv2::AeadError) and NoiseSv2Error(noise_sv2::Error).
- Use framing_sv2 crate directly in the Error variant like FramingSv2Error(framing_sv2::Error).
There are two definitions of the framing_sv2::Error variant: FramingError and FramingSv2Error, remove one of these variant and handle any downstream effects.
Rename (applies to both the codec_sv2::Error and codec_sv2::CError enums):
- AeadError -> Either keep as is or do:NoiseSv2Aead or NoiseAead
- BinarySv2Error -> BinarySv2or Binary
- FramingSv2Error/FramingError -> FramingSv2 or Framing
- NoiseSv2Error -> Noise
Alphabetize the error variants, the match arms of the implementations for the Error enums, and the order of the impl From<external-error> for Error

Uniform Spelling

We are using both HandShake and Handshake. Everything should be Handshake with a lowercase "s" (this is how it is defined in The Noise Protocol Framework) . This impacts:
- codec_sv2:::
- lib::State::HandShake -> lib::State::Handshake
- error::Error::NotInHandShake -> error::Error::NotInHandshake
- error::CError::NotInHandShake -> error::CError::NotInHandshake
- framing_sv2::framing2:: (there may be more in framing_sv2, but these are what I have noticed so far):
- HandShakeFrame -> HandshakeFrame
- EitherFrame::HandShake -> EitherFrame::Handshake
We have decoder::WithNoise::decode_noise_frame and then encoder::NoiseEncoder::encode. Would it make sense to make these methods uniform like decoder::WithNoise::decode and encoder::NoiseEncoder::encode?
We have decoder::WithNoise and decoder::WithoutNoise. Then we have encoder::NoiseEncoder and encoder::Encoder. Should we make it decoder::WithNoise, decoder::WithoutNoise, encoder::WithNoise, and encoder::WithoutNoise? Or decoder::NoiseDecoder, decoder::Decoder, encoder::NoiseEncoder and encoder::Encoder?

Imports

Small thing, but in codec_sv2::lib, I am not a big fan of importing framing_sv2::framing2::handshake_message_to_frame as h2f. I think we should import framing_sv2::framing2 then use framing2::handshake_message_to_frame directly. The upside is it is clearer to the reader, the downside is it is more verbose.
use core::cmp is imported in codec_sv2::error with the #[cfg(test)] declarator, but when running cargo test I get the following warning saying it is not used. Should it be removed?
```
warning: unused import: `core::cmp`
--> v2/codec-sv2/src/error.rs:8:5
|
8 | use core::cmp;
|     ^^^^^^^^^
```
In codec_sv2::decoder, some imports can be combined into one line. For example, use binary_sv2::{GetSize, Serialize}.
In codec_sv2::encoder and ::decoder, if the with_buffer_pool feature flag is disabled, we default Buffer to be buffer_sv2::BufferFromSystemMemory which is imposed in the import statement as use buffer_sv2::{Buffer as IsBuffer, BufferFromSystemMemory as Buffer};. To make this more readable, we should explicitly create the type in the same way that we do when the with_buffer_pool feature flag is enabled:
```
#[cfg(not(feature = "with_buffer_pool"))]
type Buffer = BufferFromSystemMemory;

#[cfg(feature = "with_buffer_pool")]
type Buffer = BufferPool<BufferFromSystemMemory>;
```

Exports

Make mod decoder and mod encoder -> pub mod decoder and pub mod encoder in src/lib so top level module Rust doc comments in these modules render in rust docs.
Do we reallly need to make the following a public export in lib.rs, they are just re-exposing primitives from other crates:

Use of `Result` types

The following methods use core::result::Result but should use codec_sv2::Result (unless there is a reason I am not seeing that requires using core::result::Result):

State::step_0
State::step_1
State::step_2

Uniform Struct Fields

decoder::WithNoise and encoder::WithoutNoise have some shared fields, these fields should appear first and be listed in the same order in each struct.

Misc

Should hard coded default buffer sizes be defined and constants?
error module imports use core::cmp under the #[cfg(test)] but it is unused.
In the decoder.rs module the types StandardEitherFrame and StandardSv2Frame are defined, however these frames can represent an encoded OR decoded Sv2 frame. I think we should move them out of the decoder module into a more appropriate place.
In the encoder.rs module, we have type Slice = buffer_sv2::Slice; with the #[cfg(feature = "with_buffer_pool")] decorator. Why do we need this type if it is simply assigned to buffer_sv2::Slice? Could we just directly use buffer_sv2::Slice?
In encoder.rs, we have the type Slice = Vec<u8> commented out. Can this be completely removed?
new methods should always come first (exception is if default is present, which it is not for this crate) in an impl. See impl<T: Serialize + GetSize> Encoder<T> in encoder.
Consider creating a struct type to hold the message, message type, extension type, and channel message required to create a new StandardSv2Frame/StandardEitherFrame. Right now all being passed into the from_message method. Could be fine, but if there are other instances where we see the need to use all these variables together, its own struct may be useful.
State logic exists in lib.rs. It was not immediately apparent to me to look there when trying to locate Starte related code. Consider moving this logic to its own state.rs or handshake_state.rs module.
State has two different impl blocks that can be combined into a single impl block.

plebhash commented 2 months ago

we should seriously consider merging codec_sv2 + framing_sv2 crates

https://github.com/stratum-mining/stratum/discussions/873#discussioncomment-10660503

perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

Fi3 commented 2 months ago

we should seriously consider merging codec_sv2 + framing_sv2 crates

#873 (comment)

perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

rrybarczyk commented 2 months ago

we should seriously consider merging codec_sv2 + framing_sv2 crates #873 (comment) perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

These crates are closely linked. To my understanding, you never use one with out the other. For this reason, we should combine them. Currently with them separate, you constantly have to cross reference between these two crates.

Unless there is a reason to not combine these two crates, we should plan on this refactor. @fi3, is there any reason not to combine that crates that you know of?

Fi3 commented 2 months ago

we should seriously consider merging codec_sv2 + framing_sv2 crates #873 (comment) perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

These crates are closely linked. To my understanding, you never use one with out the other. For this reason, we should combine them. Currently with them separate, you constantly have to cross reference between these two crates.

Unless there is a reason to not combine these two crates, we should plan on this refactor. @Fi3, is there any reason not to combine that crates that you know of?

I just don't see an advantage that justify the work of putting them together. What is this advantage? Why we should spend time doing that?

rrybarczyk commented 2 months ago

we should seriously consider merging codec_sv2 + framing_sv2 crates #873 (comment) perhaps also binary_sv2, although I still need to spend more time on that one to form a better opinion

why?

These crates are closely linked. To my understanding, you never use one with out the other. For this reason, we should combine them. Currently with them separate, you constantly have to cross reference between these two crates. Unless there is a reason to not combine these two crates, we should plan on this refactor. @Fi3, is there any reason not to combine that crates that you know of?

I just don't see an advantage that justify the work of putting them together. What is this advantage? Why we should spend time doing that?

Now is the time where we are really scrutinizing these crates to solidify them moving forward. Can you help me understand by giving me an example of when one of these crates would be used without the other?

plebhash commented 2 months ago

We need to make sure the low-level APIs are lean and easy to use. Right now, they are extremely convoluted. There's no way to understand how to use one single crate without looking for references in multiple other crates.

Maybe this is a blindspot for you @Fi3 , because the APIs are fresh in your head and you can navigate them easily (since you wrote most of them).

But for us, even while just documenting these crates, it has been extremely challenging to understand how everything is meant to be used together.

Our mission here is to make sure the codebase can be easily used by the entire ecosystem, not just one single pool.

Fi3 commented 2 months ago

IMHO is lot easier to make a crate that export everything and is well documented rather then refactor all the crates

Fi3 commented 2 months ago

https://github.com/demand-open-source/share-accounting-ext/blob/master/Cargo.toml#L15C22-L15C23

here for example I need framing but not codec

Fi3 commented 2 months ago

also low level crates are not supposed to be used but we should use an higher level library designed to be easy to use ecc ecc. The low level crates should be used only in special occasion where you really need them, for example for ffi

rrybarczyk commented 2 months ago

Ok, I will take a look at that extension and dig deeper into the code. Thank you.

Fi3 commented 2 months ago

ext is just an example main point is

IMHO is lot easier to make a crate that export everything and is well documented rather then refactor all the crates

jbesraa commented 2 months ago

I dont have a good input about merging/not merging. I will only share that as a new dev in the team, whenever I saw a crate under protocols folder, I immediately thought that there is a protocol specification for the crate, living in sv2-specs. Now I know this is not the case, but I do think it is miss-leading.

Fi3 commented 2 months ago

having several low level crates have the main advantage of keeping the lib as small as possible and easier to test and to review. framing-sv2 is very small and I agree that could be merged, I would put it in binary-sv2 not in codec. I also think that is more than fine like that, and that we shouldn't waste our time and energy in merging it.

plebhash commented 2 months ago

I can definitely entertain @Fi3's argument about effort. Whenever we start planning the scope how we will mitigate technical debt, I fully agree that it is very important for us to evaluate the trade-offs of all proposed changes.

And for sure, there will absolutely be cases where the benefits will not justify the efforts.

I'm not saying this will be the case here. I don't have a clear perspective around that yet, especially since we still have not fully studied the entire scope of protocols.

To be honest, this discussion is still somewhat premature (which is my fault, since I brought up this topic). After we cover 100% protocols crates, we will be in a much better position to have these debates.

What I said about potentially merging framing_sv2 + codec_sv2 was just a random thought that occurred to me while I reviewed https://github.com/stratum-mining/stratum/pull/1040.

It's actually quite interesting that @Fi3 said that it would probably be better to merge framing_sv2 into binary_sv2 instead of codec_sv2. I will tag @Shourya742 here so he is aware, but with a careful warning that his mindset while reading this should be to simply to entertain this possibility while he documents and reviews binary_sv2/no-serde.

For now, at least in my mind, all options are on the table. The main point is that it is very important that we do our best to learn protocols crates and really become experts at them, just like @Fi3 is. And maybe at the end of that journey, we could even come to the conclusion that everything is fine the way it is.

Or maybe not. Time will tell. 🧘

plebhash commented 1 month ago

on the top issue description @rrybarczyk suggested:

Uniform Spelling

...

We have decoder::WithNoise and decoder::WithoutNoise. Then we have encoder::NoiseEncoder and encoder::Encoder. Should we make it decoder::WithNoise, decoder::WithoutNoise, encoder::WithNoise, and encoder::WithoutNoise? Or decoder::NoiseDecoder, decoder::Decoder, encoder::NoiseEncoder and encoder::Encoder?

I would suggest the following naming:

encoder.rs:


#[cfg(feature = "noise_sv2")]
pub type Encoder<T> = EncryptedEncoder<T>;

[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear

pub type Encoder = PlainEncoder;

pub struct EncryptedEncoder { noise_buffer: Buffer, sv2_buffer: Buffer, frame: PhantomData, }

pub struct PlainEncoder { buffer: Vec, frame: PhantomData, }


- `decoder.rs`:
```rust
#[cfg(feature = "noise_sv2")]
pub type Decoder<B, T> = EncryptedDecoder<B, T>;

#[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear
pub type Decoder<B, T> = PlainDecoder<B, T>;

pub struct EncryptedDecoder<B, T> {
    frame: PhantomData<T>,
    missing_noise_b: usize,
    noise_buffer: B,
    sv2_buffer: B,
}

struct PlainDecoder<B, T> {
    frame: PhantomData<T>,
    missing_b: usize,
    buffer: B,
}

It's not necessarily the final code organization, just some pseudo-rust to convey the idea around the keywords:

Encrypted vs Plain (which depends on noise_sv2 feature flag)
Encoder vs Decoder

rrybarczyk commented 1 month ago

on the top issue description @rrybarczyk suggested:

Uniform Spelling

...

We have decoder::WithNoise and decoder::WithoutNoise. Then we have encoder::NoiseEncoder and encoder::Encoder. Should we make it decoder::WithNoise, decoder::WithoutNoise, encoder::WithNoise, and encoder::WithoutNoise? Or decoder::NoiseDecoder, decoder::Decoder, encoder::NoiseEncoder and encoder::Encoder?

I would suggest the following naming:

encoder.rs:
#[cfg(feature = "noise_sv2")]
pub type Encoder<T> = EncryptedEncoder<T>;

#[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear
pub type Encoder<T> = PlainEncoder<T>;

pub struct EncryptedEncoder<T> {
    noise_buffer: Buffer,
    sv2_buffer: Buffer,
    frame: PhantomData<T>,
}

pub struct PlainEncoder<T> {
    buffer: Vec<u8>,
    frame: PhantomData<T>,
}
decoder.rs:
#[cfg(feature = "noise_sv2")]
pub type Decoder<B, T> = EncryptedDecoder<B, T>;

#[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear
pub type Decoder<B, T> = PlainDecoder<B, T>;

pub struct EncryptedDecoder<B, T> {
    frame: PhantomData<T>,
    missing_noise_b: usize,
    noise_buffer: B,
    sv2_buffer: B,
}

struct PlainDecoder<B, T> {
    frame: PhantomData<T>,
    missing_b: usize,
    buffer: B,
}
It's not necessarily the final code organization, just some pseudo-rust to convey the idea around the keywords:

Encrypted vs Plain (which depends on noise_sv2 feature flag)

Encoder vs Decoder

I am not sold on Plain. We have been using the word "standard" to express unencrypted logic (correct me if I am wrong). So for the unencrypted, I prefer StandardEncoder/StandardDecoder, or just simply Encoder/Decoder.

For the encrypted, we I do like EncryptedEncoder/EncryptedDecoder.

rrybarczyk commented 1 month ago

Why do we have WithoutNoise::default and WithNoise::default if they just call Self::new()?
When should State::not_initialized be used?

plebhash commented 1 month ago

We have been using the word "standard" to express unencrypted logic (correct me if I am wrong).

That's not the case. The Standard* aliases are related to Buffer. There's even a StandardNoiseDecoder, which as the name implies, is encrypted. I asked some questions related to these aliases here https://github.com/stratum-mining/stratum/discussions/873#discussioncomment-10663045

tbh I'm not sold on these aliases... maybe we can do something better, but that's a different discussion (hopefully after buffer_sv2 has been properly documented).

Anyways, if Plain* is not good, we can do either Encoder / Decoder or UnencryptedEncoder / UnencryptedDecoder.

rrybarczyk commented 1 month ago

We have been using the word "standard" to express unencrypted logic (correct me if I am wrong).

That's not the case. The Standard* aliases are related to Buffer. There's even a StandardNoiseDecoder, which as the name implies, is encrypted. I asked some questions related to these aliases here #873 (comment)

tbh I'm not sold on these aliases... maybe we can do something better, but that's a different discussion (hopefully after buffer_sv2 has been properly documented).

Anyways, if Plain* is not good, we can do either Encoder / Decoder or UnencryptedEncoder / UnencryptedDecoder.

Ok, lets do that: Encoder/Decoder and EncryptedEncoder/EncryptedDecoder.

jbesraa commented 1 month ago

Wouldn't NoiseEncoder and NoiseDecoder be more consistent with the current naming in the projects?

rrybarczyk commented 1 month ago

Wouldn't NoiseEncoder and NoiseDecoder be more consistent with the current naming in the projects?

I do like that. Thoughts @plebhash?

Shourya742 commented 1 month ago

https://github.com/stratum-mining/stratum/pull/1040#discussion_r1798748428

rrybarczyk commented 1 month ago

When running the unencrypted and encrypted examples, the following warnings occur:

cargo run --example unencrypted
warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:31:11
   |
31 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration
   = note: `#[warn(unexpected_cfgs)]` on by default

warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:43:11
   |
43 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration

warning: trait `Variable` is never used
  --> v2/binary-sv2/no-serde-sv2/codec/src/codec/mod.rs:37:11
   |
37 | pub trait Variable {
   |           ^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: trait `IntoOwned` is never used
 --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:7:7
  |
7 | trait IntoOwned {
  |       ^^^^^^^^^

warning: `binary_codec_sv2` (lib) generated 4 warnings
warning: `binary_codec_sv2` (lib) generated 4 warnings (4 duplicates)
   Compiling codec_sv2 v1.2.1 (/Users/rachelrybarczyk/Development/StratumV2/Dev/stratum/protocols/v2/codec-sv2)
    Finished `dev` profile [optimized + debuginfo] target(s) in 0.61s

cargo run --example encrypted --features noise_sv2
warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:31:11
   |
31 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration
   = note: `#[warn(unexpected_cfgs)]` on by default

warning: unexpected `cfg` condition value: `with_serde`
  --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:43:11
   |
43 | #[cfg(not(feature = "with_serde"))]
   |           ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: expected values for `feature` are: `buffer_sv2`, `default`, `no_std`, `prop_test`, `quickcheck`, and `with_buffer_pool`
   = help: consider adding `with_serde` as a feature in `Cargo.toml`
   = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration

warning: trait `Variable` is never used
  --> v2/binary-sv2/no-serde-sv2/codec/src/codec/mod.rs:37:11
   |
37 | pub trait Variable {
   |           ^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: trait `IntoOwned` is never used
 --> v2/binary-sv2/no-serde-sv2/codec/src/datatypes/non_copy_data_types/mod.rs:7:7
  |
7 | trait IntoOwned {
  |       ^^^^^^^^^

warning: `binary_codec_sv2` (lib) generated 4 warnings
warning: `binary_codec_sv2` (lib) generated 4 warnings (4 duplicates)
warning: methods `into_aesg` and `into_chacha` are never used
  --> v2/noise-sv2/src/cipher_state.rs:26:8
   |
7  | pub trait CipherState<Cipher_: AeadCipher>
   |           ----------- methods in this trait
...
26 |     fn into_aesg(mut self) -> Option<Cipher<Aes256Gcm>> {
   |        ^^^^^^^^^
...
33 |     fn into_chacha(mut self) -> Option<Cipher<ChaCha20Poly1305>> {
   |        ^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: associated items `name`, `hkdf_3`, and `ecdh` are never used
   --> v2/noise-sv2/src/handshake.rs:10:8
    |
9   | pub trait HandshakeOp<Cipher: AeadCipher>: CipherState<Cipher> {
    |           ----------- associated items in this trait
10  |     fn name(&self) -> String;
    |        ^^^^
...
69  |     fn hkdf_3(
    |        ^^^^^^
...
109 |     fn ecdh(private: &[u8], public: &[u8]) -> [u8; 32] {
    |        ^^^^

warning: `noise_sv2` (lib) generated 2 warnings
    Finished `dev` profile [optimized + debuginfo] target(s) in 0.03s

stratum-mining / stratum

Refactor: `codec-sv2` crate #1129

Background

Identified Potential Code Debt

Error Variant Naming

Uniform Spelling

Imports

Exports

Use of `Result` types

Uniform Struct Fields

Misc

Uniform Spelling

[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear

Uniform Spelling

stratum-mining / stratum

Refactor: `codec-sv2` crate #1129

Background

Identified Potential Code Debt

Error Variant Naming

Uniform Spelling

Imports

Exports

Use of Result types

Uniform Struct Fields

Misc

Uniform Spelling

[cfg(not(feature = "noise_sv2"))] // redundant, but written to make it explicitly clear

Uniform Spelling

Use of `Result` types