near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0
2.33k stars 628 forks source link

Break down large source code files into smaller ones #10275

Open pugachAG opened 11 months ago

pugachAG commented 11 months ago

Context

Related zulip thread. Large source code files are hard to work with and can be a sign of a poor abstractions. This was proven to affect productivity of a multiple engineers (@pugachAG, @Longarithm) working with chain as part of Stateless Validation project.

Addressing the issue

Heavily inspired by Leo's comment.

Make smaller abstractions

First try splitting underlying abstraction into a smaller ones. This could take quite some effort, but also helps the most with technical debt.

Move structs to separate files

Then, if there’s no way to make the abstraction better-contained, split at the struct boundary, so that at least the API exposed by the struct to other structs (which now are other modules) is explicit, and we avoid touching internal structures without noticing.

Move utility and testing-related functions to a separate module

If we can’t make progress on other stuff, split test-related functions to another file, as it won’t really help with code clarity otherwise.

If they are being used within a single crate, then that is simple and can be in a testing module. If they are being used by multiple crates, then we have a complication as maybe we need to have a testing only crate. Not a huge deal but not ideal.

Move tests to a separate file

It can be argued that this doesn't always improve the situation.

Pros:

Cons (see this comment):

So moving tests to a separate file should be considered on case-by-case basis.

Example: https://github.com/near/nearcore/pull/10274

Current state

Top offenders can be found with the following bash cmd:

find . -type f -name \*.rs -not -path "./target/*" -exec wc -l {} + | sort -rn | head -n 20

Current result: 8175 ./runtime/near-vm/compiler-singlepass/src/codegen_x64.rs 6069 ./chain/chain/src/chain.rs 3956 ./chain/chain/src/store.rs 3763 ./integration-tests/src/tests/client/process_blocks.rs 2978 ./core/primitives/src/views.rs 2968 ./chain/chunks/src/lib.rs 2955 ./runtime/near-vm-runner/src/logic/logic.rs 2839 ./nearcore/src/runtime/mod.rs #10274 2655 ./chain/epoch-manager/src/tests/mod.rs 2646 ./runtime/runtime/src/lib.rs 2639 ./chain/client/src/client.rs 1985 ./chain/client/src/client_actor.rs 1982 ./core/store/src/trie/mod.rs 1918 ./chain/epoch-manager/src/lib.rs 1893 ./runtime/runtime/src/verifier.rs 1848 ./tools/mirror/src/lib.rs 1775 ./runtime/runtime/src/actions.rs 1735 ./integration-tests/src/tests/client/resharding.rs 1733 ./nearcore/src/config.rs 1715 ./chain/network/src/peer/peer_actor.rs

akhi3030 commented 11 months ago

move tests to a separate module

This will also help with grep and related tools when I am trying to traverse the code base. E.g. when I am trying to figure out where a function is used in production, I can ignore hits in files whose name contain the string test.

akhi3030 commented 11 months ago

move testing-related functions to a separate module

+1 to this. If they are being used within a single crate, then that is simple and can be in a testing module. If they are being used by multiple crates, then we have a complication as maybe we need to have a testing only crate. Not a huge deal but not ideal.

move utility functions to a separate module

Same as above. We already have a stdx crate, we could maybe have a general utilities crate?

Ekleog-NEAR commented 11 months ago

I’d agree with your list of ideas, but the order of priority should IMO be the other way around:

(I think the reason for the mod tests {} pattern is that you’ll need to replace foo.rs with foo/{mod.rs,tests.rs}, and it’s just slightly less convenient to handle)

We already have a stdx crate, we could maybe have a general utilities crate?

I’d argue against this pretty strongly. This feels like a recipe for a new near-primitives, and we’re having enough trouble with one already. So long as it’s literally just a few functions it could go into stdx, but if not we should find a proper home for the testing functions, and not just dump them all into a generic crate without more thought :)

pugachAG commented 11 months ago

Thanks @Ekleog-NEAR and @akhi3030 for your feedback! I've updated the description to incorporate your suggestions.

shreyan-gupta commented 10 months ago

I've worked on moving parts of orphan block management code and garbage collection code into separate files. Efforts are a part of PRs

https://github.com/near/nearcore/pull/10347 https://github.com/near/nearcore/pull/10361

shreyan-gupta commented 9 months ago

Another one for breaking up stateless validation code into multiple modules/files.

https://github.com/near/nearcore/pull/10522