Improving WASM Smart Contract Debugging and Code Coverage in near-workspaces-rs

njelich commented 1 year ago

Issue Description

I am seeking to improve the debugging, profiling, and code coverage capabilities for WebAssembly (WASM) smart contracts using the near-workspaces-rs library. This issue serves as a discussion point for potential approaches to implement these features.

One potential solution involves the use of minicov, a crate that provides code coverage and profile-guided optimization (PGO) for no_std and embedded programs such as WASM smart contracts. Minicov achieves this by utilizing a modified version of the LLVM profiling runtime, from which all dependencies on libc have been removed.

However, minicov relies on outputting profraw files for its functionality. By default, near-workspaces-rs and NEAR do not allow file writes as a feature, creating a barrier for using minicov in this context.

Proposed Solution

To overcome this barrier, I propose that near-workspaces-rs or NEAR should allow a special debugging mode that permits file writes or some other output channel where these profraw files can be piped. This would enable the profraw files to be processed with llvm-cov, thus allowing actual code coverage of WASM smart contracts on the NEAR runtime to be measured.

This is particularly important as the general coverage developers can currently get is not actually on the NEAR runtime, and this caveat might be missed by those developing near smart contracts unfamiliar with Rust. For example, if only running unit tests, the #[payable] attributes might be completely missed on some functions that are thought to be tested based on default coverage reporting, but are missed in the workspace-rs tests.

Discussion Points

I would like to open a broader discussion on this topic and outline other similar cases. Some potential discussion points could include:

Pros and cons of allowing file writes or other output channels in near-workspaces-rs or NEAR.
Other potential solutions for improving code coverage and debugging capabilities.
The impact of these changes on the overall performance and security of the system.

I look forward to hearing the community's thoughts and ideas on this matter.

michaelbajor commented 1 year ago

I do like the idea! Absolutely, having a code coverage would absolutely be desirable!

I believe that the way workspaces crate works is that it is launching a local node (sandbox), deploys a compiled wasm and then sends transactions via JSON RPC.

Thinking out loud here: a potential solution would be to measure "how much binary" was executed in each call and "what part" of it. Then, calculating a ratio of executed parts to all of the binary size would give at least an estimation on the code coverage.

But it seems like it should be implemented in the sandbox itself, not in the workspaces crate.

njelich commented 1 year ago

That is more or less how LLVM works - the thing is, WASM is an embedded environment, so normal rust tooling for LLVM doesn't work (it is a no profiler_builtins environment). This is where minicov comes in.

But because the environment is even more restricted than that (runs in VM with size limits and no file write) it's extra tricky.

frol commented 1 year ago

Thanks for bringing it up! This makes total sense to me. minicov seems to be exactly what is needed, I believe it should be added as a feature to near-sdk-rs, so it can add instrumentation as part of the macro expansion, and dump the profiling results via env::log_str. The question, however, would be the size of that profiling data, as there is a limit of how much logs can handle, though we can consider raising the limits for the local sandbox instance of nearcore.

@njelich Would you like to champion the implementation? I am happy to help you to find the relevant resources and integration points.

njelich commented 1 year ago

I'd be happy to. Right now I think it may be over the limit the logs can handle, thats why I thought adding a file write for debugging would be useful.

EDIT: I had a play around with setting it up - currently getting.

Error: unable to fulfill the query request

Caused by:
    handler error: [Function call returned an error: wasm execution failed with error: HostError(GasLimitExceeded)]

Which is purely due to the size of minicov - so a coverage run of near-workspace-rs would have to use neard with disabled gas measurement or with max_gas_burnt_view set to a high amount - which can be set when initializing a custom sandbox node.

frol commented 1 year ago

Right now I think it may be over the limit the logs can handle, thats why I thought adding a file write for debugging would be useful.

Actually, after a second thought, you can just write the data to the contract storage (i.e. env::storage_write("PROFILING_DATA", data)), and then view the storage after the call. Though, it won't work for view-function calls, but in order to test code coverage, view-functions can be called as function calls.

a coverage run of near-workspace-rs would have to use neard with disabled gas measurement or with max_gas_burnt_view set to a high amount

Exactly right 👍

njelich commented 1 year ago

Regarding running a custom near-sandbox - how would this be done in near-workspaces 0.7.0? Based on a read through of the code it seems impossible right now.

ghost commented 1 year ago

@njelich you can use NEAR_SANDBOX_BIN_PATH to set your own. ~Here's some other flags~ NEAR_ENABLE_SANDBOX_LOG=1 might also be helpful.

I advise for now to use nearcore branch 1.35.0 for your edits as I ran into an issue with the master branch building a custom sandbox here.

njelich commented 1 year ago

Alright, I've got the tests running on a custom network using near-workspaces 0.8.0 (waiting for release). I managed to make enough progress to get to a new error.

Error: unable to broadcast the transaction to the network

Caused by:
    handler error: [An error happened during transaction execution: InvalidNonce { tx_nonce: 8, ak_nonce: 8 }]

Maybe the parallel test execution is causing problems...

EDIT: Works - I've got my very first .profraw file.

njelich commented 1 year ago

Okay, I've made some progress but I'm having issues with data limits - right now I have a view function that returns the coverage data, and it seems to be truncating its output (interesting behavior - silently truncates). @frol @shariffdev

ghost commented 1 year ago

I'll take time to check the source to see if it's something that can be helped.

ghost commented 1 year ago

@njelich I did not run into truncated output for large outputs, if it got too large it was HostError(GasLimitExceeded) error. Can you provide sample code, I might be missing something.

njelich commented 1 year ago

Okay, I've got it figured out - there were some deserialization issues.

Next step on my side would be figuring out how to plug in coverage measurement after every function call - for now I have a view function that I use to get coverage - that doesn't work in practice, because it only has its own environment for measurement, but really it should be injected every time there is a function call or view, within the same VM execution process. What would be the best way to do that? @shariffdev @frol

ghost commented 1 year ago

Since logs/events are out, nothing comes to mind atm. Maybe a helper method? Thoughts @frol

njelich commented 1 year ago

It would probably require changes to nearcore in that case, to plug into the VM? Or maybe I can write a derive for coverage that adds the function call the end of each existing function call with some output channel.

frol commented 1 year ago

@njelich I believe that plugging in your code through near-sdk-rs derives is the best way to go.

Just for your reference, you can find what kind of code near-sdk-rs macro generates in the snapshots: https://github.com/near/near-sdk-rs/blob/ee5cf867741d6d0d4db15857609b9e9268cc9b32/near-sdk-macros/src/core_impl/code_generator/snapshots/no_args_no_return_mut.snap, so each struct method gets auxiliary pub extern "C" fn function that does state restore and save before and after the method call.

njelich commented 1 year ago

So I would inject a bit of code before the return in the method_wrapper code.

https://github.com/near/near-sdk-rs/blob/ee5cf867741d6d0d4db15857609b9e9268cc9b32/near-sdk-macros/src/core_impl/code_generator/impl_item_method_info.rs#L9

frol commented 1 year ago

@njelich Yep, that is the right place to look into.

njelich commented 12 months ago

I've noticed something interesting in the process of trying to modify the near-sdk. Editing the Cargo.toml near-sdk dependency to the revision corresponding to release 4.1.1 breaks builds.

near-sdk = { git = "https://github.com/near/near-sdk-rs", rev = "55020df" } While it works fine with

near-sdk = "4.1.1"

Any idea what might be happening there? I get errors around types:

error[E0308]: arguments to this method are incorrect
  --> contract/src/ft_receiver.rs:45:8
   |
45 |     fn ft_on_transfer(&mut self, sender_id: AccountId, amount: U128, msg: String) -> PromiseOrValue<U128> {
   |        ^^^^^^^^^^^^^^            ---------             ------ expected `U128`, found `near_sdk::json_types::U128`
   |                                  |
   |                                  expected `AccountId`, found `near_sdk::AccountId`

frol commented 12 months ago

@njelich This error is usually an indicator of that two versions of the same crate were pulled in. Try using patch section in Cargo toml file instead of editing the dependency path.

Also, keep in mind that near-sdk-rs master branch has some breaking changes and will be released as 5.0 soon

frol commented 10 months ago

@njelich Are you in touch with Hacken team?

Hacken folks recently released: wasmcov. The project provides WASM-based code coverage. They have some docs for use on near.

njelich-hacken commented 10 months ago

well

ghost commented 10 months ago

I'll update the README and link the docs. @njelich I found this link broken https://github.com/hknio/wasmcov

njelich-hacken commented 10 months ago

@shariffdev It was private.

njelich-hacken commented 10 months ago

@shariffdev It was private.

bokobza commented 4 months ago

Hi @frol, is the plan still to include code coverage in near-workspaces or is wasmcov the way to go?

frol commented 4 months ago

Hacken team received a grant to integrate code coverage solution, but I have not heard from them lately

njelich commented 4 months ago

I have left Hacken a while ago, but I believe @bbarwik is likely in charge of this task now.

bbarwik commented 4 months ago

Hey @frol @bokobza and everyone else. I almost finished this tool for near, it's in beta phase. The current beta version is available only on github - https://github.com/hknio/wasmcov/ - I'll push it to creates.io by the end of the week. Here's how you can use it right now:

To install use: cargo install wasmcov then you get a new command cargo wasmcov

To include it in your near project, add to dependencies: wasmcov = "0.2" and in lib.rs add:

#[cfg(target_family = "wasm")]
wasmcov::near::add_coverage!();

Then in your integration tests, if you need to build WASM binaries first run cargo wasmcov build -- <standard build options> it will build them for you and place in wasmcov/target dir. You need to have rust nightly installed for it

To run your tests just use cargo wasmcov test --near=1.35.0 it should work even if your tests have near_workspaces::compile_project("./")

Then to generate coverage report run cargo wasmcov report

You need to have rust nightly version installed and llvm 18, you can get it from https://apt.llvm.org/

Please let me know what you think and if it worked correctly in your case.

Here's an example report cross_contract.zip

near / near-workspaces-rs