ton-society / grants-and-bounties

TON Foundation invites talent to imagine and realize projects that have the potential to integrate with the daily lives of users.
https://ton.org/grants
243 stars 103 forks source link

TON Static Analysis based on Symbolic Execution #489

Open korifey opened 3 months ago

korifey commented 3 months ago

Summary

Static analysis tool for TON offers following advantages:

Context

TON requires advanced tools for testing smart contracts so we'd like to implement the best possible bunch of tools on the same symbolic execution core. We want to start with Static Analysis but a single core will enormously reduce development costs for other tools like Property-based testing and Automated test generation in future.

From a business perspective, we believe we can create a unique selling point for TON. Our tool will not only find bugs and vulnerabilities but will also significantly automate smart contracts audit and reduce costs for users.

Scenarios

Static analyzer usually takes source/bytecode of a smart contract as input and provides analysis results as output verifying some rules. Contemporary static analysis tool provides output in SARIF format supported by many tools. There are several forms how user can run the tool:

Typically static analysis can have predefined and custom rules. Some predefined examples:

It's extremely important to provide flexible API to define user-defined checks for certain classes of analysis such as:

Technical advantages of the proposed approach

The core of analyzer is built upon a cutting-edge symbolic execution engine for TVM bytecode. It uses symbolic execution with SMT solving and taint analysis to find bugs, vulnerabilities, and automatically generate unit tests with mathematical precision.

Team

We are a team of professionals with a rich industry and academic background: 10+ years in developer tools, 7+ years in code analysis and symbolic execution. We’ve been delivering precise and scalable symbolic execution engines for C/C++, Java and .NET Core into industrial settings. We gained awards in acknowledged academic competitions:

Plan for milestone 1

Since we've already spent 1 month on R&D How to implement symbolic execution for TVM based on USVM, we need additional 6 weeks to deliver a demo of our static analysis capabilities.

Sprint 0 (already done)

Symbolic interpreter supporting bytecode instructions:

Sprint 1 (2 weeks)

Support bytecode instructions to verify some exiting real smart contracts:

Sprint 2 (2 weeks)

Try to interpret some real contracts. Support base checks for demo:

Sprint 3 (2 weeks)

Polish checks on real contracts, maybe support additional instructions. Provide CLI that takes a contract and returns check results in SARIF format. Provide documentation and report how it works.

References

Similar engines have been successfully applied for the analysis of Ethereum smart contracts on EVM bytecode level. Among them are mythril, manticore, and oyente.

Some of mentioned links above:

Estimate suggested reward

💰 9000 USD 🕑 6 weeks (3 sprints)

anton-trunov commented 3 months ago

This is an absolutely fantastic proposal based on state-of-the-art research! Compared to the one in #436 it works at the level of TVM which makes it suitable for any language that compiler down to TVM.

@korifey I'm wondering in case some bugs are found at the level of TVM instructions, how do you report the error to the programmer so they would understand where to look for it in the source code which is FunC or Tact at the moment?

howardpen9 commented 3 months ago

happy to see more dev tool guys! (also support in Tact plz)

korifey commented 3 months ago

This is an absolutely fantastic proposal based on state-of-the-art research! Compared to the one in #436 it works at the level of TVM which makes it suitable for any language that compiler down to TVM.

@korifey I'm wondering in case some bugs are found at the level of TVM instructions, how do you report the error to the programmer so they would understand where to look for it in the source code which is FunC or Tact at the moment?

Thanks for you good words.

How to properly render in source code errors found by bytecode analyzer? Usually high-level languages compilers remain some debug information (like line number) when compile source code into low level code. For example in Java: public equals(Ljava/lang/Object;)Z L0 LINENUMBER 149 L0 ALOAD 0 ALOAD 1 IF_ACMPNE L1 ICONST_1 GOTO L2 L1

So, if we get such kind of support from Tact/FuncC compiler, analyzer will return nice SARIF traces.

Another option is to generate test that lead to error. This is an ultimate power of symbolic execution - not only show error but create test (exploit) that demonstrate error behavior.

anton-trunov commented 3 months ago

Usually high-level languages compilers remain some debug information (like line number) when compile source code into low level code

This is a nice solution and is certainly doable.

I have one more concern, though. Symbolic execution can be quite slow, so it could be challenging to integrate analyzers based on it into some interactive dev tools like IDEs like Intellij IDEA or programmer's editors like VS Code. How do you solve this issue?

korifey commented 3 months ago

Usually high-level languages compilers remain some debug information (like line number) when compile source code into low level code

This is a nice solution and is certainly doable.

I have one more concern, though. Symbolic execution can be quite slow, so it could be challenging to integrate analyzers based on it into some interactive dev tools like IDEs like Intellij IDEA or programmer's editors like VS Code. How do you solve this issue?

In fact for traditional programming languages programs can be very large (millions lines of code) so symbolic execution alone is too slow. In this case we use primary filter of vulnerabilities by dataflow (IFDS) + point-to. Then symbolic execution comes to play, filtering out false positives of previous stage. But smart contract are relatively small programs and highly optimized summary-based symbolic execution seems like enough for small contracts.
Speaking about interactive mode (background analysis in IDE during user typing), I'd say it's not a target scenario for this bounty. Default way of running delivered analysis in IDE is "push-button" scenario when user explicitly say "analyze this" and wait until compilation (to get bytecode) and wait some time until precise analysis results.

I'm not sure compilation phase for Tact is fast enough for real interactive mode. Our combined dataflow & symbolic execution engine allows different tradeoffs between precision and speed. We could implement source code level analysis but it requires another bounty and more R&D.

korifey commented 3 months ago

Hi guys, what is your opinion about the proposal? Do you need more clarification or not?

Damtev commented 2 months ago

Hi @anton-trunov. Is there any progress on the remaining debug information (especially line numbers) during compilation?

anton-trunov commented 2 months ago

Hi @Damtev, no progress so far from our side, unfortunately. I opened an issue https://github.com/tact-lang/tact/issues/296, so you can subscribe to it and track progress. We have an upcoming Tact v1.3.0 release this week or next week tops. We'll try our best to get line number information in Tact v1.4.0

korifey commented 1 month ago

Hi, we finished first milestone and prepared some video and script for you to show capabilities of proposed approach.

Here is video report: https://youtu.be/0r5TbLAXF9o Here is open source repository with examples, README: https://github.com/explyt/ton-bounties/

P.S. We are ready to continue evolve our engine for TON static analysis, property based testing and beyond