livepeer / go-livepeer

Official Go implementation of the Livepeer protocol
http://livepeer.org
MIT License

E2E tests #2297

Open · leszko opened 2 years ago

leszko commented 2 years ago

Abstract

Add automated E2E/Integration tests that cover on-chain interactions.

Motivation

Currently, we don't test Livepeer node software against the chain in any automated way. Our current test suite covers unit tests and video-related integration tests, but we use mocks for all on-chain interactions.

Proposed Solution

I suggest creating a separate suite of tests in Golang, which uses the testcontainer library. Then, implement white-box tests covering user-facing scenarios.

Here's a PoC code skeleton showing how it would look in practice.

1. Separate suite of tests

The E2E/Integration tests need a connection to the chain, so they'll have specific environment requirements. That's why I suggest creating:

2. Testcontainer library

As for the chain, I think we should use the Livepeer Geth Docker image, the same one we currently use for local dev testing. To integrate it well with Golang tests, we can make use of the testcontainer library, which enables running integration tests in exactly the same way we run unit tests (using go test or directly from an IDE). The only requirement for the tests is to have Docker running.

3. White-box testing

There are two ways we can approach creating tests: black-box testing and white-box testing.

In our case, we can imagine a sample black-box test as the following script:

# 1. Start local Geth
docker run livepeer/geth-with-livepeer-protocol:streamflow
# 2. Start Orchestrator
livepeer -network=devenv -ethUrl http://localhost:8545/ -orchestrator
# 3. Register Orchestrator
curl http://localhost:7936/activateOrchestrator
# 4. Start Broadcaster
livepeer -network=devenv -ethUrl http://localhost:8545/ -broadcaster
# 5. Fund Deposit/Reserve
curl http://localhost:7937/fundDepositAndReserve
# ...

While this script looks simple and really tests exactly what a user does, I don't believe it's the approach we should take, for the following reasons:

Therefore, I propose to use white-box testing, but at the same time try to stay as close to black-box testing as possible. A sample white-box testing approach is presented in the PoC code skeleton and looks as follows.

func TestCompleteStreamingWorkflow(t *testing.T) {
    ctx := context.TODO()

    // Start the Geth container and tear it down when the test finishes.
    gethC := setupGeth(ctx, t)
    defer gethC.Terminate(ctx)
    ethClient := newEthClient(gethC.URI, t)

    orch := startOrchestrator(ethClient)
    registerOrchestrator(orch)
    bcast := startBroadcaster(ethClient)
    fundDepositAndReserve(bcast)
}

func startOrchestrator(ethClient eth.LivepeerEthClient) *server.LivepeerServer {
    // Error handling elided in this skeleton.
    n, _ := core.NewLivepeerNode(ethClient, "./tmp", nil)
    s, _ := server.NewLivepeerServer("127.0.0.1:1938", n, true, "")
    // ...
    return s
}

4. User-facing scenarios

I think we should focus on having user-facing E2E scenarios as opposed to just checking if all the eth client functions work correctly. The reason for this approach is that most of the bugs we've discovered weren't related to single on-chain interactions.

Here is the scenario I think we should cover first.

Scenario 1: Full B<>O workflow

  1. Start O
  2. Register O
  3. Start B
  4. Fund Deposit/Reserve
  5. Start a stream
  6. Check if the stream was transcoded
  7. Check that a ticket was received by O
  8. Redeem the ticket
  9. Check ETH balance

Other scenarios to consider

Implementation Tasks and Considerations

  1. Prepare geth-with-livepeer-protocol Docker image
    • Update to the confluence version
    • Prepare builds for arm64/v8
    • (optional) Integrate GH Actions in livepeer/protocol to automatically build the Docker image
  2. Implement Scenario 1
    • Create reusable code with testcontainer
    • Decide on the test structure (one scenario per file or all scenarios in one file)
    • Refactor livepeer.go to simplify running it from tests
    • Write code to cover Scenario 1
  3. Create automation
    • Create test_e2e.sh script
    • Create GH Action
  4. Implement other scenarios

Testing Tasks and Considerations

We need to make sure that:

Known Unknowns

There are a few things to consider during the implementation.

1. Mocking video transcoding

We can consider mocking the video transcoding part, since it is resource-consuming. Nevertheless, I suggest first trying real transcoding and optimising only when needed.

2. Scope of test scenarios

Initially, I plan to have complete user-facing testing scenarios, but if we find their scope too big, we can try splitting them into smaller ones, e.g. testing only O registration instead of the full B<>O workflow.

3. No Arb Node Docker image

Currently, we only have a Geth Docker image, but no Arb Node Docker image. The networks are compatible for most scenarios, but we may consider building an Arb Node Docker image at some point.

Alternatives

1. Separate suite of tests:

Instead of creating a separate suite of tests, we could add the tests to the existing folders and/or run them with test.sh.

2. Testcontainer library

Instead of using the local Geth chain, we could run tests against Arbitrum Rinkeby, but I'm afraid we'd see a lot of flakiness due to:

3. Golang tests

Instead of integrating tests into Golang, we could do one of the following:

4. User-facing scenarios

Instead of creating complete user-facing E2E scenarios, we could target just parts of the interactions, e.g. only O registration or only ticket redemption. We may need to move to such a format at some point; however, I'd start with the full user-facing scenarios, because that's actually how we currently test manually. Later, if we suffer from having too many scenarios, we can consider splitting them into smaller parts.

Additional Context

N/A

yondonfu commented 2 years ago

Overall this looks good to me and +1 for writing the tests in Go. The testcontainer package looks like it will be quite useful for this.

For Scenario 1, for simulating a stream, I recommend using HTTP ingest for B and not RTMP ingest, because the latter will likely be deprecated soon. And if we're using HTTP ingest, we could just push a single segment to the API to start with, which should simplify the test, since triggering the full e2e workflow would just begin with submitting that single segment to B's HTTP API.

As for the chain I think we should use Livepeer Geth docker image

At this point, there are alternative ETH blockchain clients that might be more dev-friendly, such as Hardhat node. But since we have some geth-specific logic in devtool, and since geth should work fine as long as the image is updated, I'm on board with sticking with geth for now and considering other options separately if we need to.

Update to the confluence version

Note that the current build script for setting up the image with pre-deployed contracts is pinned to an older pre-Confluence commit of the protocol repo that was still using the Truffle dev framework for deploying contracts. The current confluence branch of the protocol repo uses the Hardhat dev framework. Additionally, deploying the contracts won't work right out of the box, because right now you have to deploy contracts from both the protocol repo and the arbitrum-lpt-bridge repo, since the protocol contracts depend on an L2LPTDataCache contract that is defined in arbitrum-lpt-bridge. We should have a task for streamlining the contract deployment process so it is easier to update the image build script. cc @kautukkundan @RiccardoBiosas for awareness and for help here when we define the task.

Refactor livepeer.go to simplify running it from tests

Interested to see how you're thinking about approaching this refactor. I'm guessing the end goal is to be able to programmatically instantiate a B or O in Go such that the respective objects can be passed around and used in tests, and the refactored livepeer.go would just instantiate each of these objects?

leszko commented 2 years ago

Thanks for the feedback @yondonfu

For Scenario 1, for simulating a stream, I recommend using HTTP ingest for B and not RTMP ingest, because the latter will likely be deprecated soon. And if we're using HTTP ingest, we could just push a single segment to the API to start with, which should simplify the test, since triggering the full e2e workflow would just begin with submitting that single segment to B's HTTP API.

Yes, it makes sense.

Update to the confluence version

Note that the current build script for setting up the image with pre-deployed contracts is pinned to an older pre-Confluence commit of the protocol repo that was still using the Truffle dev framework for deploying contracts. The current confluence branch of the protocol repo uses the Hardhat dev framework. Additionally, deploying the contracts won't work right out of the box, because right now you have to deploy contracts from both the protocol repo and the arbitrum-lpt-bridge repo, since the protocol contracts depend on an L2LPTDataCache contract that is defined in arbitrum-lpt-bridge. We should have a task for streamlining the contract deployment process so it is easier to update the image build script. cc @kautukkundan @RiccardoBiosas for awareness and for help here when we define the task.

Ok, I see, it's more difficult than I thought! I'll ask @kautukkundan and @RiccardoBiosas for help here.

Refactor livepeer.go to simplify running it from tests

Interested to see how you're thinking about approaching this refactor. I'm guessing the end goal is to be able to programmatically instantiate a B or O in Go such that the respective objects can be passed around and used in tests, and the refactored livepeer.go would just instantiate each of these objects?

Yes, I thought about having a struct like LivepeerOrchestrator which would encapsulate everything we currently pass as flags. Then, in the tests you'd just initialize this struct, and in livepeer.go you'd fill it from the flags. But TBH I need to dig deeper to check how difficult it would be.

RiccardoBiosas commented 2 years ago

Sounds like a great plan!

but since we have some geth specific logic in devtool and since geth should work fine as long as the image is updated I'm onboard with sticking with geth for now and considering other options separately if we need to.

@yondonfu out of curiosity, what breaking changes would be caused by replacing geth with one of the available alternatives? Also @leszko is there any reason why you'd prefer to deploy the protocol from scratch for the purpose of testing rather than forking livepeer on L2 against a pinned block and running all the pertinent scenarios there? If the purpose is to achieve a more realistic context for the e2e tests, I think that'd be a better course of action. I was skimming through the arbitrum main repo, and it seems they have recently implemented a node utility to fork the chain state.

Additionally, deploying the contracts won't work right out of the box because right now you have to deploy contracts from the protocol repo and the arbitrum-lpt-bridge repo because the protocol contracts depend on a L2LPTDataCache contract that is defined in the arbitrum-lpt-bridge.

Both the livepeer protocol repo and arbitrum-lpt-bridge use hardhat-deploy, which allows accessing an L1 and an L2 network in the same context via companion networks. A possible approach could be to deploy the respective contracts on a dockerized geth network and arbitrum network and then package everything in a docker-compose. I think Optimism was doing something similar in the past.

B<>O failover
O<>T failover
Separate Redeemer process
Network issues
Redeemer unavailable
ETH URL unavailable / rate limited
Fast verification and suspension mechanism

Interesting! Curious how you're planning to implement the failover/resiliency tests. Do you have any framework/library in mind? I remember reading how Netflix tests the resiliency of their application lifecycle by causing failures in their components, here and here. Not sure if this is close to what you have in mind, though.

yondonfu commented 2 years ago

@RiccardoBiosas

out of curiosity, what breaking changes would be caused by replacing geth with one of the available alternatives?

I think I actually may be wrong about there being breaking changes if we replaced geth... I thought that the use of geth's JS console feature in the devtool script to execute web3.js scripts would break if we replaced geth. But it actually looks like we use geth packages both to dial the RPC endpoint and to instantiate a console object in Go that accepts JS scripts, which might work with other RPC providers, because I'm guessing that geth's console package just translates the JS scripts into RPC requests sent to the provider. I haven't tested any of this though!

is there any reason why you'd prefer to deploy the protocol from scratch for the purpose of testing rather than forking livepeer on L2 against a pinned block and running all the pertinent scenarios there?

I like the idea of using a client forked from a live network! I think this would mean we'd need to use a forking-capable client like Hardhat node (AFAIK geth doesn't support this). We'd still be able to do a clean deploy of the contracts if we wanted to, but by default we could just use the contracts already deployed on the live network instead of having to pre-deploy them in the Docker image. Additionally, if we're ever preparing for a contract upgrade, we could execute the upgrade on the fork and then run go-livepeer e2e tests against that upgraded local fork.

The main downsides I see of using a client forked from a live network:

If we think the effort to add the automation to address the second point is not substantial, then I think using Hardhat node to fork could be worth trying out.

I was skimming through the arbitrum main repo and https://github.com/OffchainLabs/arbitrum/pull/2063

Nice find. This looks like it could be useful. However, it seems to require access to an Arbitrum rollup node, which in turn requires access to an L1 node, which increases the devops work/burden. Given that the tool was just created, I lean towards checking it out, but holding off on incorporating it into our setups until later on, when we have more confidence in how to use it properly as well as how to properly set up a local Arbitrum rollup environment.

A possible approach could be to deploy the respective contracts on a dockerized geth network and arbitrum network and then package everything in a docker-compose.

Using docker-compose to package together all the dependencies for standing up a local L1/L2 network sounds like the right eventual path. However, given the above points, I think it could be worthwhile to get e2e testing working without an actual L1/L2 network first.

leszko commented 2 years ago

Interesting! Curious how you're planning to implement the failover/resiliency tests. Do you have any framework/library in mind? I remember reading how Netflix tests the resiliency of their application lifecycle by causing failures in their components, here and here. Not sure if this is close to what you have in mind, though.

I didn't have any chaos testing or anything like that in mind. Rather, a simple scenario:

leszko commented 2 years ago

That's a very interesting idea with forking the mainnet!

At first it seemed like a great idea to me, but the more I think about it, the less tempting it becomes. The main benefit is clear: we're as close as possible to the prod env. In terms of the amount of work, I'm not sure whether it's simpler to fork or to prepare our own Docker image with contracts. However, I see some drawbacks and issues we'd need to address. Here are some thoughts. Keep in mind that I may be wrong on some points if I've misunderstood how chain forking works.

1. Using an external service in automated tests

We already see a lot of rate limiting in the prod system, so we may encounter the same in the automated tests. What's worse, the tests become flaky whenever the connection or the RPC URL is down. The time needed to execute the tests also increases. And the dev experience won't be great, because instead of just executing ./test_e2e.sh, you'll have to create an account in Infura/Alchemy, set up the link, etc. Not all contributors to go-livepeer are interested in blockchain at all.

2. Network state

I don't think we want all the state of the live network in our local test environment. For example, it'd be better to start the system with no registered orchestrators and register them ourselves (as we currently do with our Geth Docker image). Otherwise, the E2E tests will work more like what we currently do on the Rinkeby network. For example, when we test redeeming a ticket there, we don't use B<>O discovery, but just statically set -orchAddr. And I guess we'd prefer to test B<>O discovery in E2E as well.

3. Integration with go test

If it's all about starting one Docker image, we can integrate it well with the Golang tests. However, I believe forking will have some more requirements, e.g., Hardhat needs to be installed. We could probably hide it all in a Docker image, but that's additional work (and maintenance).

Wdyt? @yondonfu @RiccardoBiosas

leszko commented 2 years ago

I've converted this spec into an Epic and created GH Issues accordingly. You can take a look.

I know some of the points are still under discussion, so don't worry; if we make any decisions, I'll update the GH issues.

thomshutt commented 2 years ago

Looks like a great proposal, and definitely something that'll also be useful as a living document of our core workflows for people new to the project, etc.

My 2 (Euro) cents:

leszko commented 2 years ago

Thanks for the feedback @thomshutt. I'm happy that you like the proposal and excited we'll soon start working on the E2E Tests!

Concerning your points:

  1. Running tests in parallel - yes, it'd be great to run them in parallel, but I think each test needs a clean chain so that it's repeatable. That's why I think we should start with a single dev chain and execute tests sequentially, and later try to parallelize by running multiple chains in parallel. It should be possible with testcontainers.
  2. BDD has some benefits (test scenarios are easy to read), but it also comes with a cost (it's harder to follow the test execution code, because there's a layer of Step Definitions / Fixtures). I suggest we at least try Cucumber and see if we like it. Created a GH Issue for it. Note that I'd like to do it after writing the first scenario in Golang, because I think we should focus on the most important part first, which is a running scenario.
  3. Black-box testing is the holy grail, so we would love to have it. On the other hand, there is some probabilistic stuff happening, and we don't really want to expose winProb as a user parameter (or move it to the DB). Nevertheless, I agree with the points you wrote, and that black-box testing is better.