livepeer / go-livepeer

Official Go implementation of the Livepeer protocol
http://livepeer.org
MIT License

E2E tests #2297

Open · leszko opened 2 years ago

leszko commented 2 years ago

Abstract

Add automated E2E/Integration tests that cover on-chain interactions.

Motivation

Currently, we don't test Livepeer node software against the chain in any automated way. Our current test suite covers unit tests and video-related integration tests, but we use mocks for all on-chain interactions.

Proposed Solution

I suggest creating a separate suite of tests in Golang, which uses the testcontainer library. Then, implement white-box tests covering user-facing scenarios.

Here's a PoC code skeleton showing how it would look in practice.

1. Separate suite of tests

The E2E/Integration tests need a connection to the chain, so they'll have specific environment requirements. That's why I suggest creating:

2. Testcontainer library

As for the chain, I think we should use the Livepeer Geth Docker image, the same one we currently use for local dev testing. To integrate it well with Golang tests, we can make use of the testcontainer library, which enables running integration tests in exactly the same way we run unit tests (using go test or directly from an IDE). The only requirement for the tests is to have Docker running.

3. White-box testing

There are two ways we can approach creating tests: black-box testing and white-box testing.

In our case, we can imagine a sample black-box test as the following script:

# 1. Start local Geth
docker run livepeer/geth-with-livepeer-protocol:streamflow
# 2. Start Orchestrator
livepeer -network=devenv -ethUrl http://localhost:8545/ -orchestrator
# 3. Register Orchestrator
curl http://localhost:7936/activateOrchestrator
# 4. Start Broadcaster
livepeer -network=devenv -ethUrl http://localhost:8545/ -broadcaster
# 5. Fund Deposit/Reserve
curl http://localhost:7937/fundDepositAndReserve
# ...

While this script looks simple and really tests exactly what a user does, I don't believe it's the approach we should take, for the following reasons:

Therefore, I propose to use white-box testing, but at the same time try to stay as close to black-box testing as possible. A sample white-box testing approach is presented in the PoC code skeleton and looks as follows.

func TestCompleteStreamingWorkflow(t *testing.T) {
    ctx := context.TODO()

    // Start the Geth container and tear it down when the test finishes.
    gethC := setupGeth(ctx, t)
    defer gethC.Terminate(ctx)
    ethClient := newEthClient(gethC.URI, t)

    orch := startOrchestrator(ethClient)
    registerOrchestrator(orch)
    bcast := startBroadcaster(ethClient)
    fundDepositAndReserve(bcast)
}

func startOrchestrator(ethClient eth.LivepeerEthClient) *server.LivepeerServer {
    // Error handling elided in this skeleton.
    n, _ := core.NewLivepeerNode(ethClient, "./tmp", nil)
    s, _ := server.NewLivepeerServer("127.0.0.1:1938", n, true, "")
    // ...
    return s
}

4. User-facing scenarios

I think we should focus on having user-facing E2E scenarios as opposed to just checking if all the eth client functions work correctly. The reason for this approach is that most of the bugs we've discovered weren't related to single on-chain interactions.

Here is the scenario I think we should cover first.

Scenario 1: Full B<>O workflow

  1. Start O
  2. Register O
  3. Start B
  4. Fund Deposit/Reserve
  5. Start a stream
  6. Check if the stream was transcoded
  7. Check that a ticket was received by O
  8. Redeem the ticket
  9. Check ETH balance

Other scenarios to consider

Implementation Tasks and Considerations

  1. Prepare geth-with-livepeer-protocol Docker image
    • Update to the confluence version
    • Prepare builds for arm64/v8
    • (optional) Integrate GH Actions in livepeer/protocol to automatically build the Docker image
  2. Implement Scenario 1
    • Create reusable code with testcontainer
    • Decide on the test structure (one scenario per file or all scenarios in one file)
    • Refactor livepeer.go to simplify running it from tests
    • Write code to cover Scenario 1
  3. Create automation
    • Create test_e2e.sh script
    • Create GH Action
  4. Implement other scenarios

Testing Tasks and Considerations

We need to make sure that:

Known Unknowns

There are a few things to consider during the implementation.

1. Mocking video transcoding

We can consider mocking the video transcoding part, since it is resource-consuming. Nevertheless, I suggest first trying real transcoding and optimising only when needed.

2. Scope of test scenarios

Initially, I plan to have complete user-facing testing scenarios, but if we find their scope too big, we can try splitting them into smaller ones, e.g. testing only O registration instead of the full B<>O workflow.

3. No Arb Node Docker image

Currently, we only have a Geth Docker image, but no Arb Node Docker image. The networks are compatible for most scenarios, but we may consider building an Arb Node Docker image at some point.

Alternatives

1. Separate suite of tests:

Instead of creating a separate suite of tests, we could add the tests to the existing folders and/or run them with test.sh.

2. Testcontainer library

Instead of using the local Geth chain, we could run tests against Arbitrum Rinkeby, but I'm afraid we'd see a lot of flakiness due to:

3. Golang tests

Instead of integrating tests into Golang, we could do one of the following:

4. User-facing scenarios

Instead of creating complete user-facing E2E scenarios, we could target just parts of the interactions, e.g. only O registration or only ticket redemption. We may need to move to such a format at some point; however, I'd start with the full user-facing scenarios, because that's actually how we currently test manually. Later, if we suffer from having too many scenarios, we can consider splitting them into smaller parts.

Additional Context

N/A

yondonfu commented 2 years ago

Overall this looks good to me and +1 for writing the tests in Go. The testcontainer package looks like it will be quite useful for this.

For Scenario 1, for simulating a stream, I recommend using HTTP ingest for B and not RTMP ingest, because the latter will likely be deprecated soon. And if we're using HTTP ingest, we could just push a single segment to the API to start with, which should simplify the test, since triggering the full e2e workflow would just begin with submitting that single segment to B's HTTP API.

As for the chain I think we should use Livepeer Geth docker image

At this point, there are alternative ETH blockchain clients that might be more dev-friendly, such as Hardhat node. But since we have some geth-specific logic in devtool, and since geth should work fine as long as the image is updated, I'm on board with sticking with geth for now and considering other options separately if we need to.

Update to the confluence version

Note that the current build script for setting up the image with pre-deployed contracts is pinned to an older pre-Confluence commit of the protocol repo that was still using the Truffle dev framework for deploying contracts. The current confluence branch of the protocol repo uses the Hardhat dev framework. Additionally, deploying the contracts won't work right out of the box, because right now you have to deploy contracts from both the protocol repo and the arbitrum-lpt-bridge repo, since the protocol contracts depend on an L2LPTDataCache contract that is defined in arbitrum-lpt-bridge. We should have a task for streamlining the contract deployment process so it is easier to update the image build script. cc @kautukkundan @RiccardoBiosas for awareness and for help here when we define the task.

Refactor livepeer.go to simplify running it from tests

Interested to see how you're thinking about approaching this refactor. I'm guessing the end goal is to be able to programmatically instantiate a B or O in Go such that the respective objects can be passed around and used in tests, and the refactored livepeer.go would just instantiate each of these objects?

leszko commented 2 years ago

Thanks for the feedback @yondonfu

For Scenario 1, for simulating a stream, I recommend using HTTP ingest for B and not RTMP ingest, because the latter will likely be deprecated soon. And if we're using HTTP ingest, we could just push a single segment to the API to start with, which should simplify the test, since triggering the full e2e workflow would just begin with submitting that single segment to B's HTTP API.

Yes, it makes sense.

Update to the confluence version

Note that the current build script for setting up the image with pre-deployed contracts is pinned to an older pre-Confluence commit of the protocol repo that was still using the Truffle dev framework for deploying contracts. The current confluence branch of the protocol repo uses the Hardhat dev framework. Additionally, deploying the contracts won't work right out of the box, because right now you have to deploy contracts from both the protocol repo and the arbitrum-lpt-bridge repo, since the protocol contracts depend on an L2LPTDataCache contract that is defined in arbitrum-lpt-bridge. We should have a task for streamlining the contract deployment process so it is easier to update the image build script. cc @kautukkundan @RiccardoBiosas for awareness and for help here when we define the task.

Ok, I see, it's more difficult than I thought! I'll ask @kautukkundan and @RiccardoBiosas for help here.

Refactor livepeer.go to simplify running it from tests

Interested to see how you're thinking about approaching this refactor. I'm guessing the end goal is to be able to programmatically instantiate a B or O in Go such that the respective objects can be passed around and used in tests, and the refactored livepeer.go would just instantiate each of these objects?

Yes, I thought about having a struct like LivepeerOrchestrator which would encapsulate everything we currently pass as flags. Then, in the tests you'd just initialize this struct, and in livepeer.go you'd fill it from the flags. But TBH I need to dig deeper to check how difficult it would be.

RiccardoBiosas commented 2 years ago

Sounds like a great plan!

but since we have some geth specific logic in devtool and since geth should work fine as long as the image is updated I'm onboard with sticking with geth for now and considering other options separately if we need to.

@yondonfu out of curiosity, what breaking changes would be caused by replacing geth with one of the available alternatives? Also @leszko is there any reason why you'd prefer to deploy the protocol from scratch for the purpose of testing rather than forking livepeer on L2 against a pinned block and running all the pertinent scenarios there? If the purpose is to achieve a more realistic context for the e2e tests, I think that'd be a better course of action. I was skimming through the arbitrum main repo, and it seems they have recently implemented a node utility to fork the chain state.

Additionally, deploying the contracts won't work right out of the box because right now you have to deploy contracts from the protocol repo and the arbitrum-lpt-bridge repo because the protocol contracts depend on a L2LPTDataCache contract that is defined in the arbitrum-lpt-bridge.

Both the livepeer protocol repo and arbitrum-lpt-bridge use hardhat-deploy, which allows accessing an L1 and an L2 network in the same context via companion networks. A possible approach could be to deploy the respective contracts on a dockerized geth network and arbitrum network and then package everything in a docker-compose. I think Optimism was doing something similar in the past.

B<>O failover
O<>T failover
Separate Redeemer process
Network issues
Redeemer unavailable
ETH URL unavailable / rate limited
Fast verification and suspension mechanism

Interesting! Curious how you're planning to implement the failover/resiliency tests. Do you have any framework/library in mind? I remember reading how Netflix tests the resiliency of their application lifecycle by causing failures in their components, here and here. Not sure if this is close to what you have in mind, though.

yondonfu commented 2 years ago

@RiccardoBiosas

out of curiosity, what breaking changes would be caused by replacing geth with one of the available alternatives?

I think I actually may be wrong about there being breaking changes if we replaced geth... I thought that the use of geth's JS console feature in the devtool script to execute web3.js scripts would break if we replaced geth. But it actually looks like we use geth packages both to dial the RPC endpoint and to instantiate a console object in Go that accepts JS scripts, which might work with other RPC providers, because I'm guessing that geth's console package just translates the JS scripts into RPC requests sent to the provider. I haven't tested any of this though!

is there any reason why you'd prefer to deploy the protocol from scratch for the purpose of testing rather than forking livepeer on L2 against a pinned block and running all the pertinent scenarios there?

I like the idea of using a client forked from a live network! I think this would mean we'd need to use a forking-capable client like Hardhat node (AFAIK geth doesn't support this). We'd still be able to do a clean deploy of the contracts if we wanted to, but by default we could just use the contracts already deployed on the live network instead of having to pre-deploy them in the Docker image. Additionally, if we're ever preparing for a contract upgrade, we could execute the upgrade on the fork and then run go-livepeer e2e tests against that upgraded local fork.

The main downsides I see of using a client forked from a live network:

If we think the effort to add the automation to address the second point is not substantial, then I think using Hardhat node to fork could be worth trying out.

I was skimming through the arbitrum main repo and https://github.com/OffchainLabs/arbitrum/pull/2063

Nice find. This looks like it could be useful. However, it seems to require access to an Arbitrum rollup node, which in turn requires access to an L1 node, which increases the devops work/burden. Given that the tool was just created, I lean towards checking it out, but holding off on incorporating it into our setups until later on, when we have more confidence in how to use it properly as well as how to properly set up a local Arbitrum rollup environment.

A possible approach could be to deploy the respective contracts on a dockerized geth network and arbitrum network and then package everything in a docker-compose.

Using docker-compose to package together all the dependencies for standing up a local L1/L2 network sounds like the right eventual path. However, given the above points, I think it could be worthwhile to get e2e testing working without an actual L1/L2 network first.

leszko commented 2 years ago

Interesting! Curious how you're planning to implement the failover/resiliency tests. Do you have any framework/library in mind? I remember reading how Netflix tests the resiliency of their application lifecycle by causing failures in their components, here and here. Not sure if this is close to what you have in mind, though.

I didn't have any chaos testing or anything like that in mind. Rather, a simple scenario:

leszko commented 2 years ago

That's a very interesting idea with forking the mainnet!

At first it seemed like a great idea to me, but the more I think about it, the less tempting it becomes. The main benefit is clear: we're as close as possible to the prod env. In terms of the amount of work, I'm not sure whether it's simpler to fork or to prepare our own Docker image with contracts. However, I see some drawbacks and issues we'd need to address. Here are some thoughts. Keep in mind that I may be wrong on some points if I've misunderstood how chain forking works.

1. Using an external service in automated tests

We already see a lot of rate limiting in the prod system, so we may encounter the same in the automated tests. What's worse, the tests become flaky whenever the connection or the RPC URL is down. The time needed to execute the tests also increases. And the dev experience won't be great, because instead of just executing ./test_e2e.sh, you'll have to create an account in Infura/Alchemy, set up the link, etc. Not all contributors to go-livepeer are interested in blockchain at all.

2. Network state

I don't think we want all the state of the live network in our local test environment. For example, it'd be better to start the system with no registered orchestrators and register them ourselves (as we currently do with our Geth Docker image). Otherwise, the E2E tests will work more like what we currently do on the Rinkeby network. For example, when we test redeeming a ticket there, we don't use B<>O discovery, but just statically set -orchAddr. And I guess we'd prefer to test B<>O discovery in E2E as well.

3. Integration with go test

If it's all about starting one Docker image, we can integrate it well with the Golang tests. However, I believe forking will have some more requirements, e.g., Hardhat needs to be installed. We could probably hide it all in a Docker image, but that's additional work (and maintenance).

Wdyt? @yondonfu @RiccardoBiosas

leszko commented 2 years ago

I've converted this spec into an Epic and created GH Issues accordingly. You can take a look.

I know some of the points are still under discussion, so don't worry; if we make any decisions, I'll update the GH issues.

thomshutt commented 2 years ago

Looks like a great proposal, and definitely something that'll also be useful as a living document of our core workflows for people new to the project, etc.

My 2 (Euro) cents:

leszko commented 2 years ago

Thanks for the feedback @thomshutt. I'm happy that you like the proposal and excited we'll soon start working on the E2E Tests!

Concerning your points:

  1. Running tests in parallel - yes, it'd be great to run them in parallel, but I think each test needs a clean chain so that it's repeatable. That's why I think we should start with a single dev chain and execute tests sequentially, and later try to parallelize by running multiple chains in parallel. It should be possible with testcontainers.
  2. BDD has some benefits (test scenarios are easy to read), but it also comes with a cost (it's harder to follow the test execution code, because there's a layer of Step Definitions / Fixtures). I suggest we at least try Cucumber and see if we like it. Created a GH Issue for it. Note that I'd like to do it after writing the first scenario in Golang, because I think we should focus on the most important part first, which is a running scenario.
  3. Black-box testing is the holy grail, so we would love to have it. On the other hand, there is some probabilistic stuff happening, and we don't really want to expose winProb as a user parameter (or move it to the DB). Nevertheless, I agree with the points you wrote, and that black-box testing is better.