jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

[jaeger-v2] Storage backend integration tests #5254

Closed james-ryans closed 3 months ago

james-ryans commented 6 months ago

Requirement

Since the Jaeger storage extension for Jaeger-v2 is going to fully support Jaeger-v1's storage backends, unit tests on each storage backend are not enough. We need to conduct end-to-end tests of the OpenTelemetry Collector pipeline against the targeted database.

Problem

There are still no integration tests that verify the traces actually stored in the database by the V2 Jaeger storage extension.

Proposal

Fortunately, the OpenTelemetry Collector already has a testbed framework to help us conduct end-to-end tests.

Testbed is a controlled environment and set of tools for conducting end-to-end tests of the OpenTelemetry Collector, including reproducible short-term benchmarks, correctness tests, long-running stability tests, and maximum-load stress tests. However, we will only use the correctness tests from the testbed: they generate and send traces covering every combination of trace attributes, and match each one against the traces received at the other end.
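The correctness-test idea (generate a trace per attribute combination, then match every generated trace ID at the receiving end) can be sketched in a self-contained way. The types below are illustrative stand-ins, not the real testbed API:

```go
package main

import "fmt"

// Illustrative model of a correctness test: generate one trace per
// combination of the given attribute values, "send" them through a
// pipeline, and verify every generated trace ID comes out the other end.

type trace struct {
	ID    string
	Attrs map[string]string
}

// generateTraces builds one trace per combination of the given attribute values.
func generateTraces(statuses, kinds []string) []trace {
	var out []trace
	for i, s := range statuses {
		for j, k := range kinds {
			out = append(out, trace{
				ID:    fmt.Sprintf("trace-%d-%d", i, j),
				Attrs: map[string]string{"status": s, "span.kind": k},
			})
		}
	}
	return out
}

// matchReceived reports IDs that were sent but never received.
func matchReceived(sent, received []trace) []string {
	seen := make(map[string]bool, len(received))
	for _, t := range received {
		seen[t.ID] = true
	}
	var missing []string
	for _, t := range sent {
		if !seen[t.ID] {
			missing = append(missing, t.ID)
		}
	}
	return missing
}

func main() {
	sent := generateTraces([]string{"ok", "error"}, []string{"client", "server"})
	received := sent[:3] // simulate one trace lost in the pipeline
	fmt.Println(len(sent), matchReceived(sent, received)) // prints "4 [trace-1-1]"
}
```

As yurishkuro notes later in the thread, this only checks that the data gets through by ID; it does not exercise query semantics.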

Architecture of the integration test

Here's the architecture we will use to test the OpenTelemetry Collector pipeline end-to-end with the designated storage backends (diagram: jaeger-v2-testbed). Testbed components:

Plan

The execution of integration tests will be done incrementally, one by one, on every supported storage backend:

Open questions

No response

yurishkuro commented 6 months ago

@james-ryans as I was reviewing the PRs that follow from this issue, I am starting to have some concerns with this approach. Here is the set of requirements that I think we need to meet:

  1. we need to exercise the full pipeline to write data externally and verify that it makes it to the storage
    • (1b) we need to write data in different formats, not just OTLP
  2. we then also need to exercise the querying API
  3. we need to exercise archiving capability
  4. we need to validate that the config files we're providing in cmd/jaeger are valid by doing an e2e smoke test
    • (4b) in v1 we also had some docker-compose files that need to be tested
  5. we need to generate code coverage for some parts of the code that do not get exercised in unit tests (usually related to initializing the storage drivers)
  6. we need to provide a capability for external plugin providers (implementing the gRPC Storage API, such as Quickwit or a Postgres plugin) to also run e2e tests for writing and querying, as a way of certifying compatibility with Jaeger

In the current state:

I think we can solve all 6 requirements by building on our existing integration tests rather than on the OTEL testbed. Perhaps we can also find a way to utilize the testbed's data-generation ability and incorporate it as a step in the overall integration, but on its own I don't see how it can solve all the requirements.

Achieving this will streamline our integration tests by converging onto a single framework, instead of using 3 different ones for bits and pieces. This is probably a large task, so I would like to find a path of incremental improvements that lead us to the overall goal. Let's give it some thought.

james-ryans commented 5 months ago

There are some points that are still ambiguous to me, and I want to clarify them. Right now, I just want to focus on the first three points of your vision and intention:

yurishkuro commented 5 months ago

Yes, that is all correct. For instance, with ES, in the unit test model the test will instantiate es.SpanWriter and when it calls writer.WriteSpan() it's an in-process call to the es storage implementation that writes directly to ES. But in e2e mode, a different SpanWriter will be instantiated that executes an OTLP-RPC request to the running collector, where it will be accepted by the receiver and written to storage by the exporter.
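The two modes described above can be sketched as a single interface with two implementations, so the test code stays identical in both modes. All type names below are illustrative, not the actual Jaeger interfaces:

```go
package main

import "fmt"

// Illustrative sketch of the two test modes; the types here are
// hypothetical stand-ins, not the real Jaeger storage interfaces.

type Span struct{ TraceID string }

// SpanWriter is what the integration test writes through in both modes.
type SpanWriter interface {
	WriteSpan(s Span) error
}

// inProcessWriter calls the storage implementation directly (unit test mode).
type inProcessWriter struct {
	storage map[string]Span
}

func (w *inProcessWriter) WriteSpan(s Span) error {
	w.storage[s.TraceID] = s
	return nil
}

// rpcWriter stands in for a writer that sends an OTLP request to a running
// collector, which then writes to storage via its exporter (e2e mode).
type rpcWriter struct {
	send func(Span) error // an OTLP gRPC call in the real setup
}

func (w *rpcWriter) WriteSpan(s Span) error { return w.send(s) }

func main() {
	backend := map[string]Span{}

	var writer SpanWriter = &inProcessWriter{storage: backend}
	writer.WriteSpan(Span{TraceID: "t1"})

	// In e2e mode only the writer changes; the test code stays the same.
	writer = &rpcWriter{send: func(s Span) error {
		backend[s.TraceID] = s // pretend the collector pipeline did this
		return nil
	}}
	writer.WriteSpan(Span{TraceID: "t2"})

	fmt.Println(len(backend)) // prints 2: both spans reached the backend
}
```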

yurishkuro commented 5 months ago

unit test mode

```mermaid
flowchart LR
    Test -->|writeSpan| SpanWriter
    SpanWriter --> B(StorageBackend)
    Test -->|readSpan| SpanReader
    SpanReader --> B

    subgraph Integration Test Executable
        Test
        SpanWriter
        SpanReader
    end
```

e2e test mode

```mermaid
flowchart LR
    Test -->|writeSpan| SpanWriter
    SpanWriter --> RPCW[RPC_client]
    RPCW --> Receiver
    Receiver --> Exporter
    Exporter --> B(StorageBackend)
    Test -->|readSpan| SpanReader
    SpanReader --> RPCR[RPC_client]
    RPCR --> jaeger_query
    jaeger_query --> B

    subgraph Integration Test Executable
        Test
        SpanWriter
        SpanReader
        RPCW
        RPCR
    end

    subgraph jaeger-v2
        Receiver
        Exporter
        jaeger_query
    end
```

james-ryans commented 5 months ago

I have created an action plan to provide us with a clear, structured pathway so we can execute this in parallel. Some thoughts are welcome if my idea doesn't match with our vision.

  1. Prototyping the new integration tests

    1. Implement the unit tests that exercise the querying API (2) and archiving (3). Code that initializes the storage drivers (5) should be covered automatically by these tests.


      My thought on how this will be implemented: we only need to pass the config to a setup function that starts the storage extension; within the setup we can retrieve the SpanWriter and SpanReader. I'm not sure, but we can probably reuse the StorageIntegration module.
      Also, I found that the archiving capability is only tested on the Elasticsearch storage.

    2. Extend the unit test into an e2e test: instead of starting only the storage extension, use a config file from cmd/jaeger to spawn the whole collector pipeline (4), then implement a SpanWriter and SpanReader that send span data through gRPC requests to the receiver and query it from jaeger_query (1).

    List of the storage backends that need to be tested:

    • memory
    • gRPC
    • badger
    • cassandra
    • elasticsearch
    • opensearch
  2. Refactoring and an example for external plugin providers.

    1. Refactor the unit test and e2e test to run in the same workflow so they use the same storage backend. We need to be extra careful with previously written data.
    2. Refactor bootstrapping tests to rely on docker-compose files.
    3. Add an example on how to test external plugin providers with our gRPC storage tests.
  3. Add the crossdock tests.

With this, we can prototype the unit-test and e2e-test modes in parallel. However, after the unit test is merged, we need to refactor the e2e test to have a similar structure. Once the unit test and e2e test for one of the storage backends are merged, we can continue with the other backends. After that, we can do the refactoring and the example from plan 2 in parallel. The last step is to think through how to test interoperability between SDKs and exercise the receipt of data in different formats, crossdock-style.

james-ryans commented 5 months ago

And I'll try to prototype the e2e test for the gRPC storage backend since @Pushkarm029 is working on the gRPC unit test.

yurishkuro commented 5 months ago

@james-ryans a couple thoughts

james-ryans commented 5 months ago

Ohh wow, nice.. I overlooked that this task exists. I'll take a look at it.

> my diagrams only show the extension of the existing /integration/ tests to work in e2e mode. Do you see the benefits of also using OTEL testbed in this setup?

Some components of it might be useful, but we could easily implement them on our own if we wanted to, probably modifying them for our specific use case. I'm thinking we should be able to use the OTEL testbed collector (testbed/testbed/in_process_collector.go) to start jaeger-v2.

We could probably also use the OTEL testbed sender component to write span data through RPC requests. However, I still need to examine it to get a concrete picture. One concern is that the sender lacks the functionality to close the RPC connection.

yurishkuro commented 5 months ago

One main difference to me is that our integration tests generate very specific traces and then query for them in very specific ways, to actually exercise the querying capabilities & permutations. But the OTEL testbed just generates a random flood of data and only checks that it all gets through (not even that, as I believe it only checks the IDs). That was really my question - what is the value of such a data source? It's not really fuzz testing, since the data is still hardcoded (just permuted for the load). I could see it potentially being useful for stress testing, but we don't do that today (it would need dedicated hardware, not GH runners).

james-ryans commented 5 months ago

The sender is just a wrapper for the OTLP exporter: we can call its ConsumeTraces func with our specific traces, and the sender handles the rest of the RPC requests. The OTEL testbed has data-provider and sender components; the data provider is the one that generates random traces and pushes them through the sender. With the sender alone, we should be able to utilize it for our integration tests.
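A minimal model of that sender idea, including the Close capability noted earlier as missing from the testbed sender. The types here are illustrative stand-ins, not the actual testbed code:

```go
package main

import "fmt"

// Sketch of the sender idea: a thin wrapper that accepts specific traces
// via ConsumeTraces and forwards them over an RPC client. Names loosely
// mirror the OTEL testbed; all types here are hypothetical.

type Traces struct{ SpanCount int }

// rpcClient stands in for an OTLP gRPC client.
type rpcClient struct {
	sent   int
	closed bool
}

func (c *rpcClient) Export(t Traces) error {
	if c.closed {
		return fmt.Errorf("connection closed")
	}
	c.sent += t.SpanCount
	return nil
}

// Sender wraps the client; unlike the testbed sender, it also exposes
// Close, the capability noted as missing above.
type Sender struct{ client *rpcClient }

func (s *Sender) ConsumeTraces(t Traces) error { return s.client.Export(t) }
func (s *Sender) Close()                       { s.client.closed = true }

func main() {
	s := &Sender{client: &rpcClient{}}
	s.ConsumeTraces(Traces{SpanCount: 3})
	s.Close()
	err := s.ConsumeTraces(Traces{SpanCount: 1}) // rejected after Close
	fmt.Println(s.client.sent, err != nil)       // prints "3 true"
}
```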

james-ryans commented 5 months ago

@yurishkuro with the new integration requirements, we no longer need to test the collector pipeline with the testbed as I proposed before, is that right? If so, we can just delete it.

yurishkuro commented 5 months ago

I think so, but that was really my question to you - if we used the testbed, what additional aspects or behavior would it be testing?

james-ryans commented 5 months ago

Okay. It doesn't provide any benefit at this point, since all the test cases are already covered by the existing StorageIntegration. But we can reuse some of its components to provide an easier setup for the new integration tests.

yurishkuro commented 5 months ago

Copying from https://github.com/jaegertracing/jaeger/pull/5355#discussion_r1566018600 - let's add this to the README.

```mermaid
flowchart LR
    Receiver --> Processor
    Processor --> Exporter
    JaegerStorageExtension -->|"(1) get storage"| Exporter
    Exporter -->|"(2) write trace"| Badger

    Badger_e2e_test -->|"(1) POST /purge"| HTTP_endpoint
    JaegerStorageExtension -->|"(2) getStorage()"| HTTP_endpoint
    HTTP_endpoint -.->|"(3) storage.(*Badger).Purge()"| Badger

    subgraph Jaeger Collector
        Receiver
        Processor
        Exporter

        Badger
        BadgerCleanerExtension
        HTTP_endpoint
        subgraph JaegerStorageExtension
            Badger
        end
        subgraph BadgerCleanerExtension
            HTTP_endpoint
        end
    end
```