Add E2E receiver/export correctness tests

tigrannajaryan commented 4 years ago

We currently have E2E tests that benchmark the performance of various formats.

We also need E2E tests that verify the correctness of the Collector as it receives and exports data in various formats. The performance tests don't verify this today. We need separate tests that send telemetry data to the Collector, covering every variety of such data, and then verify that the Collector exports this data exactly as it is supposed to be represented in the configured export format.

The preference is to have a matrix test that verifies many receiver/exporter combinations and uses golden data sets for verification.

Possible approach:

[ ] Implement a span generator that accepts several boolean, enum and numeric flags that control what kind of span to generate: with or without a particular field, how many attributes, what type of attributes, etc. Make sure to include the ability to generate spans with nil fields, zero-sized slices, etc., so that edge cases are covered.
[ ] Write a test that generates a variety of spans. Possibly toggle every boolean flag that the generator accepts between true and false, use all values for enum flags, and use counts of 0, 1 and a random higher number for numeric flags (e.g. number of attributes). Send each span via the testbed, receive it and compare it to the original.
[ ] Perform the test for every combination of receivers and exporters supported in the testbed (N*N tests total).
[ ] Make sure cases like empty spans or empty batches of spans are covered.
[ ] Make the test configurable: have it accept a list of receivers and exporters to test and a list of processors to enable during the test. Make sure all default recommended processors are enabled: memorylimiter, batch, queue.
[ ] Export the test as a public ScenarioTraceTranslation and also call it in Contrib to test receivers and exporters in Contrib. Enable contrib processors (e.g. k8s processor).

mat-rumian commented 4 years ago

I will be happy to help with this :)

kbrockhoff commented 4 years ago

I will soon be submitting a PR for generating and managing "Golden Data". It will have the following components and process steps:

Variation parameters - Various data fields which can vary in different observations. For example for trace spans, I currently am using: Parent, Tracestate, Kind, Attributes, Events, Links, Status
PICT input files - Definitions to feed the Pairwise Independent Combinatorial Testing tool PICT
PICT output files - Output from Pairwise Independent Combinatorial Testing tool with recommended data combinations
Golden data generator - Reads PICT output files, generates corresponding realistic data, and then serializes it as OTLP to files
Additional test data directory - Holds additional OTLP serialized data examples not covered by the Golden Data generator
Bad data recording processor - OT Collector processor that serializes to OTLP any data items which cause exporters and other processors to return invalid data errors. These can then be added to the additional test data directory to easily reproduce the errors.
Correctness test executor - Spins up various otelcol pipeline configurations based on the appropriate PICT output file and then feeds all of the serialized data examples through the pipeline and checks the output.

tigrannajaryan commented 4 years ago

@kbrockhoff great, this will be a very useful addition. Please make smaller incremental PRs if possible to make reviewing easier.

kbrockhoff commented 4 years ago

I plan to write these tests to verify the API in the generalize-testbed branch. If @pmcollins has not started, you can assign the ticket to me. Otherwise, I am happy to advise on how to write the tests using the generalized testbed.

pmcollins commented 4 years ago

Either way works for me, @kbrockhoff. I was partway through a proof of concept for testing correctness with two pipelines: a pipeline under test and a test harness pipeline. The test harness pipeline has a processor that sends metrics to an exporter configured to talk to the pipeline under test, and the harness is also configured to receive metrics back from that pipeline. The same processor then compares the received metrics to what it sent. But maybe the generalize-testbed branch is the way to go instead (I was mostly out last week and wasn't aware of it).

Originally, I think we, including @tigrannajaryan, thought that maybe you (or someone) could work on the traces tests and I could work on the metrics tests (I'm more familiar with metrics). But maybe the way to go is for one of us to hold off until the other has an implementation. I'm happy to be the one to hold off since it looks like you have made significantly more progress than I have.

tigrannajaryan commented 4 years ago

I think since @kbrockhoff started the trace PICT generator, it is best that he continues working on it, and @pmcollins, you can work on the similar capability and tests for the metrics part. Pablo, you are right that it may be best to wait a bit until Kevin is done with the testbed refactoring. Kevin, do you need more changes to the testbed after this PR is merged?

kbrockhoff commented 4 years ago

I was planning to do all the refactoring in one PR. I still have a few improvements to make before it is ready for merging.

kbrockhoff commented 4 years ago

Testbed changes have been merged to master. Correctness tests for traces have been completed as part of the PR. Work on metrics correctness tests can now proceed.

tigrannajaryan commented 4 years ago

Closing this issue; correctness tests now exist.

open-telemetry / opentelemetry-collector

Add E2E receiver/export correctness tests #652