Add automated testing for zipkin-gcp

saturnism commented 5 years ago

Add automated testing for zipkin-gcp against a stackdriver project.

[x] Stackdriver Sender #140 #141
[x] Stackdriver Reporter #143
[x] Against Zipkin Server #150
[x] Docker image #159

codefromthecrypt commented 5 years ago

Then you can emulate the existing zipkin-example-foo tests by sending a generated trace https://github.com/openzipkin/zipkin-js-example/blob/master/.circleci/config.yml

then you can use the rest api of stackdriver to check, or just check /metrics instead if that's too hard

codefromthecrypt commented 5 years ago

cc @openzipkin/devops-tooling

meltsufin commented 5 years ago

Are we using Circle CI or Travis for testing this project? I only see Travis config and it skips test. I can add a service account key to either for enabling some integration tests against Stackdriver. Also, which tests would you want to run?

codefromthecrypt commented 5 years ago

we still check on pull requests. agree that if there were service related things that would drift, or sneaky commits that were never tested on a branch, it could be a problem. The skipping test on master could be disabled on this repo, as the tests aren't flakey (heh yet!)

https://github.com/openzipkin/zipkin-gcp/blob/master/travis/publish.sh#L116-L118

We'd need a new smoke test unless whatever we configure for that job is available (eg keys). Then someone can always run "manually" but it is still less manual than really manual (eg me doing things manually literally by assembling a server).

I would suggest composing the server together instead of running the things within a java classpath as often this is where things break. ex like the docker image https://github.com/openzipkin/docker-zipkin-gcp/. In fact we could just do this at the docker repo.. basically use jitpack to get the latest server and the lastest stackdriver module and fail if we can't send a trace and then look it back up with gcp api.

While I referred to a circleci job, I wouldn't suggest employing circleci as we already use travis anyway (I'd prefer thing mentioned to not use circleci, just not bothered enough to port on own right now.).

For this sort of test probably a periodic could be helpful, for lack of smarter interval/signal and also not wanting a chance to lock-up accounts. https://docs.travis-ci.com/user/cron-jobs/

If we end up testing things "once removed" ex via a docker job in another repo, we just mention at the README where that is.. I think it is ok, especially as many use docker and failing there is what I'm usually nervous about.

sg?

On Fri, Aug 2, 2019 at 4:10 AM Mike Eltsufin notifications@github.com wrote:

Are we using Circle CI or Travis for testing this project? I only see Travis config and it skips test. I can add a service account key to either for enabling some integration tests against Stackdriver. Also, which tests would you want to run?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/openzipkin/zipkin-gcp/issues/130?email_source=notifications&email_token=AAAPVV53LHN3DIX5ZKNWCM3QCM7JZA5CNFSM4IHXPHGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3LX6EI#issuecomment-517439249, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVVYCA6TZHXA4Q7A2LB3QCM7JZANCNFSM4IHXPHGA .

meltsufin commented 5 years ago

So, I understood you correctly, we should use Travis -- sure comfortable with that -- and test by basically assembling a server and sending a trace through it and verifying it. We can then figure out when we want the test to run. I think we should keep the test in this repo, since it is where the code that we're testing is maintained. That sounds good.

saturnism commented 5 years ago

@elefeint created a test project and service account that we can use to testing. i'll take a quick stab at this!

saturnism commented 5 years ago

@adriancole I noticed that the travis ci is configured to only work on a release tag, and tests are skipped. there is a similar setup for zipkin-aws.

Secondly, all the unit tests works w/o a real stackdriver backend (that's nice). i don't think we want to change that behavior.

I feel we can add new integration tests in src/it for relevant modules, and only run on verify and/or w/ a flag.

Should we enable it in the master branch? or, enable tests and integration tests on release tag?

saturnism commented 5 years ago

oops, i'm mistaken; the ci works against master, and the tests are triggered from publish.sh

saturnism commented 5 years ago

For some reason, tests in sender-stackdriver are not run.

https://travis-ci.org/openzipkin/zipkin-gcp/jobs/575119300#L1640

[INFO] Results:
[INFO] 
[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] --- maven-jar-plugin:3.1.2:jar (default-jar) @ zipkin-sender-stackdriver ---
[INFO] Building jar: /home/travis/build/openzipkin/zipkin-gcp/sender-stackdriver/target/zipkin-sender-stackdriver-0.13.5-SNAPSHOT.jar
[INFO]

elefeint commented 5 years ago

I was wondering about that. I think it's because JUnit5 is on the classpath, but the imports are not Jupiter imports, so no tests are found.


[DEBUG]    io.zipkin.zipkin2:zipkin-tests:jar:2.16.1:test                                            
[DEBUG]       org.junit.jupiter:junit-jupiter:jar:5.5.1:test                                         
[DEBUG]          org.junit.jupiter:junit-jupiter-api:jar:5.5.1:test                                  
[DEBUG]             org.apiguardian:apiguardian-api:jar:1.1.0:test                                   
[DEBUG]             org.opentest4j:opentest4j:jar:1.2.0:test                                         
[DEBUG]             org.junit.platform:junit-platform-commons:jar:1.5.1:test                         
[DEBUG]          org.junit.jupiter:junit-jupiter-params:jar:5.5.1:test                               
[DEBUG]          org.junit.jupiter:junit-jupiter-engine:jar:5.5.1:test                               
[DEBUG]             org.junit.platform:junit-platform-engine:jar:1.5.1:test

elefeint commented 5 years ago

Yeah. propagation tests run fine because the project does NOT import io.zipkin.zipkin2:zipkin-tests.

saturnism commented 5 years ago

@elefeint and I were able to make all the tests run as-is with Junit 4 annotation by adding org.junit.vintage:junit-vintage-engine dependency.

Also found out that there are existing IT* integration tests that are just not ran unless if surefire is configured to run them. I'll follow the same convention rather than having a separate src/it directory.

Will send in a few PRs:

To add the vintage dependency just so all the tests will run.
1. It may make sense to add vintage dependency to zipkin-tests instead in the long run
Update failsafe configuration to run integration tests for the verify phase
Add more integration tests that sends data to Stackdriver

wdyt, @adriancole ?

codefromthecrypt commented 5 years ago

I think the vintage engine being in zipkin-tests is a very good idea as not running tests by accident is quite nasty. I'm sure @anuraaga agrees. We didn't intend to stop those..

codefromthecrypt commented 5 years ago

the stuff you mention makes sense btw. yeah for example, zipkin-reporter has failsafe properly configured, I guess we don't here

codefromthecrypt commented 5 years ago

https://github.com/openzipkin/zipkin/pull/2778

saturnism commented 5 years ago

Alright! we are almost there. After this service account, will see how we do this for docker-zipkin-gcp and/or another full server test.

saturnism commented 5 years ago

Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?

Secondly, for docker-zipkin-gcp, what's the build/release process like for that one? E.g., is it automatically triggered from the release from this repo, or is there a separate process? I think adding integration tests directly to docker-zipkin-gcp makes sense as part of the release step.

saturnism commented 5 years ago

Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?

Ah, I found it through the autoconfiguration: https://github.com/openzipkin/zipkin-gcp/blob/master/autoconfigure/storage-stackdriver/src/main/java/zipkin/autoconfigure/storage/stackdriver/ZipkinStackdriverStorageAutoConfiguration.java#L127

codefromthecrypt commented 5 years ago

@anuraaga is refactoring the docker build at the moment. You can take a I think soon the build will occur here, but at the moment it is manually triggered after a release here. https://github.com/openzipkin/docker-zipkin/issues/86

On Wed, Sep 4, 2019 at 1:54 AM Ray Tsang notifications@github.com wrote:

Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?

Secondly, for docker-zipkin-gcp, what's the build/release process like for that one? E.g., is it automatically triggered from the release from this repo, or is there a separate process? I think adding integration tests directly to docker-zipkin-gcp makes sense as part of the release step.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

saturnism commented 5 years ago

use jitpack to get the latest server and the lastest stackdriver module and fail if we can't send a trace and then look it back up with gcp api

Trying this out, I think jit pack may just be too slow?

wget https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar
--2019-09-05 21:09:48--  https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar
Resolving jitpack.io (jitpack.io)... 104.24.23.62, 104.24.22.62
Connecting to jitpack.io (jitpack.io)|104.24.23.62|:443... connected
...

It took a good 40s to 1m+ or to build and download.

saturnism commented 5 years ago

For Zipkin GCP stackdriver module:

wget https://jitpack.io/com/github/openzipkin/zipkin-gcp/zipkin-autoconfigure-storage-stackdriver/master-SNAPSHOT/zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar

saturnism commented 5 years ago

wget wget https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar
unzip zipkin-server-master-SNAPSHOT-exec.jar -d zipkin
wget https://jitpack.io/com/github/openzipkin/zipkin-gcp/zipkin-autoconfigure-storage-stackdriver/master-SNAPSHOT/zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar
unzip zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar -d stackdriver

STORAGE_TYPE=stackdriver \
STACKDRIVER_PROJECT_ID=... \
GOOGLE_APPLICATION_CREDENTIALS=... \
java -Dloader.path=stackdriver -Dspring.profiles.active=stackdriver -cp zipkin/ org.springframework.boot.loader.PropertiesLauncher

To validate:

curl -v localhost:9411/actuator/health
*   Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 9411 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connection failed
* connect to 127.0.0.1 port 9411 failed: Connection refused
* Failed to connect to localhost port 9411: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 9411: Connection refused
rayt-macbookpro2:kubernetes rayt$ curl -v localhost:9411/actuator/health
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9411 (#0)
> GET /actuator/health HTTP/1.1
> Host: localhost:9411
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/vnd.spring-boot.actuator.v2+json
< content-length: 347
< 
* Connection #0 to host localhost left intact
{"status":"DOWN","details":{"zipkin":{"status":"DOWN","details":{"StackdriverStorage{wise-coyote-82123}":{"status":"DOWN","details":{"error":"com.linecorp.armeria.common.grpc.protocol.ArmeriaStatusException: Requested entity was not found."}}}},"diskSpace":{"status":"UP","details":{"total":499963170816,"free":62758592512,"threshold":10485760}}}}

Actuator health endpoint returns 200 OK even though status is DOWN. :/ I guess we'll have to parse output w/ jq and look for UP

curl --silent localhost:9411/actuator/health| jq -r .details.zipkin.status

codefromthecrypt commented 5 years ago

still faster than travis right?.. also it is on demand and cached otherwise.

If we were talking 30minutes (like main zipkin repo) one thing but 1m dont think we need to optimize do we?

On Fri, Sep 6, 2019, 9:14 AM Ray Tsang notifications@github.com wrote:

use jitpack to get the latest server and the lastest stackdriver module and fail if we can't send a trace and then look it back up with gcp api

Trying this out, I think jit pack may just be too slow?

wget https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar --2019-09-05 https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar--2019-09-05 21:09:48-- https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar Resolving jitpack.io (jitpack.io)... 104.24.23.62, 104.24.22.62 Connecting to jitpack.io (jitpack.io)|104.24.23.62|:443... connected ...

It took a good 40s to 1m+ or to build and download.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/zipkin-gcp/issues/130?email_source=notifications&email_token=AAAPVV3ZNUM5S4ENEQN5ZBTQIGVJHA5CNFSM4IHXPHGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6BNDTQ#issuecomment-528667086, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVV3ADGUVZPOL6PUOSKTQIGVJHANCNFSM4IHXPHGA .

anuraaga commented 5 years ago

I agree jitpack is fine too but any reason not to use the jfrog snapshots?

https://oss.jfrog.org/artifactory/oss-snapshot-local/

saturnism commented 5 years ago

i'm ok w either repo. but, to @anuraaga 's point in https://github.com/openzipkin/docker-zipkin/issues/86#issuecomment-528444597 - when should this run? I'm thinking only on master commits (and new tags?), since PRs can't get access to credentials anyways.

anuraaga commented 5 years ago

Ah was thinking PR at first but the credentials are definitely an issue - master commits SGTM

codefromthecrypt commented 5 years ago

only benefit of jitpack is branch testing. if this issue has no scope for that than snapshots is fine!

On Fri, Sep 6, 2019, 10:10 AM Anuraag Agrawal notifications@github.com wrote:

I agree jitpack is fine too but any reason not to use the jfrog snapshots?

https://oss.jfrog.org/artifactory/oss-snapshot-local/

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/zipkin-gcp/issues/130?email_source=notifications&email_token=AAAPVV55DEON4UUDEMNZAEDQIG3XZA5CNFSM4IHXPHGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6BPY4A#issuecomment-528678000, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVV5P7HBQJOZ25S6D3ZLQIG3XZANCNFSM4IHXPHGA .

saturnism commented 5 years ago

From PR comment @anuraaga :

I think a similar server test could be done by using maven project and adding the snapshot repo and dependency on the exec jar. Then you could just call ZipkinServer.main. A lot of the logic here like the massive wait-for-it would be simpler in Java.

What do you think of that approach?

I considered staying w/ the JVM and explored using a JUnit test, or as a IntegrationTest. There were a number of issues that needs to be resolved/thought through:

zipkin-gcp-parent brings in logback, it doesn't work well w/ Zipkin Server. When running this in a test scope - slf4j fails fast and nothing is started.
if we don't inherit the parent, then we have to specify all the versions. How do we keep versions up-to-date?
Similarly, if we don't use Jitpack, depending on snapshot still requires a version, such as 2.16.3-SNAPSHOT. What if 2.16.3 is released? We need a consistent way to update that as well.
Finally if we run this under test scope, we may be testing the wrong thing as test scope brings in more dependencies than the runtime. I'm not sure what they are going to be.

I feel it's best to test this the way it's meant to be assembled and run. wait-for-it is a simple script that we don't need to maintain. It's straight forward outside of that and tests the actual runtime conditions (outside of a container).

I'm also thinking - this test only tests against the latest of Zipkin Server. However, if Zipkin server produced a breaking change, and we never really committed any code here - then we'll still miss it.

The only time we are certain of which versions we want to put together is when we produce the Docker container. Would it make sense to remove this test here, and simply to the container test?

Or, it might make more sense to have a completely separate repository w/ only tests of different zipkin server + autoconfigurations.

codefromthecrypt commented 5 years ago

cc @openzipkin/core

Let's all please never suggest or support zipkin integrations via dependencies. Following that, there's zero point in testing via dependencies except in the core server tree itself.

Those doing gitter support know we are plagued with ad-hoc questions about custom servers via dependencies, sometimes spamming us on multiple channels about custom servers. This happens to the point of us wanting to refactor our entire README strategy. cc @jorgheymans who is trying to help sort this part.

Refresh of the general sort of problems

A. people will get mixed signals, seeing us test an integration eventhough we actively don't support it. B. It hides real integration problems C. it doesn't follow our README here or in any similar repos like zipkin-aws

Anyway please please please don't contribute to our support burden. Test integrated things as documented and supported, don't setup false tests that look like they pass but don't, or setup tests that act as an accidental gateway for people to start asking us to support custom servers.

codefromthecrypt commented 5 years ago

PS to clarify what's desired imho vs what's not. I like almost exactly what Ray did on the PR because it is simple to understand, with least logic. I really would not like to move testing to another repo. There's a glitch in the PR but otherwise it is right on.

We have so many problems to deal with, above all we need our integration tests to not become new problems or introduce complex repository dependencies that no-one knows how to do.

This follows the history since 2015 where @abesto has always made our shell scripts act as if humans type the commands as sometimes we need to (I certainly do). This has helped avoid disaster multiple occasions and also increases the quickness of resolving problems. For example, if we move this sort of "README testing" into java, it would only increase the complexity, limit people who would want to help to java people, and also not actually test the README, or things that act like it such as our docker build. If nothing else, the history of this repo should teach us this lesson.. the multitude of integration problems with boringssl etc. Let's please KISS

anuraaga commented 5 years ago

All the reasons make sense to me to stick with the current approach. And yeah I think even better would be to just run the test when building the docker container. I think we can get the current one in for now and keep or remove it later if there's something better. I'll look into the docker test on Wednesday

codefromthecrypt commented 5 years ago

.. I think even better would be to just run the test when building the docker container.

that's a great idea! the container is built on change to master now, so yeah similar to how we run tests before uploading snapshots, this looks like the right thing to do before master tag

saturnism commented 5 years ago

https://github.com/openzipkin/zipkin-gcp/pull/150#issuecomment-532107697

So, I think the autoconfiguration fails against Zipkin master because the master was updated to use Armeria 0.91. I think it introduced non-backward compatible change.

This is validated by changing zipkin-gcp's armeria.version to 0.91.0, and the unit tests starts to fail too:

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   StackdriverSpanConsumerTest.accept » NoSuchMethod zipkin2.storage.stackdriver....
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsFailureWhenServiceFailsForUnknownReason » NoSuchMethod
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsFailureWhenServiceFailsWithKnownGrpcFailure » NoSuchMethod
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsOkWhenExpectedValidationFailure » NoSuchMethod
[ERROR]   StackdriverSpanConsumerTest.verifyCheckReturnsOkWhenServiceSucceeds » NoSuchMethod
[INFO] 
[ERROR] Tests run: 6, Failures: 0, Errors: 5, Skipped: 0

I guess it was useful to know that Zipkin GCP currently doesn't work w/ Zipkin master. But since we don't know what version combination will be assembled by the container, we should defer the test when the combination is known, i.e., when we are building the container.

codefromthecrypt commented 5 years ago

Indeed it is good news. And now you know why we "super big angry yellow flashing light" say we don't support custom servers :)

saturnism commented 4 years ago

should we abandon the server test here and supersede in https://github.com/openzipkin/docker-zipkin/issues/86 ?

codefromthecrypt commented 4 years ago

you mean only test integration via docker? (ex no separate test integrating with java ...)

saturnism commented 4 years ago

@adriancole ah just got back from vacation. i did mean only docker. but i see that you also found testing here to be useful. thnx for updating the PR!

saturnism commented 4 years ago

finally got some downtime and cycle to finish this up w/ the docker hub test hook discussed in https://github.com/openzipkin-attic/docker-zipkin/issues/86

saturnism commented 4 years ago

docker hub seems to be much harder to test/validate my changes... log entries are not streamed... no log output after 10min :(

saturnism commented 4 years ago

@anuraaga everything is merged; but i don't think we have autotest turned on. Want to trigger the test manually and make sure everything works?

anuraaga commented 4 years ago

@saturnism Nope don't think there was anything wrong with the configuration - looked at the webhook logs in GitHub and don't see anything corresponding to the master commit. Guess GitHub might have had a random blurp so I pushed another mostly no-op commit which is building now - https://cloud.docker.com/u/openzipkin/repository/registry-1.docker.io/openzipkin/zipkin-gcp/builds/2df22c28-c7eb-4155-92ee-bebe135a99c9

saturnism commented 4 years ago

Oh good - thanks! Great to see this working. I feel we finished all the tests :D Can finally close now.

anuraaga commented 4 years ago

@saturnism We had one last question on whether we keep the current Travis Jitpack-based server integration test or not now that we have the docker container one. I personally lean towards removing it since the Docker test is very similar without relying on yet another artifact store. What do you think?

saturnism commented 4 years ago

i thought @adriancole wanted to keep it as part of the CI for new code, so that it's caught earlier than container creation, against the master branch of the server. Also, the check on GitHub is slightly more stringent (i.e., it validates the payload response).

The code is already written. Unless there is significant overhead, it doesn't hurt to keep. I'm agnostic in the end though.

anuraaga commented 4 years ago

I think that was before baking the master builds of the container - now that docker always builds and tests and sends emails on breakage I guess it's enough. But true that it's already code that's been written, my main concern is using Jitpack. Let's leave it and if it ever flakes even once delete it then :)

openzipkin / zipkin-gcp

Add automated testing for zipkin-gcp #130