Closed saturnism closed 4 years ago
Then you can emulate the existing zipkin-example-foo tests by sending a generated trace https://github.com/openzipkin/zipkin-js-example/blob/master/.circleci/config.yml
then you can use the rest api of stackdriver to check, or just check /metrics instead if that's too hard
cc @openzipkin/devops-tooling
Are we using Circle CI or Travis for testing this project? I only see Travis config and it skips test. I can add a service account key to either for enabling some integration tests against Stackdriver. Also, which tests would you want to run?
we still check on pull requests. agree that if there were service related things that would drift, or sneaky commits that were never tested on a branch, it could be a problem. The skipping test on master could be disabled on this repo, as the tests aren't flakey (heh yet!)
https://github.com/openzipkin/zipkin-gcp/blob/master/travis/publish.sh#L116-L118
We'd need a new smoke test unless whatever we configure for that job is available (eg keys). Then someone can always run "manually" but it is still less manual than really manual (eg me doing things manually literally by assembling a server).
I would suggest composing the server together instead of running the things within a java classpath as often this is where things break. ex like the docker image https://github.com/openzipkin/docker-zipkin-gcp/. In fact we could just do this at the docker repo.. basically use jitpack to get the latest server and the lastest stackdriver module and fail if we can't send a trace and then look it back up with gcp api.
While I referred to a circleci job, I wouldn't suggest employing circleci as we already use travis anyway (I'd prefer thing mentioned to not use circleci, just not bothered enough to port on own right now.).
For this sort of test probably a periodic could be helpful, for lack of smarter interval/signal and also not wanting a chance to lock-up accounts. https://docs.travis-ci.com/user/cron-jobs/
If we end up testing things "once removed" ex via a docker job in another repo, we just mention at the README where that is.. I think it is ok, especially as many use docker and failing there is what I'm usually nervous about.
sg?
On Fri, Aug 2, 2019 at 4:10 AM Mike Eltsufin notifications@github.com wrote:
Are we using Circle CI or Travis for testing this project? I only see Travis config and it skips test. I can add a service account key to either for enabling some integration tests against Stackdriver. Also, which tests would you want to run?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/openzipkin/zipkin-gcp/issues/130?email_source=notifications&email_token=AAAPVV53LHN3DIX5ZKNWCM3QCM7JZA5CNFSM4IHXPHGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3LX6EI#issuecomment-517439249, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVVYCA6TZHXA4Q7A2LB3QCM7JZANCNFSM4IHXPHGA .
So, I understood you correctly, we should use Travis -- sure comfortable with that -- and test by basically assembling a server and sending a trace through it and verifying it. We can then figure out when we want the test to run. I think we should keep the test in this repo, since it is where the code that we're testing is maintained. That sounds good.
@elefeint created a test project and service account that we can use to testing. i'll take a quick stab at this!
@adriancole I noticed that the travis ci is configured to only work on a release tag, and tests are skipped. there is a similar setup for zipkin-aws.
Secondly, all the unit tests works w/o a real stackdriver backend (that's nice). i don't think we want to change that behavior.
I feel we can add new integration tests in src/it
for relevant modules, and only run on verify and/or w/ a flag.
Should we enable it in the master branch? or, enable tests and integration tests on release tag?
oops, i'm mistaken; the ci works against master, and the tests are triggered from publish.sh
For some reason, tests in sender-stackdriver
are not run.
https://travis-ci.org/openzipkin/zipkin-gcp/jobs/575119300#L1640
[INFO] Results:
[INFO]
[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO]
[INFO] --- maven-jar-plugin:3.1.2:jar (default-jar) @ zipkin-sender-stackdriver ---
[INFO] Building jar: /home/travis/build/openzipkin/zipkin-gcp/sender-stackdriver/target/zipkin-sender-stackdriver-0.13.5-SNAPSHOT.jar
[INFO]
I was wondering about that. I think it's because JUnit5 is on the classpath, but the imports are not Jupiter imports, so no tests are found.
[DEBUG] io.zipkin.zipkin2:zipkin-tests:jar:2.16.1:test
[DEBUG] org.junit.jupiter:junit-jupiter:jar:5.5.1:test
[DEBUG] org.junit.jupiter:junit-jupiter-api:jar:5.5.1:test
[DEBUG] org.apiguardian:apiguardian-api:jar:1.1.0:test
[DEBUG] org.opentest4j:opentest4j:jar:1.2.0:test
[DEBUG] org.junit.platform:junit-platform-commons:jar:1.5.1:test
[DEBUG] org.junit.jupiter:junit-jupiter-params:jar:5.5.1:test
[DEBUG] org.junit.jupiter:junit-jupiter-engine:jar:5.5.1:test
[DEBUG] org.junit.platform:junit-platform-engine:jar:1.5.1:test
Yeah. propagation tests run fine because the project does NOT import io.zipkin.zipkin2:zipkin-tests.
@elefeint and I were able to make all the tests run as-is with Junit 4 annotation by adding
org.junit.vintage:junit-vintage-engine
dependency.
Also found out that there are existing IT*
integration tests that are just not ran unless if surefire is configured to run them. I'll follow the same convention rather than having a separate src/it
directory.
Will send in a few PRs:
zipkin-tests
instead in the long runfailsafe
configuration to run integration tests for the verify
phasewdyt, @adriancole ?
I think the vintage engine being in zipkin-tests is a very good idea as not running tests by accident is quite nasty. I'm sure @anuraaga agrees. We didn't intend to stop those..
the stuff you mention makes sense btw. yeah for example, zipkin-reporter has failsafe properly configured, I guess we don't here
Alright! we are almost there. After this service account, will see how we do this for docker-zipkin-gcp
and/or another full server test.
Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?
Secondly, for docker-zipkin-gcp
, what's the build/release process like for that one? E.g., is it automatically triggered from the release from this repo, or is there a separate process? I think adding integration tests directly to docker-zipkin-gcp
makes sense as part of the release step.
Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?
Ah, I found it through the autoconfiguration: https://github.com/openzipkin/zipkin-gcp/blob/master/autoconfigure/storage-stackdriver/src/main/java/zipkin/autoconfigure/storage/stackdriver/ZipkinStackdriverStorageAutoConfiguration.java#L127
@anuraaga is refactoring the docker build at the moment. You can take a I think soon the build will occur here, but at the moment it is manually triggered after a release here. https://github.com/openzipkin/docker-zipkin/issues/86
On Wed, Sep 4, 2019 at 1:54 AM Ray Tsang notifications@github.com wrote:
Trying to add integration test to stackdriver storage as well. I can't seem to find how auth token was added to the call. Any ideas?
Secondly, for docker-zipkin-gcp, what's the build/release process like for that one? E.g., is it automatically triggered from the release from this repo, or is there a separate process? I think adding integration tests directly to docker-zipkin-gcp makes sense as part of the release step.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
use jitpack to get the latest server and the lastest stackdriver module and fail if we can't send a trace and then look it back up with gcp api
Trying this out, I think jit pack may just be too slow?
wget https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar
--2019-09-05 21:09:48-- https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar
Resolving jitpack.io (jitpack.io)... 104.24.23.62, 104.24.22.62
Connecting to jitpack.io (jitpack.io)|104.24.23.62|:443... connected
...
It took a good 40s to 1m+ or to build and download.
For Zipkin GCP stackdriver module:
wget https://jitpack.io/com/github/openzipkin/zipkin-gcp/zipkin-autoconfigure-storage-stackdriver/master-SNAPSHOT/zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar
wget wget https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar
unzip zipkin-server-master-SNAPSHOT-exec.jar -d zipkin
wget https://jitpack.io/com/github/openzipkin/zipkin-gcp/zipkin-autoconfigure-storage-stackdriver/master-SNAPSHOT/zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar
unzip zipkin-autoconfigure-storage-stackdriver-master-SNAPSHOT-module.jar -d stackdriver
STORAGE_TYPE=stackdriver \
STACKDRIVER_PROJECT_ID=... \
GOOGLE_APPLICATION_CREDENTIALS=... \
java -Dloader.path=stackdriver -Dspring.profiles.active=stackdriver -cp zipkin/ org.springframework.boot.loader.PropertiesLauncher
To validate:
curl -v localhost:9411/actuator/health
* Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 9411 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connection failed
* connect to 127.0.0.1 port 9411 failed: Connection refused
* Failed to connect to localhost port 9411: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 9411: Connection refused
rayt-macbookpro2:kubernetes rayt$ curl -v localhost:9411/actuator/health
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9411 (#0)
> GET /actuator/health HTTP/1.1
> Host: localhost:9411
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/vnd.spring-boot.actuator.v2+json
< content-length: 347
<
* Connection #0 to host localhost left intact
{"status":"DOWN","details":{"zipkin":{"status":"DOWN","details":{"StackdriverStorage{wise-coyote-82123}":{"status":"DOWN","details":{"error":"com.linecorp.armeria.common.grpc.protocol.ArmeriaStatusException: Requested entity was not found."}}}},"diskSpace":{"status":"UP","details":{"total":499963170816,"free":62758592512,"threshold":10485760}}}}
Actuator health endpoint returns 200 OK even though status is DOWN. :/ I guess we'll have to parse output w/ jq
and look for UP
curl --silent localhost:9411/actuator/health| jq -r .details.zipkin.status
still faster than travis right?.. also it is on demand and cached otherwise.
If we were talking 30minutes (like main zipkin repo) one thing but 1m dont think we need to optimize do we?
On Fri, Sep 6, 2019, 9:14 AM Ray Tsang notifications@github.com wrote:
use jitpack to get the latest server and the lastest stackdriver module and fail if we can't send a trace and then look it back up with gcp api
Trying this out, I think jit pack may just be too slow?
wget https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar --2019-09-05 https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar--2019-09-05 21:09:48-- https://jitpack.io/com/github/openzipkin/zipkin/zipkin-server/master-SNAPSHOT/zipkin-server-master-SNAPSHOT-exec.jar Resolving jitpack.io (jitpack.io)... 104.24.23.62, 104.24.22.62 Connecting to jitpack.io (jitpack.io)|104.24.23.62|:443... connected ...
It took a good 40s to 1m+ or to build and download.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/zipkin-gcp/issues/130?email_source=notifications&email_token=AAAPVV3ZNUM5S4ENEQN5ZBTQIGVJHA5CNFSM4IHXPHGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6BNDTQ#issuecomment-528667086, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVV3ADGUVZPOL6PUOSKTQIGVJHANCNFSM4IHXPHGA .
I agree jitpack is fine too but any reason not to use the jfrog snapshots?
i'm ok w either repo. but, to @anuraaga 's point in https://github.com/openzipkin/docker-zipkin/issues/86#issuecomment-528444597 - when should this run? I'm thinking only on master commits (and new tags?), since PRs can't get access to credentials anyways.
Ah was thinking PR at first but the credentials are definitely an issue - master commits SGTM
only benefit of jitpack is branch testing. if this issue has no scope for that than snapshots is fine!
On Fri, Sep 6, 2019, 10:10 AM Anuraag Agrawal notifications@github.com wrote:
I agree jitpack is fine too but any reason not to use the jfrog snapshots?
https://oss.jfrog.org/artifactory/oss-snapshot-local/
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/zipkin-gcp/issues/130?email_source=notifications&email_token=AAAPVV55DEON4UUDEMNZAEDQIG3XZA5CNFSM4IHXPHGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6BPY4A#issuecomment-528678000, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVV5P7HBQJOZ25S6D3ZLQIG3XZANCNFSM4IHXPHGA .
From PR comment @anuraaga :
I think a similar server test could be done by using maven project and adding the snapshot repo and dependency on the exec jar. Then you could just call ZipkinServer.main. A lot of the logic here like the massive wait-for-it would be simpler in Java.
What do you think of that approach?
I considered staying w/ the JVM and explored using a JUnit test, or as a IntegrationTest. There were a number of issues that needs to be resolved/thought through:
I feel it's best to test this the way it's meant to be assembled and run. wait-for-it
is a simple script that we don't need to maintain. It's straight forward outside of that and tests the actual runtime conditions (outside of a container).
I'm also thinking - this test only tests against the latest of Zipkin Server. However, if Zipkin server produced a breaking change, and we never really committed any code here - then we'll still miss it.
The only time we are certain of which versions we want to put together is when we produce the Docker container. Would it make sense to remove this test here, and simply to the container test?
Or, it might make more sense to have a completely separate repository w/ only tests of different zipkin server + autoconfigurations.
cc @openzipkin/core
Let's all please never suggest or support zipkin integrations via dependencies. Following that, there's zero point in testing via dependencies except in the core server tree itself.
Those doing gitter support know we are plagued with ad-hoc questions about custom servers via dependencies, sometimes spamming us on multiple channels about custom servers. This happens to the point of us wanting to refactor our entire README strategy. cc @jorgheymans who is trying to help sort this part.
Refresh of the general sort of problems
A. people will get mixed signals, seeing us test an integration eventhough we actively don't support it. B. It hides real integration problems C. it doesn't follow our README here or in any similar repos like zipkin-aws
Anyway please please please don't contribute to our support burden. Test integrated things as documented and supported, don't setup false tests that look like they pass but don't, or setup tests that act as an accidental gateway for people to start asking us to support custom servers.
PS to clarify what's desired imho vs what's not. I like almost exactly what Ray did on the PR because it is simple to understand, with least logic. I really would not like to move testing to another repo. There's a glitch in the PR but otherwise it is right on.
We have so many problems to deal with, above all we need our integration tests to not become new problems or introduce complex repository dependencies that no-one knows how to do.
This follows the history since 2015 where @abesto has always made our shell scripts act as if humans type the commands as sometimes we need to (I certainly do). This has helped avoid disaster multiple occasions and also increases the quickness of resolving problems. For example, if we move this sort of "README testing" into java, it would only increase the complexity, limit people who would want to help to java people, and also not actually test the README, or things that act like it such as our docker build. If nothing else, the history of this repo should teach us this lesson.. the multitude of integration problems with boringssl etc. Let's please KISS
All the reasons make sense to me to stick with the current approach. And yeah I think even better would be to just run the test when building the docker container. I think we can get the current one in for now and keep or remove it later if there's something better. I'll look into the docker test on Wednesday
.. I think even better would be to just run the test when building the docker container.
that's a great idea! the container is built on change to master now, so yeah similar to how we run tests before uploading snapshots, this looks like the right thing to do before master tag
https://github.com/openzipkin/zipkin-gcp/pull/150#issuecomment-532107697
So, I think the autoconfiguration fails against Zipkin master
because the master was updated to use Armeria 0.91. I think it introduced non-backward compatible change.
This is validated by changing zipkin-gcp's armeria.version
to 0.91.0, and the unit tests starts to fail too:
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] StackdriverSpanConsumerTest.accept » NoSuchMethod zipkin2.storage.stackdriver....
[ERROR] StackdriverSpanConsumerTest.verifyCheckReturnsFailureWhenServiceFailsForUnknownReason » NoSuchMethod
[ERROR] StackdriverSpanConsumerTest.verifyCheckReturnsFailureWhenServiceFailsWithKnownGrpcFailure » NoSuchMethod
[ERROR] StackdriverSpanConsumerTest.verifyCheckReturnsOkWhenExpectedValidationFailure » NoSuchMethod
[ERROR] StackdriverSpanConsumerTest.verifyCheckReturnsOkWhenServiceSucceeds » NoSuchMethod
[INFO]
[ERROR] Tests run: 6, Failures: 0, Errors: 5, Skipped: 0
I guess it was useful to know that Zipkin GCP currently doesn't work w/ Zipkin master. But since we don't know what version combination will be assembled by the container, we should defer the test when the combination is known, i.e., when we are building the container.
Indeed it is good news. And now you know why we "super big angry yellow flashing light" say we don't support custom servers :)
should we abandon the server test here and supersede in https://github.com/openzipkin/docker-zipkin/issues/86 ?
you mean only test integration via docker? (ex no separate test integrating with java ...)
@adriancole ah just got back from vacation. i did mean only docker. but i see that you also found testing here to be useful. thnx for updating the PR!
finally got some downtime and cycle to finish this up w/ the docker hub test hook discussed in https://github.com/openzipkin-attic/docker-zipkin/issues/86
docker hub seems to be much harder to test/validate my changes... log entries are not streamed... no log output after 10min :(
@anuraaga everything is merged; but i don't think we have autotest turned on. Want to trigger the test manually and make sure everything works?
@saturnism Nope don't think there was anything wrong with the configuration - looked at the webhook logs in GitHub and don't see anything corresponding to the master commit. Guess GitHub might have had a random blurp so I pushed another mostly no-op commit which is building now - https://cloud.docker.com/u/openzipkin/repository/registry-1.docker.io/openzipkin/zipkin-gcp/builds/2df22c28-c7eb-4155-92ee-bebe135a99c9
Oh good - thanks! Great to see this working. I feel we finished all the tests :D Can finally close now.
@saturnism We had one last question on whether we keep the current Travis Jitpack-based server integration test or not now that we have the docker container one. I personally lean towards removing it since the Docker test is very similar without relying on yet another artifact store. What do you think?
i thought @adriancole wanted to keep it as part of the CI for new code, so that it's caught earlier than container creation, against the master branch of the server. Also, the check on GitHub is slightly more stringent (i.e., it validates the payload response).
The code is already written. Unless there is significant overhead, it doesn't hurt to keep. I'm agnostic in the end though.
I think that was before baking the master builds of the container - now that docker always builds and tests and sends emails on breakage I guess it's enough. But true that it's already code that's been written, my main concern is using Jitpack. Let's leave it and if it ever flakes even once delete it then :)
Add automated testing for zipkin-gcp against a stackdriver project.