paketo-buildpacks / java

A Cloud Native Buildpack with an order definition suitable for Java applications
Apache License 2.0

Add initial set of integration tests #1399

Closed · c0d1ngm0nk3y closed this 3 weeks ago

c0d1ngm0nk3y commented 1 month ago

Summary

Follow-up of https://github.com/paketo-buildpacks/java/pull/828, since that PR got merged and reverted by accident; see comment.

As briefly discussed on Slack with @dmikusa, this is an initial set of integration tests run for each PR (and on main). It should mainly trigger discussions around a testing strategy. The currently active tests add ~142s to the build time.

The TomEE tests are currently disabled due to https://github.com/paketo-buildpacks/apache-tomee/pull/96, which we discovered while working on the integration tests.


CC @loewenstein

linux-foundation-easycla[bot] commented 1 month ago

CLA Signed

The committers listed above are authorized under a signed CLA.

stefanlay commented 1 month ago

/easycla

anthonydahanne commented 1 month ago

hello! what's the status on this PR? It looks like the new IT tests are failing?

Also, did you consider not adding java projects, but rather re-using the ones found in https://github.com/paketo-buildpacks/samples/tree/main/java? Those sample projects are actually also regularly run in GH actions.

Also, I guess the tomee bug was solved, so maybe you could uncomment its test. Thanks!

c0d1ngm0nk3y commented 1 month ago

what's the status on this PR?

I will have a look. This PR has quite a long history and is a follow-up to this comment.

It looks like the new IT tests are failing?

I will rerun them locally, but I have an ARM machine and always have a hard time getting any integration tests to actually work :(

Also, did you consider not adding java projects, but rather re-using the ones found in https://github.com/paketo-buildpacks/samples/tree/main/java? Those sample projects are actually also regularly run in GH actions.

I like the idea. Once the tests are passing, I should be able to "just" exchange the apps.

Also, I guess the tomee bug was solved, so maybe you could uncomment its test. Thanks!

This test won't work on ARM due to the missing dependencies. I will use BP_ARCH to force amd64, and then it should also run on an ARM MacBook.
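
For illustration, a build step forcing the architecture might look like this (the builder image and app path are placeholders, not from this PR):

# Hypothetical step: force the amd64 dependency set when building on an ARM host
- name: Build test app
  run: pack build test-app --path ./test/app --builder paketobuildpacks/builder-jammy-tiny --env BP_ARCH=amd64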

dmikusa commented 1 month ago

Also, did you consider not adding java projects, but rather re-using the ones found in https://github.com/paketo-buildpacks/samples/tree/main/java? Those sample projects are actually also regularly run in GH actions.

I like the idea. Once the tests are passing, I should be able to "just" exchange the apps.

+1 I don't think we should add any new sample/test app code to this repo.

Unless there's a really good reason, I think we should checkout the samples repo and use code from that project here.

Adding sample/example Java apps code here creates more code we need to maintain, and there should be a lot of overlap between the two.

c0d1ngm0nk3y commented 1 month ago

+1 I don't think we should add any new sample/test app code to this repo.

So you would suggest using a git submodule? Or just copying the same app?

The nodejs buildpack doesn't use the samples. Isn't that the same situation? It feels weird not to have any integration tests for a meta-buildpack, doesn't it?

dmikusa commented 1 month ago

I think we should use - uses: actions/checkout@v4 to check out the samples repository. Then we have a full suite of samples. I don't see a reason to use a submodule. When these tests run, we should be in a place where all of the latest buildpacks and all of the latest samples work correctly. If they don't, that's a problem and we should fix it.
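
A minimal sketch of that checkout (the samples subdirectory path is an assumption):

# Check out this repository, then the samples repo into a subdirectory
- uses: actions/checkout@v4
- uses: actions/checkout@v4
  with:
    repository: paketo-buildpacks/samples
    path: samples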

c0d1ngm0nk3y commented 1 month ago

@dmikusa @anthonydahanne Regarding reusing the samples, we see 2 main "problems":

  • Local execution of the integration tests. Ideally, each repository should be self-contained. Otherwise, you have to checkout samples in order to test java-buildpack. This would even mean that you can break the java-buildpack with some change in the samples repo. Sounds sub optimal.
  • Another concern is that samples and tests are 2 different things. The samples might be good smoke tests, but when testing more than "it works" (e.g. the runtime used), the code is not a good match for the samples.

What exactly do you expect the effort of adding specific test applications to be? From our experience, you only touch such test apps if you want to test something more specific.

dmikusa commented 1 month ago

Local execution of the integration tests. Ideally, each repository should be self-contained. Otherwise, you have to checkout samples in order to test java-buildpack. This would even mean that you can break the java-buildpack with some change in the samples repo. Sounds sub optimal.

We are optimizing to run this in CI not locally, so having the slight inconvenience of checking out the samples first is a compromise I'm fine with making. We also change the samples very infrequently so I'm willing to roll the dice on that unintended breakage there too.

tl;dr I'm Ok with these trade-offs. What do you think @anthonydahanne and @pivotal-david-osullivan ?

Another concern is that samples and tests are 2 different things. The samples might be good smoke tests, but when testing more than "it works" (e.g. the runtime used), the code is not a good match for the samples.

We should probably level-set on the intent of the tests here.

To me, the intent is more smoke tests than anything else. Especially for a first effort.

I want something that runs in a reasonable amount of time and gives us confidence that the important things are working correctly. If we run this way for a while and find we're still seeing bugs that make it through, then we can look at expanding things, but you can't go in the opposite direction. Once something is added to the test suite, it's effectively going to live there forever. So if we add a bunch of stuff that's not really necessary, it's just cruft we'll carry on forever. For that reason, I'd very much like to start small and start with smoke tests for the important functionality. Then see where that gets us.

I think the samples repo provides some good apps to do this. Most of the samples provide some minimal output, which is enough to verify that the app has started and is running.

I'm not sure that we need a 1-to-1 parity with every sample, nor do I think that would give us good test coverage. Off the top of my head, I think these are important use cases that we want to cover.

  • Maven build Spring Boot executable JAR app from source
  • Gradle build Spring Boot executable JAR app from source
  • Build a container from a precompiled executable JAR app
  • Maven or Gradle (only one is necessary, IMHO) build a WAR file from source with Tomcat
  • Build a container from a precompiled WAR file with Tomcat
  • Build a container from a precompiled WAR file with Liberty (or Tomee, not sure we need to test every single one, just that it's possible to switch in a different app server)
  • Maven or Gradle (only one is necessary, IMHO) build Spring Boot executable JAR app into a Native Image binary from source
  • Build a container from a precompiled executable JAR into a Native Image binary
  • Build a Quarkus app from source that runs with a JVM
  • Build a Quarkus app from source that runs as a native image binary

I'm sure I'm missing some test cases, so treat this as a starting point, not a complete list. What do you think, what did I miss @anthonydahanne and @pivotal-david-osullivan ?

Other things I think we can do to simplify and reduce the time it will take to run the test suite:

  • I don't think we need to test on every stack/builder. We recommend and suggest folks use the tiny stack, so we can do our tests on that stack/builder.
  • I don't think we need to build every buildpack from source. I see these tests being run before we want to cut releases of the composite buildpacks, so I think the tests should pull down the latest existing buildpack for every buildpack in the composite, and then package a new composite from those images. This means we can release and cut component buildpacks whenever they are ready and we can then run these tests to verify what we have staged to be released.
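
For the second point, the packaging step might look roughly like this (the image name is a placeholder; package.toml would reference the released docker:// images instead of source checkouts):

# Hypothetical: package the composite under test from released component images
- name: Package composite buildpack
  run: pack buildpack package java-under-test --config package.toml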

anthonydahanne commented 1 month ago

We are optimizing to run this in CI not locally, so having the slight inconvenience of checking out the samples first is a compromise I'm fine with making. We also change the samples very infrequently so I'm willing to roll the dice on that unintended breakage there too.

tl;dr I'm Ok with these trade-offs. What do you think @anthonydahanne and @pivotal-david-osullivan ?

Yes, I prefer having 1 single set of Java projects; call them "samples" or "examples" or "projects for ITs", it does not matter. They're used both by users who want to try out buildpacks and for asserting that we will not break our users with new buildpacks. And yes, since they'll run from CI 99% of the time, I have no issue with them coming from another repo, with a preference for GHA doing the external checkout rather than git submodules.

What do you think, what did I miss @anthonydahanne and @pivotal-david-osullivan ?

Nothing important, I guess: WAR, Spring Boot, Quarkus, native. Looks good to me.

loewenstein-sap commented 1 month ago

Local execution of the integration tests. Ideally, each repository should be self-contained. Otherwise, you have to checkout samples in order to test java-buildpack. This would even mean that you can break the java-buildpack with some change in the samples repo. Sounds sub optimal.

We are optimizing to run this in CI not locally, so having the slight inconvenience of checking out the samples first is a compromise I'm fine with making. We also change the samples very infrequently so I'm willing to roll the dice on that unintended breakage there too.

tl;dr I'm Ok with these trade-offs. What do you think @anthonydahanne and @pivotal-david-osullivan ?

Optimized for CI is probably fine, as long as we make sure it is clear and hopefully still easy to execute them locally. If I understood the discussion around https://github.com/paketo-buildpacks/executable-jar/pull/265, https://github.com/paketo-buildpacks/executable-jar/issues/264 and https://github.com/paketo-buildpacks/executable-jar/issues/132 correctly, the main concern there was the amount of effort needed to execute integration tests manually.

Another concern is that samples and tests are 2 different things. The samples might be good smoke tests, but when testing more than "it works" (e.g. the runtime used), the code is not a good match for the samples.

We should probably level-set on the intent of the tests here.

To me, the intent is more smoke tests than anything else. Especially for a first effort.

Actually, for me the intent is to be able to merge bug fixes without including unrelated feature requests.

I can see how we separate this into an initial effort to establish rudimentary integration tests (aka smoke tests) and then expand on those. But eventually, we should have an automated integration test suite like, for example, the Paketo Node buildpack has. I would second @c0d1ngm0nk3y that this likely needs specific test data, as the Paketo Node buildpack also shows. I'd also say that the maintenance effort for test data will be tied to changes needed in the tests, in which case the effort seems justified.

I want something that runs in a reasonable amount of time and gives us confidence that the important things are working correctly. If we run this way for a while and find we're still seeing bugs that make it through, then we can look at expanding things, but you can't go in the opposite direction. Once something is added to the test suite, it's effectively going to live there forever. So if we add a bunch of stuff that's not really necessary, it's just cruft we'll carry on forever. For that reason, I'd very much like to start small and start with smoke tests for the important functionality. Then see where that gets us.

I could easily see us separating smoke tests from more detailed tests and allowing execution of just the smoke tests. Then we could have a single pre-release execution of the full suite, for example, and with that also cover more complicated changes to the build plan without the need for additional manual tests.
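
As an illustration of such a split (trigger layout is a sketch, not something agreed upon here):

# Smoke subset on every PR; full suite on demand before a release
"on":
    pull_request: {}        # run only the tests marked as smoke tests
    workflow_dispatch: {}   # manual pre-release run of the full suite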

With the right documentation about local execution, it would also be easy to run the full suite in a PR like https://github.com/paketo-buildpacks/executable-jar/pull/265 prior to merging.

I think the samples repo provides some good apps to do this. Most of the samples provide some minimal output, which is enough to verify that the app has started and is running.

I'm not sure that we need a 1-to-1 parity with every sample, nor do I think that would give us good test coverage. Off the top of my head, I think these are important use cases that we want to cover.

  • Maven build Spring Boot executable JAR app from source
  • Gradle build Spring Boot executable JAR app from source
  • Build a container from a precompiled executable JAR app
  • Maven or Gradle (only one is necessary, IMHO) build a WAR file from source with Tomcat
  • Build a container from a precompiled WAR file with Tomcat
  • Build a container from a precompiled WAR file with Liberty (or Tomee, not sure we need to test every single one, just that it's possible to switch in a different app server)
  • Maven or Gradle (only one is necessary, IMHO) build Spring Boot executable JAR app into a Native Image binary from source
  • Build a container from a precompiled executable JAR into a Native Image binary
  • Build a Quarkus app from source that runs with a JVM
  • Build a Quarkus app from source that runs as a native image binary

This list sounds like a clear indicator to use dedicated test data, local to the integration test suite, and keep the samples as examples - then we would not have to worry whether a change in the examples collides with our goals for test automation. Additionally, it might be a good idea to run all samples, with the goal of making sure we do not break the example experience with any change we release.

Other things I think we can do to simplify and reduce the time it will take to run the test suite:

  • I don't think we need to test on every stack/builder. We recommend and suggest folks use the tiny stack, so we can do our tests on that stack/builder.

This sounds like a fair assumption.

  • I don't think we need to build every buildpack from source. I see these tests being run before we want to cut releases of the composite buildpacks, so I think the tests should pull down the latest existing buildpack for every buildpack in the composite, and then package a new composite from those images. This means we can release and cut component buildpacks whenever they are ready and we can then run these tests to verify what we have staged to be released.

Actually, allowing local test execution will require some means of controlling which buildpack version is under test, so that a local version can be tested easily. We can still make testing the latest released component buildpacks easy, and even the default.

WDYT @paketo-buildpacks/java-maintainers @paketo-buildpacks/java-contributors, can we continue with this set of test data as an initial integration test suite and then evolve it to cover your full list above? We strongly believe that local test data is

  1. required for a comprehensive integration test suite and
  2. not a big additional maintenance burden, because it only requires effort if tests need to be changed or added

Should we additionally add a smoke suite to the samples repository to make sure releases don't break? I guess this would need coordination with other maintainers (to cover all language examples) though.

dmikusa commented 1 month ago

I can see how we separate this into an initial effort to establish rudimentary integration tests (aka smoke tests) and then expand on those. But eventually, we should have an automated integration test suite like, for example, the Paketo Node buildpack has

I can't comment on the Node testing situation because I don't use or maintain them and am not familiar with them.

I don't really care what we call the tests. The issue that is important to me is what we are testing. There is a tangible cost for running a single integration test with buildpacks. There's a lot to download, build, etc... That takes time, energy and compute minutes/simultaneous processes on GHA.

Even with heavy caching of those resources, which is critical IMHO (see what was done with the samples repo), we need to be careful about what we put into this repo or we'll very quickly have a test suite that takes way too long to run and plugs up our GHAs (we only have 10 concurrent tasks).

I used smoke tests to describe this because I think we need to focus on testing the most important use cases first and then gradually and deliberately expand to other areas, perhaps ones that are high-risk/fragile or highly utilized by our end users.

Put another way, I don't see this being successful if we try to make it a thorough and complete test of everything the Java buildpacks can do. There's just too much they can do. It would result in too many different test permutations. The test suite would end up taking hours to run, and that's not effective or practical with our resources.

What exactly goes in is kind of a gray area. Maybe over time there are guidelines we'll establish, after we have some experience with this. For now, I think there's room for, and I would like to see, discussions around what exactly fits into the test suite, because if we're having those discussions, then I think we'll come out with a meaningful and helpful test suite.

I could easily see us separating smoke tests from more detailed tests and allowing execution of just the smoke tests.

I agree that this could be helpful, but not out of the gate. To me, this is something we do after we've accumulated enough tests that it's not feasible to run them every time. Then I could see splitting them up. I don't think this is a way to make "test everything" successful. We still need to lean heavily on unit tests and test as much as we can there. Then we can keep the integration tests for things that simply cannot be simulated with unit tests, like our complicated and fragile build plan.

This list sounds like a clear indicator to use dedicated test data, local to the integration test suite, and keep the samples as examples - then we would not have to worry whether a change in the examples collides with our goals for test automation.

Sorry, I don't follow the logic here. There are existing samples we could cleanly use to test all of these cases. The code is there and ready to be used, unchanged. I don't see why we would not want to use them. Why write new tests that do the same thing?

Changes in the samples are not going to collide with anything because the sample apps do almost nothing. These are not full-featured Spring PetClinic style samples. These are hello-world samples. The code is very simple and unlikely to change aside from dependency updates. I don't see what the tests would be doing that would cause collisions. The tests are going to be doing very similar things to the sample. Build a particular app with a particular set of configurations.

WDYT @paketo-buildpacks/java-maintainers @paketo-buildpacks/java-contributors, can we continue with this set of test data as an initial integration test suite and then evolve it to cover your full list above? We strongly believe that local test data is

I can't speak for my co-maintainers, but my vote is that we use the samples. I don't want to pull in additional tests for reasons previously explained.

Should we additionally add a smoke suite to the samples repository to make sure releases don't break? I guess this would need coordination with other maintainers (to cover all language examples) though.

Not sure I follow, we already have a test suite for the samples.

loewenstein-sap commented 1 month ago

I don't really care what we call the tests. The issue that is important to me is what we are testing. There is a tangible cost for running a single integration test with buildpacks. There's a lot to download, build, etc... That takes time, energy and compute minutes/simultaneous processes on GHA.

Even with heavy caching of those resources, which is critical IMHO (see what was done with the samples repo), we need to be careful about what we put into this repo or we'll very quickly have a test suite that takes way too long to run and plugs up our GHAs (we only have 10 concurrent tasks).

I used smoke tests to describe this because I think we need to focus on testing the most important use cases first and then gradually and deliberately expand to other areas, perhaps ones that are high-risk/fragile or highly utilized by our end users.

Put another way, I don't see this being successful if we try to make it a thorough and complete test of everything the Java buildpacks can do. There's just too much they can do. It would result in too many different test permutations. The test suite would end up taking hours to run, and that's not effective or practical with our resources.

What exactly goes in is kind of a gray area. Maybe over time there are guidelines we'll establish, after we have some experience with this. For now, I think there's room for, and I would like to see, discussions around what exactly fits into the test suite, because if we're having those discussions, then I think we'll come out with a meaningful and helpful test suite.

Yes, of course. Integration tests should test integration aspects - whatever can be tested in a unit test should be. Also, I don't care much either what the tests are called - my goal is to reach test coverage that is sufficient to merge pull requests like https://github.com/paketo-buildpacks/executable-jar/pull/265 without manual efforts that prevent or delay a merge. I am not exactly sure what needs to be tested to achieve that, though. Does your list cover that?

  • Maven build Spring Boot executable JAR app from source
  • Gradle build Spring Boot executable JAR app from source
  • Build a container from a precompiled executable JAR app
  • Maven or Gradle (only one is necessary, IMHO) build a WAR file from source with Tomcat
  • Build a container from a precompiled WAR file with Tomcat
  • Build a container from a precompiled WAR file with Liberty (or Tomee, not sure we need to test every single one, just that it's possible to switch in a different app server)
  • Maven or Gradle (only one is necessary, IMHO) build Spring Boot executable JAR app into a Native Image binary from source
  • Build a container from a precompiled executable JAR into a Native Image binary
  • Build a Quarkus app from source that runs with a JVM
  • Build a Quarkus app from source that runs as a native image binary

Is that covered by using the samples as test data?

I wonder if we are prepared to add a sample every time we discover a new combination that needs an integration test. If you all feel this will be fine, then starting with the samples and learning from there works for me.

I could easily see us separating smoke tests from more detailed tests and allowing execution of just the smoke tests.

I agree that this could be helpful, but not out of the gate. To me, this is something we do after we've accumulated enough tests that it's not feasible to run them every time. Then I could see splitting them up. I don't think this is a way to make "test everything" successful. We still need to lean heavily on unit tests and test as much as we can there. Then we can keep the integration tests for things that simply cannot be simulated with unit tests, like our complicated and fragile build plan.

Sure. I brought it up in order to prevent "more tests" from being too controversial.

This list sounds like a clear indicator to use dedicated test data, local to the integration test suite, and keep the samples as examples - then we would not have to worry whether a change in the examples collides with our goals for test automation.

Sorry, I don't follow the logic here. There are existing samples we could cleanly use to test all of these cases. The code is there and ready to be used, unchanged. I don't see why we would not want to use them. Why write new tests that do the same thing?

Changes in the samples are not going to collide with anything because the sample apps do almost nothing. These are not full-featured Spring PetClinic style samples. These are hello-world samples. The code is very simple and unlikely to change aside from dependency updates. I don't see what the tests would be doing that would cause collisions. The tests are going to be doing very similar things to the sample. Build a particular app with a particular set of configurations.

I am just a little worried because, in my own experience, the goal of testing something specific and the goal of providing usage examples have collided in the past.

WDYT @paketo-buildpacks/java-maintainers @paketo-buildpacks/java-contributors, can we continue with this set of test data as an initial integration test suite and then evolve it to cover your full list above? We strongly believe that local test data is

I can't speak for my co-maintainers, but my vote is that we use the samples. I don't want to pull in additional tests for reasons previously explained.

Should we additionally add a smoke suite to the samples repository to make sure releases don't break? I guess this would need coordination with other maintainers (to cover all language examples) though.

Not sure I follow, we already have a test suite for the samples.

If we already have a suite for the samples, what would the integration tests then add?

dmikusa commented 1 month ago

If we already have a suite for the samples, what would the integration tests then add?

The main problem with the samples is that they don't automatically run until after we update the builder, and by that time it's too late to prevent problems - the problem has already been released.

My number one goal for this, and it will absolutely help us work toward the goal you mentioned of helping to reduce manual testing effort, is to have these tests run before we cut releases of the Java composite buildpacks.

I think what we have in the PR meets that. It's going to run them A LOT, which I think is going to create some pain, but I also think that will hopefully spur us to make some pipeline changes that will improve things overall (like grouping buildpack update PRs into one single PR). Or, if that doesn't end up being feasible, we can always adjust this later.

name: Tests
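# Triggers: merge-queue checks, every pull request, and pushes to main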
"on":
    merge_group:
        types:
            - checks_requested
        branches:
            - main
    pull_request: {}
    push:
        branches:
            - main

The other problem with the samples is that they're really slow. I think it was taking close to 30 minutes to run. We ended up having to break it down so that it doesn't run the full suite anymore, just so we could practically manage changes to the repo. I think that's fine for samples: just test what's changed. That's less helpful when you're talking about an integration test suite for the entire set of buildpacks. Especially with build plan changes, a change in one buildpack can break the build plan in scenarios that you might not expect, so you really need to test it all. I'd like to see something that runs in less than 5 minutes. I think that's probably aggressive, but it really puts the test suite in a place where it's easy to run and get feedback reasonably fast.

The apps there are good though. They have pretty good coverage and I think we can reuse them here. Some of the apps like the Maven or Gradle samples can probably be used for multiple test cases. At the same time, I think there are examples that we don't need to bother testing, like the aspectj example. It's cool, nothing against the technology, but there's nothing really unique about the way it works at the buildpack level so running it as a test doesn't provide us with a lot of test value.

Does your list cover that?

  • Maven build Spring Boot executable JAR app from source
  • Gradle build Spring Boot executable JAR app from source
  • Build a container from a precompiled executable JAR app
  • Maven or Gradle (only one is necessary, IMHO) build a WAR file from source with Tomcat
  • Build a container from a precompiled WAR file with Tomcat
  • Build a container from a precompiled WAR file with Liberty (or Tomee, not sure we need to test every single one, just that it's possible to switch in a different app server)
  • Maven or Gradle (only one is necessary, IMHO) build Spring Boot executable JAR app into a Native Image binary from source
  • Build a container from a precompiled executable JAR into a Native Image binary
  • Build a Quarkus app from source that runs with a JVM
  • Build a Quarkus app from source that runs as a native image binary

Yes. If we had test examples for these cases, it would get us pretty far. In particular, this is going to cover a lot of the common build plan scenarios which is where we need to improve the most (IMHO). If we had these, I think it'd be enough coverage to merge.

You can leave out the native-image tests for this repo. I think we should put those into a similar set of tests under paketo-buildpacks/java-native-image.

pbusko commented 1 month ago

@dmikusa the test suite has been updated to cover the mentioned cases, except for the native images (including Quarkus).

loewenstein-sap commented 3 weeks ago

cc @paketo-buildpacks/java-maintainers

anthonydahanne commented 3 weeks ago

@dmikusa all good for you? I've run the tests locally; I believe it's a good addition.