microsoft / botframework-sdk

1) Replace record/playback tests with mocks 2) Add LOTS of nightly functional tests #5977

Closed: mdrichardson closed this issue 6 months ago

mdrichardson commented 4 years ago

Test Adjustments

This proposal covers two related issues:

  1. Converting Record and Playback-style tests in the JS SDK to mocks, and
  2. Creating functional tests for many of our mocked tests

Current Situation

Record and Playback

Many of the functional tests in the JavaScript SDK rely on a Record and Playback style of testing instead of the more traditional mocks. This is problematic because:

  1. It adds complexity to the testing codebase, essentially creating a new testing framework that devs need to learn.
  2. It adds test recording files to PRs.
  3. Reading test results from recording files opens an additional avenue for non-determinism that plain mocks avoid.

Functional Tests

Additionally, where we have mocks or Record and Playback tests (in all SDKs), there is potential for the tests to become stale. I've fixed a couple of small issues where this has occurred with both Cosmos and Connector.

Proposed Solution

Converting the JavaScript tests from Record and Playback to mocks is "easy" enough and doesn't need much design guidance--just confirmation that it should be done.
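For a concrete (and purely illustrative) picture of the conversion, a recorded-fixture test could become an ordinary mocha test with an inline nock mock; every URL, path, and payload below is made up for the sketch:

```typescript
// Sketch: the HTTP expectation lives in the test itself instead of a
// recorded fixture file checked in alongside the PR.
import * as https from 'https';
import * as nock from 'nock';
import { strict as assert } from 'assert';

// Tiny helper so the sketch is self-contained; real tests would use the
// SDK client under test instead.
function post(url: string, body: unknown): Promise<string> {
    return new Promise((resolve, reject) => {
        const req = https.request(url, { method: 'POST' }, (res) => {
            let data = '';
            res.on('data', (chunk) => (data += chunk));
            res.on('end', () => resolve(data));
        });
        req.on('error', reject);
        req.end(JSON.stringify(body));
    });
}

describe('explicit nock mock (sketch)', function () {
    afterEach(function () {
        nock.cleanAll();
    });

    it('posts an activity to the conversation endpoint', async function () {
        const scope = nock('https://smba.trafficmanager.net')
            .post('/amer/v3/conversations/convo-1/activities')
            .reply(200, { id: 'activity-1' });

        const body = await post(
            'https://smba.trafficmanager.net/amer/v3/conversations/convo-1/activities',
            { type: 'message', text: 'hi' }
        );

        assert.equal(JSON.parse(body).id, 'activity-1');
        scope.done(); // throws if the expected request was never made
    });
});
```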

Relevant Libraries and Features

However, we have the following libraries/features that have tests with mocks that could be added to nightly functional tests:

  • [ ] CosmosDbPartitioned
  • [ ] CosmosDb (non-partitioned)
  • [ ] AzureBlob
  • [ ] LUIS
  • [ ] QnA
  • [ ] Adapters (Facebook, Slack, etc)
  • [ ] ApplicationInsights
  • [ ] BotFrameworkHttpAdapter
  • [ ] BotFrameworkHttpClient
  • [ ] SkillHttpClient
  • [ ] Adaptive Dialogs HTTP Actions
  • [ ] Streaming

There are two ways that we can approach this:

  1. Use existing mocked tests and programmatically switch between live/mocked/skipped modes based on the presence of certain environment variables (AppId, AccessKey, etc.), or
  2. Write separate functional tests

[Option 1]: Programmatic

For the items that we want nightly functional tests for, we need to design the tests such that running them live can be easily enabled. This could be addressed with the following design (this discussion is based around .NET, but will apply to JS/Python to the extent possible; a mocha-flavored sketch follows the list):

  1. Each relevant test file will have a WillRunLive parameter, which is lazily set by determining whether certain environment variables are present.
    1. The required environment variables will be determined by each test file, but will generally be something like an AppId or access key.
    2. Once WillRunLive is determined, logger output will specify which way the test is running (Live vs. Mocked vs. Skipped).
  2. The "Assemble" section of each unit test will be wrapped in a check of WillRunLive:
    1. If this results in "Skipped", Assert.Ignore() is called
    2. If this results in "Mocked", mocks are set up
    3. If this results in "Live", the mock setup is skipped
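As a rough mocha-flavored sketch of the same idea (names like getTestMode and the environment variables are hypothetical, and this.skip() stands in for .NET's Assert.Ignore()):

```typescript
// Hypothetical live/mocked/skipped switch for a mocha suite.
import * as nock from 'nock';

type TestMode = 'Live' | 'Mocked' | 'Skipped';

function getTestMode(requiredEnvVars: string[]): TestMode {
    if (requiredEnvVars.every((name) => !!process.env[name])) {
        return 'Live';
    }
    // An opt-in flag could force skipping instead of falling back to mocks.
    return process.env.SKIP_IF_NOT_LIVE ? 'Skipped' : 'Mocked';
}

describe('CosmosDbPartitionedStorage', function () {
    // Determined once per suite, then logged so CI output shows the mode.
    const mode = getTestMode(['COSMOS_ENDPOINT', 'COSMOS_AUTH_KEY']);
    console.log(`CosmosDbPartitionedStorage tests running as: ${mode}`);

    beforeEach(function () {
        if (mode === 'Skipped') {
            this.skip(); // mocha's analog of Assert.Ignore()
        }
        if (mode === 'Mocked') {
            // Only the mocked path intercepts HTTP; the live path talks
            // to the real service.
            nock('https://localhost:8081').persist().get(/.*/).reply(200, {});
        }
    });

    afterEach(function () {
        nock.cleanAll();
    });

    it('reads and writes items', async function () {
        // ...exercise the storage client here, live or mocked...
    });
});
```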

Pros:

  • Keeps all tests within the same file
  • No re-writing of tests between mocked and functional

Cons:

  • Developers would need to "learn" how to switch between modes when testing locally

[Option 2]: Separate Files

We could fairly easily break mocked tests out into separate files for nightly functional tests with mostly a copy/paste and then removal of mocks.

Pros:

  • Separation of test purposes
  • Easy to implement

Cons:

  • Multiple places to write tests for the same component

Triggering Functional Tests

Nightly

Yes.

PRs

We could optionally add automation to trigger live functional tests on the relevant feature/library when certain GitHub tags get added. These could be added either when:

  1. The PR gets submitted/changed (this would run fairly frequently), or
  2. The PR gets an approval, which enables the run (this would then run once per PR as a sort of final check that the PR is good)
    1. Note: I'm not sure how difficult this one would be

Decisions to Make

  • [ ] Do we convert all Record and Playback tests to use Mocks?
  • [ ] What libraries/features do we add nightly functional testing to, if any?
    • [ ] Do we add this programmatically to existing tests, or move functional tests into separate files?
    • [ ] How do we want these tests triggered in GitHub, if at all?

johnataylor commented 4 years ago

Thanks for writing this up @mdrichardson .

Regarding when we change. I think this is case-by-case. The principle is: favor cleanly structured, clear regular mocha tests over clever-clever record and playback proprietary frameworks. A unit test suite should read as a (flat) set of asserts about the expected behavior.

And it would be great to have more overnight functional tests against the real systems. This is basically the approach we started with the Teams integration. I like the idea of additional triggers for functional test runs - for example, if there is a change in the Azure package then run the functional tests for that PR; if there isn't, don't - knowing we'll catch anything indirect in the overnight run.

I favor separate functional tests over the modal idea.

It's very important to embrace the underlying test framework and try to build as little in the way of additional mechanism as possible. For example, I opened an issue on the adaptive expression tests. These were not actually record and playback, but they still introduced an additional mechanism over the basic mocha structure. That is not good; we should always endeavor to use the underlying test framework in as straightforward a manner as possible.

mdrichardson commented 4 years ago

@johnataylor

> Regarding when we change. I think this is case-by-case.

Can you expand on what you mean by this? My "plan" was to figure out which libraries/features we want this to apply to, then do it in Cosmos first. Once that one is implemented perfectly, implement it the same way for everything else.

> I favor separate functional tests over the modal idea.

Makes sense. Let's say we have some libraries (like Cosmos) that already have functional tests written, but that are either mocked or skipped in CI. Does it make more sense to:

  1. Copy those tests into functional tests and leave the originals where they are, so developers writing related code can easily find and test against them, or
  2. Remove the mocked/skipped tests from their original location and place them under /FunctionTests where they're harder to find, but we avoid maintaining duplicate code?

I definitely favor #2 for tests that are skipped in CI, anyway. I'm less sure about moving mocked tests from other libraries; I think maybe this is where the case-by-case basis comes in.

I'm happy to make this all happen, whichever way we go. Ideally, we decide the case-by-case items ahead of time so that all of the adjustments go in one at a time, but all within one cycle (vs. having to re-convene between every PR).

mdrichardson commented 4 years ago

@johnataylor @cleemullins @mrivera-ms @stevengum

Do we have a direction we want to go with this? It's related to this issue, which is tagged for R11, but I've been holding off on tackling it since any new tests will likely use some implementation of nock.

stevengum commented 4 years ago

Let's loop @joshgummersall in on this discussion. At the moment it seems like a spike/implementation will be done in botbuilder-js first, so he might have some valuable insight/pointers on JS/TS-specific implementations.

A valuable first pass might be setting something up for fetching tokens from AAD (for AppCredentials). There's only one HTTP request per AppCredentials subclass and it's a building block for the Conversations, Attachments and Token Service functional tests.
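For illustration only, a shared nock fixture for that single token request might look something like this (the URL shape is the standard AAD token endpoint; the response is trimmed to the fields token-fetching code typically reads):

```typescript
// Illustrative fixture for the AAD client-credentials token request.
import * as nock from 'nock';

function mockAadTokenRequest(): nock.Scope {
    return nock('https://login.microsoftonline.com')
        // Matches both v1 and v2 token paths for any tenant.
        .post(/\/oauth2\/(v2\.0\/)?token$/)
        .reply(200, {
            token_type: 'Bearer',
            expires_in: 3600,
            access_token: 'test-jwt-not-a-real-token',
        });
}
```

Each AppCredentials-flavored test could then call this one helper before exercising code that needs a token.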

I'll defer to @johnataylor's and @joshgummersall's input, but option 2 or some variant of it sounds reasonable to me.


As an aside, we'll have to update the code coverage reports/pipelines to include functional tests.

joshgummersall commented 4 years ago

I think I'm definitely in favor of moving to mocks over nocks where we can, as long as we have some backup functional testing.

I think my preference, in order, is:

  1. Sinon (with sandboxing and sandbox.verify(); see the sketch below)
  2. Nock for programmatically expressing HTTP mocks (the record and playback functionality, while useful, tends to obscure the mocks, i.e. what input parameters to a function end up affecting the mocked request)
  3. Custom mocks/nocks/etc

This is all with the implicit understanding that we should have functional, end-to-end tests that exercise the real API endpoints to protect against drift.
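For reference, a minimal sandboxed Sinon test in the spirit of preference 1 might look like this (tokenClient is a made-up stand-in, not an SDK type):

```typescript
// Sketch of sinon sandboxing; sandbox.verify() asserts every mock
// expectation was met before restore.
import * as sinon from 'sinon';
import { strict as assert } from 'assert';

describe('token fetching (sketch)', function () {
    let sandbox: sinon.SinonSandbox;

    beforeEach(function () {
        sandbox = sinon.createSandbox();
    });

    afterEach(function () {
        sandbox.verify();
        sandbox.restore();
    });

    it('requests a token exactly once', async function () {
        const tokenClient = {
            getToken: (): Promise<string> => Promise.resolve('real-token'),
        };

        // The mock both replaces the method and records an expectation.
        sandbox
            .mock(tokenClient)
            .expects('getToken')
            .once()
            .returns(Promise.resolve('fake-token'));

        assert.equal(await tokenClient.getToken(), 'fake-token');
    });
});
```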

mdrichardson commented 4 years ago

@joshgummersall That all makes sense to me. It seems like the best order to tackle this might be something like:

  1. Get all of the functional tests we want
    • Is there something already being done here? I believe at some point during R11 there was some talk about a big push here, but I'm not in those circles.
  2. Go library by library and convert nock tests to sinon.
    • If there are any live tests, verify that they're covered by functional tests before converting to sinon. If they're not already covered, take them case-by-case and decide whether to keep them live, convert them to mocks, or do both: move them to functional tests and create a mocked version

joshgummersall commented 4 years ago

I think that all sounds reasonable. I'll have a small PR with some common testing/Sinon/sandboxing functionality in the next day or two, so perhaps hold off on rewriting anything with Sinon for now.