microsoft / vscode

Visual Studio Code

https://code.visualstudio.com

MIT License

162.22k stars 28.55k forks source link

Testing in VS Code #107467

Closed connor4312 closed 3 years ago

connor4312 commented 3 years ago

State of the World

Testing support in VS Code has been a feature request for a long time. The VS Code community has build excellent extensions around testing, for example:

The Test Explorer UI from @hbenl
Wallaby.js from the Wallaby team
Jest from @orta
...and many more

Each implementation of testing presents a different set of features, UI, and idiomaticity. Because there is no sanctioned approach to tests in VS Code, extension developers tend to make bespoke implementations, as we've seen in the Python and Java language extensions. Ideally, like in debugging, a VS Code user would have just about the same experience as they work between projects and languages.

VS Code's Approach

Investigate how VS Code can improve the testing support. Several extensions are already providing testing support, explore what APIs/UIs could be added to improve these testing extensions and the test running experience. -- 2020 Roadmap

The Test Explorer UI presents the best point of inspiration for us, as there are many existing extensions built on its API: it's capable and proven. Regardless of the direction we take in VS Code, we should have a way for its Test Adapters to be upgraded to the new world.

Wallaby is an excellent extension, but it's tailored and purpose-built to JavaScript, and includes functionality which is not readily portable to other languages. While it is a good source for inspiration, we're not aiming to encompass Wallaby's feature set in the extension points we provide, at least not yet.

We're prototyping an API in the extension host, but there are a number of approaches we can take:

Extension Host ('traditional' VS Code API)	'Test Protocol' (like DAP/LSP)	Extension (like existing test explorer)
+ Simple to adopt for extension authors + Easier to manage state + Clear way to build 'official' test extensions	+ Encourages keeping expensive work in child processes + Could be theoretically shared with VS and other editors	+ Keep VS Code core slim + Unclear whether there's significant functionality we'd want that's not already possible in exthost api
- The 'obvious path' is doing heavy lifting in the extension host process, which is undesirable	- Additional implementation and maintainence complexity for VS Code - Less friendly, additional complexity than TS APIs for extension authors	- Additional extension and set of libraries to maintain+version for types and implementation - Less clear there's an official pathway for test extensions

API Design

The following is a working draft of an API design. It should not be considered final, or anything close to final. This post will be edited as it evolves.

Changes versus the Test Adapter API

As mentioned, the test adapter API and this one provide a similar end user experience. Here are the notable changes we made:

The test adapter API does not distinguish between watching a workspace and watching a file. In some cases, there is an existing process that reads workspace tests (such as a language server in Java) or it's not much more expensive to get workspace tests than file tests (such as mocha, perhaps). However, some cases, like Go, providing tests for a single file can be done very cheaply and efficiently without needing to involve the workspace.

In this API we expect the TestProvider to, after activation, always provide tests for the visible text editors, and we only request tests for the entire workspace when required (i.e. when the UI needs to enumerate them).
We have modeled the test state more closely after the existing DiagnosticCollection, where the Test Adapter API uses only events to enumerate tests and does not have a central collection.
The Test Adapter API makes the distinction between suites and tests, we do not. They have almost identical capabilities, and in at least one scenario the 'suites' are more like tests and the leaf 'tests' cannot be run individually.
We use object identity rather than ID for referencing tests. This is in line with other items in the VS Code API, including Diagnostics.

Ideas and Open Questions

See the testing label for current work, questions, and problems.

API

See the current working proposal in https://github.com/microsoft/vscode/blob/master/src/vs/vscode.proposed.d.ts (ctrl+f for 107467)

connorshea commented 3 years ago

👋 I'm the maintainer of the Ruby Test Adapter extension and just wanted to drop in and give a bit of context on the pain points I've had with the Test Adapter API. The Adapter API works well for the most part, but the biggest problem I've had with it is getting it work well with large codebases.

I developed the Ruby adapter with my Rails app vglist in mind, and at the time it had around 300-400 tests in total (it now has 700-800). My adapter worked fine with that, it took a few seconds (maybe 5-10) to reload all the test suite data whenever a change was made to the test code. However, even 5-10 seconds is annoying to wait for when you just want to run your tests right after you modify them. And now I work at a company with a Rails app that has over 12,000 RSpec tests, so my extension now takes about 45 seconds to reload the tests whenever I change them (and it'd be even longer if I didn't have a pretty well-spec'd laptop). I don't ever really use my test adapter with that codebase because it slows me down far too much.

The problem is that the API doesn't support modifying the existing set of tests, only replacing all of them at once. Which requires RSpec to generate information for every test in the codebase whenever it reloads the tests.

I've considered a workaround where I'd cache the information in a tempfile or something, and have rspec only load information about changed files, and then modify the cached tree and send it up to the explorer UI, but that's complex and I haven't gotten around to actually doing it.

There are other problems - which I'm not really sure how to solve - with being unable to run the tests while it's loading the test suite because it doesn't handle the tests that pass while it's loading (the data can theoretically get out-of-sync if you, say, change a bunch of things between reloads, but >99% of the tests in the codebase will be exactly the same before and after any given change I make to the suite). So it's maybe worth considering that when designing the invalidation logic.

There are also problems with being able to view test logs to figure out issues when tests fail, but that's more of a problem with how the RSpec framework works than with anything the Test Adapter API does.

The current API Sketch looks quite good from what I'm seeing. I especially like the use of a cancellation token, which if I understand correctly seems like a much cleaner solution than the Adapter API where you just blindly attempt to kill any relevant running processes. There are some things missing there (e.g. how do I handle the case where the suite is unloadable because the user has created an invalid test suite via some syntax error?), but overall it looks like it takes a lot of the good decisions from the Adapter extension API.

Anyway, I hope this makes some sense. I understand it's probably not a super cohesive comment since I wrote it on-the-fly, but I've been hoping something like this would be introduced into the core of VS Code eventually, so I wanted to make sure I got my two cents in :)

connor4312 commented 3 years ago

Thank you for that feedback, it's super helpful!

Scoping tests and handling large codebases is definitely something we want to tackle from the outset. Like all features we build, we will be dogfooding it in the VS Code repo, so it is a priority to be able to handle that. As you saw in the sketch, I think the collection approach solves the pain points around having to replace all tests.

The idea of running tests during load is interesting. The base case of running all tests is pretty simple from the API perspective -- we would just runTests without any filters and let the extension run as many as it can. So long as it's able to reconcile the collection into the right state running concurrent discovery and execution, everything should be fine.

One scenario that isn't covered is running a subset of tests as discovery is happening (e.g. if I wanted to run all tests with the word "foo" in them) without starting a new test run. This would take some finessing and will not be supported by all adapters, for instance Go could never support it without changes to its toolchain, so is probably not worth designing for...

There are some things missing there (e.g. how do I handle the case where the suite is unloadable because the user has created an invalid test suite via some syntax error?),

I think the right thing to do would be for the test extension to emit a diagnostic for that using our existing API. I've added it to the open questions as well though for discussion.

connorshea commented 3 years ago

@connor4312 generally speaking I'd want to give the user an error notification so they know something has broken and their suite failed to load, rather than creating a diagnostic. But as long as the extension can choose what to do if that occurs, I suppose the Test API in VS Code doesn't need to care about that.

connor4312 commented 3 years ago

When I say diagnostic I mean the "Diagnostic" in the VS Code API that appears in the problems view (and can also have an error squiggle in the editor)

The downside there is that if they have a syntax error, having the problem be duplicated by both the language server and test extension will be some ugly clutter.

connorshea commented 3 years ago

The reason I wouldn't use a diagnostic is that 1) I personally rarely ever look at diagnostics/errors in the diagnostics view because the signal to noise ratio in my experience is so high, and 2) there are situations where a diagnostic wouldn't really make sense, for example if the user's tests fail to load because their local Postgres server isn't running or because they forgot to install their Ruby gems so the RSpec gem isn't even available.

hbenl commented 3 years ago

Some thoughts from my side (I'm the author of the Test Explorer UI extension):

the proposed API looks good, in particular having a central test collection will probably be welcomed by many TestProvider authors. This is obviously necessary if you want to support only detecting tests in the currently opened files (which I had never thought about) but even without that there seem to be some scenarios where TestAdapter authors struggle with this part of the Test Explorer API
bringing the existing TestAdapters to the new world could be done with a bridge extension (or built into Test Explorer itself) which registers all the tests coming from TestAdapters with the new TestProvider API
concerning the question whether a TestProvider should run in the Extension Host: I'm not worried about expensive work being done in the Extension Host process here: the most expensive part is usually running the tests and TestProviders should always do that in a child process anyway
perhaps some other TestAdapter authors would also like to share their thoughts: @Raagh @kavod-io @Gwenio @Testy @marcellourbani @numaru @bneumann @Florin-Popescu @matepek @fredericbonnet @drleq @dampsoft @betwo @zcoinofficial @vshaxe @kondratyev-nv @recca0120 @DEVSENSE @swellaby @prash-wghats @Derivitec @Bochlin @maziac

bneumann commented 3 years ago

Thanks for the mention @hbenl. Looks good to me, I also agree with @connorshea about the notificantion in case something is broken. I had that a lot and having a possibility to tell the users what went wrong might be good.

One thing that I was missing, and I am not sure if it possible, is to match my tests with the source files. My adapter runs C code and I can't simply grep all files for the function names. I could do that if the user registered the toolchain somehow but that would require a lot of setup for the user to make prior to testing. So it would be nice to somehow get cross information from the language server and check which files in my project are actually test files.

Not sure if that is meant by: In a golden scenario, invalidation of tests would be done by a language server which can intelligently determine specific tests that should be invalidated when a file or a file dependency changes. Maybe this is still handled by an event on the TestProvider, but if we take a "Test Protocol" approach then coordination will be harder.

hbenl commented 3 years ago

One more thought: the central difference between tests and suites in Test Explorer is that the state of a suite is always computed from the states of its children (with one exception: a suite can be marked as errored). TestProvider doesn't distinguish between tests and suites but it should (optionally) provide the capability to let the state of a test be computed from the states of its children. Perhaps this could be flagged by another TestRunState (e.g. INHERIT). Of course it could be left to TestProvider authors to update the parents but this can become a slightly hairy issue if you want to avoid potential performance problems, so it would be nice if VS Code provided that.

matepek commented 3 years ago

Hello I'm the author of C++ TestMate

The Test Adapter API makes the distinction between suites and tests, we do not. They have almost identical capabilities, and in at least one scenario the 'suites' are more like tests and the leaf 'tests' cannot be run individually.

Praise the initiative. The feature will be useful for Catch2 too. (related issue)

invalidation and auto-run

I think only language server isn't enough. We need the flexibility to retire tests in case of external dependencies and file changes.

A request of my user: I don't see an "output window" concept on the current API but I assume that it is unavoidable. (Some test frameworks can provide useful output which cannot be associated to any tests.) The person expressed the need to have the output window updated before the test itself finishes. I believe it is a reasonable request. It seems to me that multiple TestState with "Running" can be a solution. But I don't see how the output window fits into the current API.

About test state propagation: For my extension if a test fails all the ascendants has to be failed too. Even a node has a "Passed" state if one of the descendant's has a "Failed" or "Errored" state than has to be propagated. Some precedence also seems necessary like: "Running" over "Errored" over "Failed" over "Skipped" over "Unset".

But this raises another question. Scenario: B and C under A. Test B reports failure and test B still running. What should be the state of A? According to previous precedence it should be "Running" but users might would find it useful to see that some tests part of that "Suite"/node has already failed. I have some ideas about it but all of the sacrifices the simplicity and clearness of the API.

Question: The integrated test explorer what extra features will be available for VSCode users? I mean we have a working system now thanks to @hbenl. What is strongest the motivation of the integration?

marcellourbani commented 3 years ago

Thanks for the mention @hbenl

In my abapfs extension I only know I have a test after running it, a discover function would take weeks to run.

Proposed API looks ok as long as I can fire onDidChangeTest any time

orta commented 3 years ago

/cc @connectdotz who has been doing great work on vscode-jest /cc @captbaritone who I had a conversation exactly about this a year ago

connor4312 commented 3 years ago

Thank you for the feedback and the tags!

Point taken around the error notifications. This week is endgame for our September release, but we'll continue discussion and update this issue + the sketch next week.

TestProvider doesn't distinguish between tests and suites but it should (optionally) provide the capability to let the state of a test be computed from the states of its children

But this raises another question. Scenario: B and C under A. Test B reports failure and test B still running. What should be the state of A? According to previous precedence it should be "Running" but users might would find it useful to see that some tests part of that "Suite"/node has already failed. I have some ideas about it but all of the sacrifices the simplicity and clearness of the API.

@hbenl You bring up a good point that, in the current design, state is never inherited. Generally for tests we want the opposite -- that state is always inherited -- but thinking about this more it's only applicable within a tree UI. Perhaps if the state is unset, then we say UI implementations should do the right thing automatically. For example, a test explorer would show parents as failed if any of their children failed, or running if any of their children are running. But a status bar item that shows failing/passing tests would only count those who had a non-unset status.

I think this approach would also make @matepek's scenario nice from an API point of view; the UI can be arbitrarily smart.

One thing that I was missing, and I am not sure if it possible, is to match my tests with the source files. My adapter runs C code and I can't simply grep all files for the function names.

@bneumann I think this problem is mostly out of scope. We will provide hooks in VS Code to build tests, but won't add any special new discovery mechanisms.

I think only language server isn't enough. We need the flexibility to retire tests in case of external dependencies and file changes.

@matepek Definitely, the language server won't be the only way (we aren't going to require test providers to also integrate their language server) but we should not block that path.

I don't see an "output window" concept on the current API but I assume that it is unavoidable. (Some test frameworks can provide useful output which cannot be associated to any tests.) The person expressed the need to have the output window updated before the test itself finishes. I believe it is a reasonable request. It seems to me that multiple TestState with "Running" can be a solution. But I don't see how the output window fits into the current API.

@matepek Yes, test output and diffing is one particular area we want to improve. Thank you for mentioning the streaming scenario, I'll make sure we look into that and will tag you with the outcome/proposal.

What is strongest the motivation of the integration?

We actually are not set yet on making this integrated. Our roadmap outlines the purpose here (which I should add to the original thread for clarity):

Investigate how VS Code can improve the testing support. Several extensions are already providing testing support, explore what APIs/UIs could be added to improve these testing extensions and the test running experience.

Some benefits of being built-in, mentioned briefly in the table in the original issue, are:

Clear path to build official test extensions to give a consistent experience
Possibility for better diffing support, as matepek touched on
Coverage and live testing (under exploration)
Keeps APIs and versioning under vscode.d.ts on which other extensions can be built with compatibility checks (i.e. package.json engines field) and guarentees

At the moment we're sketching an extension host API to provider an understanding of the 'ideal world' API. This may become a built in API, it could be a protocol, or we could end up working with Holger to apply some principles the test explorer UI.

connectdotz commented 3 years ago

hi, I work with @orta in vscode-jest. This is an interesting thread, have a few questions:

First, the high-level question: I am wondering what is the main feature from the end-users' perspective? Is it a new build-in sidebar view listing all tests that can be run/debug via standard UI? For test extensions, does it boil down to using the build-in "Test Explorer" instead of their own sidebar views?
The proposed API only seems to cover the Diagnostic(the PROBLEMS tab), our extension also heavily uses Decoration for inline visual indications and assistance in the editor, as well as OUTPUT for detailed test output and notifications for error/help-tips (as others also mentioned). If a consistent interface is the goal, then maybe all these UI extensions should be included as well?
In jest, a test is identified by its name(label) and the hierarchical context (the parent block). It is hard to replace a single test because its name or line location can all change when users are developing their tests. Therefore we always end up reconstruct the test tree for the given file on each run. The proposed API seems to take the "bottom-up" approach, where each TestItem points to its parent. To find all tests by a given parent will require traversing all the items. This will be quite expensive for our use case, but if we are the only test framework prefer top-down, we could just build our own internal structure to track them... is this not an issue for other test frameworks/extensions?
Given there is no update operation for TestCollection, I assume TestItem is immutable? But the TestItem interface didn't have readonly so not sure... How does one update a test status for example? delete + add? or mutate the TestItem + firing onDidChangeTest? You are right that it is kind of redundant if you already observe the collection and yet still require the change event...
In multi-root workspaces? Does the explorer display the current folder only? Will users see the full workspace (summary of each folder)?
A stretch goal: In addition to standard UI, it would be great if this can also address our users' biggest pain point, which is often the test setup, env configuration, etc... how about something like a "test config", similar to a debug config, that Provider can provide snippet or even wizard to help set up in launch.json?

rossknudsen commented 3 years ago

hey all, I maintain the Jest Extension for the @hbenl's Test Explorer UI. Here is a bit of a brain-dump, some of which might be useful:

I agree with @connectdotz comments about the structure of Jest tests being very hierarchical and top-down. I maintain internal state in my extension as a tree and map from the internal structure to the Test UI structure when I need to emit events as the internal structure is highly correlated to the output. IMHO the test collection as a mutable list-like collection and mutable test objects would be more difficult to translate to/from than what I currently do. Now it may be that the Test UI extension can live on as an intermediary extension taking care of this translation on our behalf (amongst other tasks e.g. multiple workspaces). But I think it would be worth taking a little time to think about how the test information will be consumed by UI extensions as noted in the todo. Maybe leave this API in draft until the UI API has been fleshed out. It doesn't make sense to flatten it, pass it to VS Code to pass it to a UI extension where it is transformed back into tree form again. Other extension authors should comment on what their preferred format is.

The proposed API describes how the TestProvider can be controlled by VS Code - through the runTests and addWorkspaceTests methods. However, at least in the context of the extension I maintain, it doesn't always make sense to obey that command. In the case of the addWorkspaceTests method, the extension may have already discovered all of the tests before that command is invoked making it a no-op. Also because my extension is currently monitoring the file system for changes and running tests as it sees fit, what if the user wants to initiate a test run through the runTests method? Should the internal test run be cancelled in favour of the users choice? Is it Ok to plainly disobey the request?

Perhaps the TestProvider has a state, e.g. Parsing/Discovering, Running Tests, Idle etc, potentially with a corresponding event emitter for status changes. This could also serve to notify VS Code of events such as failure to load/run tests etc.
I think one missing feature would be the ability to open the corresponding source from a TestItem. The Test Explorer UI allows this but I can't see how you can find the source file with the proposed API.

Test Messages:

Seems a bit light-weight when compared with the Test Explorer UI API. There you have a message and a tooltip which would live at the equivalent level of the TestState. However, I don't have any suggestions on how to improve things, while remaining UI agnostic. I use the message to display detailed stack traces, I can't remember if I use the tooltip or not.
Would be awesome if we could have colored text.
Perhaps we need an equivalent of log levels for the TestMessages if it is possible to provide messages that are not errors (info, warning etc).

matepek commented 3 years ago

Also maybe a category for testing could be useful. Currently I'm using Other.

connor4312 commented 3 years ago

Hello again everyone 👋 with our housekeeping wrapped up, I'm back on this for the remainder of the year. My goal by early Jan is to have an API to parity of what is show in the sketch and what the test explorer provides working, as well as a branch or fork of the test explorer that works with the new API.

First, the high-level question: I am wondering what is the main feature from the end-users' perspective? Is it a new build-in sidebar view listing all tests that can be run/debug via standard UI? For test extensions, does it boil down to using the build-in "Test Explorer" instead of their own sidebar views? -- @connectdotz

Consistency and discoverability as mentioned originally
We think there's more and smarter things we can do around showing test results and diffing which varies in extensions and often is not ideal.
Likewise around coverage and coverage integration.
Tie-ins with LSP and debug services

The proposed API only seems to cover the Diagnostic(the PROBLEMS tab), our extension also heavily uses Decoration for inline visual indications and assistance in the editor, as well as OUTPUT for detailed test output and notifications for error/help-tips (as others also mentioned). If a consistent interface is the goal, then maybe all these UI extensions should be included as well? -- @connectdotz

The sketch has TestMessages in the TestState, which are optionally associated with a location. Locations contain a URI and a range, which allows us to show decorations

In jest, a test is identified by its name(label) and the hierarchical context (the parent block). It is hard to replace a single test because its name or line location can all change when users are developing their tests... -- @connectdotz

I agree -- @rossknudsen

The model for this was as you mention somewhat challenging. We have prior art in the extension API for lazily providing trees of hierarchical data via the TreeDataProvider<T>. In principle we could use this or something similar for tests as well. However, the TreeDataProvider is built to provide information gradually about its data, while many cases for tests (e.g. showing how many tests passed/failed) require the entire test tree to be materialized.

So, we could go for a more traditional recursive data structure, e.g type TestItem = { children: TestItem[]; ... }. One benefit of this approach is that it puts the onDidChangeTest event in less of a "weird spot" and more similar to onDidChangeTreeData; the event can fire when a TestItem changes or when its array of children updates. Then we have a testRoot element, which can be generally be hidden if there's only a single test provider, or shown in the edge case that there's multiple (e.g. if I'm developing a Go server and a client with Jest tests.) I like this approach more than the current collection-based approach, cc @sandy081

diff --git a/src/vs/vscode.proposed.d.ts b/src/vs/vscode.proposed.d.ts
index 98bada7b79..ca87f5d17b 100644
--- a/src/vs/vscode.proposed.d.ts
+++ b/src/vs/vscode.proposed.d.ts
@@ -2202,14 +2202,20 @@ declare module 'vscode' {
     * via `addWorkspaceTests`.
     */
    export interface TestProvider<T extends TestItem> {
-       readonly collection: TestCollection<T>;
+       /**
+        * Root node for tests. The `testRoot` instance must not be replaced over
+        * the lifespan of the TestProvider, since you will need to reference it
+        * in `onDidChangeTest` when a test is added or removed.
+        */
+       readonly testRoot: T;

        /**
         * An event that fires when an existing test in the collection changes.
-        * This can be a result of a state change in a test run, for instance.
+        * This can be a result of a state change in a test run or its properties
+        * update.
         *
-        * @todo VS Code already 'observes' the test collection, it's awkward
-        * to have a separate event to notify changes
+        * This should also be fired when a test gains or loses a direct child
+        * from its `children` array.
         */
        readonly onDidChangeTest: Event<T>;

@@ -2241,10 +2247,10 @@ declare module 'vscode' {
     */
    export interface TestRunOptions<T extends TestItem> {
        /**
-        * Array of specific tests to run. If not given, then the provider should
-        * run all tests.
+        * Array of specific tests to run. The {@link TestProvider.testRoot} may
+        * be provided as an indication to run all tests.
         */
-       tests?: T[];
+       tests: T[];

        /**
         * Whether or not tests in this run should be debugged.
@@ -2287,9 +2293,9 @@ declare module 'vscode' {
        location: Location;

        /**
-        * Parent item of this test.
+        * Optional list of nested tests for this item.
         */
-       parent?: TestItem;
+       children?: TestItem[];

        /**
         * Test run state. Will generally be {@link TestRunState.Unset} by

Given there is no update operation for TestCollection, I assume TestItem is immutable? But the TestItem interface didn't have readonly so not sure... How does one update a test status for example? delete + add? or mutate the TestItem + firing onDidChangeTest? You are right that it is kind of redundant if you already observe the collection and yet still require the change event... -- @connectdotz

In the sketch as well as in the above, they would be mutable and the extension should fire onDidChangeTest when they update. I like immutability, but that's not the paradigm most of our extension api takes 🙂

A stretch goal: In addition to standard UI, it would be great if this can also address our users' biggest pain point, which is often the test setup, env configuration, etc... how about something like a "test config", similar to a debug config, that Provider can provide snippet or even wizard to help set up in launch.json?

That's an interesting idea. Currently test extensions generally hook into your existing test setup. I have not thought about it from the perspective of the test extension actually being the entrypoint to testing, but as tests become built-in this can happen in the same way we've been pushing the VS Code (particularly JS) debugger to be the entrypoint to debugging...

I will add that to the todos above, but will be something for after this month.

Maybe leave this API in draft until the UI API has been fleshed out. -- @rossknudsen

Yes, all this will remain in vscode.proposed.d.ts land for some time as we get feedback from the community!

In the case of the addWorkspaceTests method, the extension may have already discovered all of the tests before that command is invoked making it a no-op. -- @rossknudsen

I think this is fine. addWorkspaceTests is a hint to the test runner that it should discover tests because there's now "something" that wants to see them. It is there to avoid test extensions having to go out and discover tests on startup, since startup time and performance is something we try to optimize for. If the tests were previously enumerated and they are already in the collection, it's fine to no-op.

Also because my extension is currently monitoring the file system for changes and running tests as it sees fit, what if the user wants to initiate a test run through the runTests method? Should the internal test run be cancelled in favour of the users choice? -- @rossknudsen

In the design thus far there's not particular requirement around serialization or lack thereof of test runs; I think this is a good thing. So it's generally up to you as the test extension. But as a user I would generally expect that the test states eventually reflect whatever my most recent run.

I think one missing feature would be the ability to open the corresponding source from a TestItem. The Test Explorer UI allows this but I can't see how you can find the source file with the proposed API. -- @rossknudsen

The location on the test item provides the file URI and source code range where the test can be found.

Perhaps the TestProvider has a state, e.g. Parsing/Discovering, Running Tests, Idle etc, potentially with a corresponding event emitter for status changes. This could also serve to notify VS Code of events such as failure to load/run tests etc. -- @rossknudsen

This is something I will be adding within the next couple days. I don't know what it would look like yet.

Test Messages -- @rossknudsen

Thanks for the feedback. More in the next couple days; this is an area for improvement/expansion.

connor4312 commented 3 years ago

Some more updates building out the API.

Test Output

These are my current thoughts. TestItems have a TestState. The TestState is defined as immutable. This lets us (and consuming UI) use instance equality checks when determining whether there was a change made to the state, instead of trying to do a deep or complex diff on every change.

As @rossknudsen suggested I added a severity, copied from Diagnostic messages. Additionally there's an expectedOutput and actualOutput which can be shown in a diff editor/peek view. Finally the message is optionally a MarkdownString.

```ts export enum TestRunState { // Initial state Unset = 0, // Test is currently running Running = 1, // Test run has passed Passed = 2, // Test run has failed (on an assertion) Failed = 3, // Test run has been skipped Skipped = 4, // Test run failed for some other reason (compilation error, timeout, etc) Errored = 5 } /** * TestState includes a test and its run state. This is included in the * {@link TestItem} and is immutable; it should be replaced in th TestItem * in order to update it. This allows consumers to quickly and easily check * for changes via object identity. */ export class TestState { /** * Current state of the test. */ readonly runState: TestRunState; /** * Optional duration of the test run, in milliseconds. */ readonly duration?: number; /** * Associated test run message. Can, for example, contain assertion * failure information if the test fails. */ readonly messages: ReadonlyArray>; /** * @param state Run state to hold in the test state * @param messages List of associated messages for the test * @param duration Length of time the test run took, if appropriate. */ constructor(runState: TestRunState, messages?: TestMessage[], duration?: number); } /** * Represents the severity of test messages. */ export enum TestMessageSeverity { Error = 0, Warning = 1, Information = 2, Hint = 3 } /** * Message associated with the test state. Can be linked to a specific * source range -- useful for assertion failures, for example. */ export interface TestMessage { /** * Human-readable message text to display. */ message: string | MarkdownString; /** * Message severity. Defaults to "Error", if not provided. */ severity?: TestMessageSeverity; /** * Expected test output. If given with `actual`, a diff view will be shown. */ expectedOutput?: string; /** * Actual test output. If given with `actual`, a diff view will be shown. */ actualOutput?: string; /** * Associated file location. */ location?: Location; } ```

Test Representation

I really still like the idea of having a test tree as @connectdotz and @rossknudsen. However I start to struggle with it when it comes to document versus workspace tests.

Having a document/in-editor view of test metadata requires that their locations be able to be reconciled with unsaved changes, which are not visible in a filesystem-based workspace watcher. (OIt's for certain that not all test extensions will go through the trouble of implementing this, but doing so should be possible.) This means we either have their weird dual state in a single tree of tests, or we have two different trees. There's also the issue that TestItems may not necessarily have associated Locations, so in a single tree it's hard to tell what tests in the hierarchy are relevant for a given TextDocument anyway.

So this is the direction I'm thinking -- the providers return a "hierarchy" object on which events are fired:

```ts /** * Tree of tests returned from the provide methods in the {@link TestProvider}. */ export interface TestHierarchy { /** * Root node for tests. The `testRoot` instance must not be replaced over * the lifespan of the TestHierarchy, since you will need to reference it * in `onDidChangeTest` when a test is added or removed. */ readonly root: T; /** * An event that fires when an existing test under the `root` changes. * This can be a result of a state change in a test run, a property update, * or an update to its children. Changes made to tests will not be visible * to {@link TestObserver} instances until this event is fired. * * This will signal a change recursively to all children of the given node. * For example, firing the event with the {@link testRoot} will refresh * all tests. */ readonly onDidChangeTest: Event; /** * An event that should be fired when all tests that are currently defined * have been discovered. The provider should continue to watch for changes * and fire `onDidChangeTest` until the hierarchy is disposed. */ readonly onDidDiscoverInitialTests: Event; /** * Dispose will be called when there are no longer observers interested * in the hierarchy. */ dispose(): void; } /** * Discovers and provides tests. It's expected that the TestProvider will * ambiently listen to {@link vscode.window.onDidChangeVisibleTextEditors} to * provide test information about the open files for use in code lenses and * other file-specific UI. * * Additionally, the UI may request it to discover tests for the workspace * via `addWorkspaceTests`. */ export interface TestProvider { /** * Requests that tests be provided for the given workspace. This will * generally be called when tests need to be enumerated for the * workspace. * * It's guaranteed that this method will not be called again while * there is a previous undisposed watcher for the given workspace folder. */ provideWorkspaceTests?(workspace: WorkspaceFolder): TestHierarchy; /** * Requests that tests be provided for the given document. This will * be called when tests need to be enumerated for a single open file, * for instance by code lens UI. */ provideDocumentTests?(document: TextDocument): TestHierarchy; /** * Starts a test run. This should cause {@link onDidChangeTest} to * fire with update test states during the run. * @todo this will eventually need to be able to return a summary report, coverage for example. */ runTests?(options: TestRunOptions, cancellationToken: CancellationToken): ProviderResult; } ```

This can be mirrored in the "test observers" which other extensions can call to receive a stream of tests:

```ts export namespace test { // ... /** * Returns an observer that retrieves tests in the given workspace folder. */ export function createWorkspaceTestObserver(workspaceFolder: WorkspaceFolder): TestObserver; /** * Returns an observer that retrieves tests in the given text document. */ export function createDocumentTestObserver(document: TextDocument): TestObserver; } export interface TestObserver { /** * List of tests returned by test provider for files in the workspace. */ readonly tests: ReadonlyArray; /** * An event that fires when an existing test in the collection changes, or * null if a top-level test was added or removed. When fired, the consumer * should check the test item and all its children for changes. */ readonly onDidChangeTest: Event; /** * An event the fires when all test providers have signalled that the tests * the observer references have been discovered. Providers may continue to * watch for changes and cause {@link onDidChangeTest} to fire as files * change, until the observer is disposed. */ readonly onDidDiscoverInitialTests: Event; /** * Dispose of the observer, allowing VS Code to eventually tell test * providers that they no longer need to update tests. */ dispose(): void; } ```

In this sketch the provider result and observer just have an event that fires when tests are discovered, not granular "Parsing/Discovering, Running Tests, Idle" states, but this may be sufficient.

connor4312 commented 3 years ago

Talking with Sandeep this morning, we agree that the "Hierarchy" approach is worth exploring. So I've built the API it as well as a simple sample extension

Implementation at: https://github.com/microsoft/vscode-extension-samples/blob/test-provider-sample/test-provider-sample/src/testProvider.ts

Notes:

Overall I like the hooks/hierarchy structure. Being able to implement test item lets me gracefully have well-typed files, heading, and test cases
Running a test is kind of awkward because it comes in at the 'top level' and I need to map that back to the test. Not sure there's a better way to do this. Individual tests can't just have run/debug methods, otherwise it's not possible to execute a selection of arbitrary tests as a single run. Maybe this will feel better in real test extensions which tend to be more stateful.

connor4312 commented 3 years ago

As we go into the holidays, don't expect too many updates. Recent discussion:

We like the provider API proposed. One small change is that I have added a "Queued" test state, which can be used to indicate tests which will be executed in a test run, but are not yet running. vscode.proposed.d.ts is the place to track further api changes https://github.com/microsoft/vscode/blob/a9bf16e00192e5da6b1067667a0b55a4593bcc5f/src/vs/vscode.proposed.d.ts#L1987
We'll probably hold off having the observer APIs in the initial testing release. We don't have any APIs like them exposed to the extension host, so we'll want to do some more thinking there.
We decided to implement the test explorer UI in core. There's a good likelihood we'll want to create tailored UI beyond what the extensionhost API provides. I have added a bearbones explorer UI in https://github.com/microsoft/vscode/commit/a9bf16e00192e5da6b1067667a0b55a4593bcc5f, which will appear if any test provider is registered. Still lots of features and polish needed.

jdneo commented 3 years ago

👋 Hi, I have some questions about the proposal:

About the method addWorkspaceTests(), is it able to lazy-load the tests in the explorer? (like add one level of the direct children when the user expand a node in the test explorer, this is especially useful when in a large code base having thousands of tests there)
like @connectdotz mentioned about the test config, will the API consider to generate and persist it somewhere? (for example, if the user wants to re-run the tests, is it the extension's responsible to store the launch configuration or the API will have the capability to handle that?)

Thanks

connor4312 commented 3 years ago

Thanks for the questions 🙂

About the method addWorkspaceTests(), is it able to lazy-load the tests in the explorer? (like add one level of the direct children when the user expand a node in the test explorer

Not at the moment. Such lazy loading would work for the explorer, but the moment we want to do something like show the percentage of passing or failed tests, we would need to enumerate all tests. So I'm not convinced that's something to design around.

the test config, will the API consider to generate and persist it somewhere? (for example, if the user wants to re-run the tests, is it the extension's responsible to store the launch configuration or the API will have the capability to handle that?)

I really want to avoid having a special VS Code test config. Most test runners have existing configuration, e.g. .mocharc files, config in setup.py, etc.

I'm not sure specifically if you mean just "rerun the tests I just ran" or "rerun failed tests". In both cases the state for this shouldn't be stored in a config file; this would be noise that you wouldn't want to check in to source control.

For rerun the last set of tests: I think the UI, not the test provider, should deal with this. In the current design design we use object instance identity for identifying tests, but for the purposes of serializing and restoring tests we can use the label "path" to the tests, which should be reasonably accurate.
The challenge for rerunning failed tests is more around persisting test state between sessions. Depending on the test system, there might be a test output file that could be used. Or it could be stored in workspace storage metadata. Regardless, in this design storing persisting test state is something the provider would need to do, since the provider and only provider controls the runState of tests instances.

matepek commented 3 years ago

TBH I haven’t read the threat thoroughly just wanted to mention one use case which would be nice to be covered by the new API: Catch2 C++ test framework support BDD style. And the implementation comes with limitation: One can define “sub-test-case” in a test-case but that sub-test-case become available/listed only when the test-case is running/were run. This means that a leaf/test-case has to be able to became a branch/suite and sub-test-cases has to be added at runtime or shortly after the run with the corresponding states.

A couple of months ago the proposed API design might were able to support this. After that I lost tracking. Thanks for considering the use case.

Other thing we have to deal with is the activation event. For user-user perspective they usually prefer lazy init/listing but some actually the opposite. For them now they can init the extension with an “onOpen”-task.

Bests,

Thank you for your consideration 🙏

hbenl commented 3 years ago

I've created an experimental extension that translates from the TestAdapter API to the TestProvider API, in other words: when this extension is installed, all TestAdapters will also show up as TestProviders. You can try it out here.

I have a few comments regarding the TestProvider API:

currently the API lacks a way for the user to restart the test discovery ("reloading" the tests). Ideally this should not be necessary as the TestProvider should do this automatically when any relevant file or setting changes, but in practice I think it's necessary to allow the user to trigger this manually because not all TestProviders will implement this.
the TestAdapter API supports tooltips in 2 places: when hovering over an item in the Test Explorer (TestInfo.tooltip) and when hovering over an error decoration (TestDecoration.hover). In both cases a one-line message can be set (TestInfo.description and TestDecoration.message) with more details provided in the tooltip. The TestProvider API currently has only TestItem.description and TestMessage.message and it's not clear how they will be used in the UI.
it should be possible for TestProvider.createWorkspaceTestHierarchy() to return undefined, or to have an empty TestHierarchy (currently it always contains at least one TestItem).

If / When the Test Explorer implemented in core supports all the features that my Test Explorer UI extension supports, my extension could be replaced by this converter extension.

connor4312 commented 3 years ago

That's awesome, thank you!

That's a good point. The more common case might be, even in a well behaved test provider, a finicky file watcher -- which is well known to happen. I'll add a refresh icon, like we have in the file explorer.
Thanks, will expand on that. Today I got to the point of in-editor integration (code lens and diagnostic info). I'll be working on a richer experience starting tomorrow.
Added today 🙂

hbenl commented 3 years ago

One more thought about manually restarting the discovery: we'll also need to give visual feedback to the user that the test discovery is running - currently this is only possible for the initial discovery. Maybe something like TestHierarchy.onTestDiscovery: vscode.Event<'started' | 'finished'> instead of TestHierarchy.onDidDiscoverInitialTests: vscode.Event<void>.

kondratyev-nv commented 3 years ago

Hey! Thanks for the effort in integrating this into the VSCode. I think @hbenl UI extension is amazing and it definitely a great starting point for building a common API. Overall, the current state of the proposed API looks good to me. Couple of things I'd like to ask/mention.

TestRunOptions contains the collection of tests: T[]. This is similar to the Test Explorer UI. However, I was always wondering, does some real-world scenario exist for the case of multiple tests to be run (not a single test/suite)?
I'm not really familiar with TextDocument interface (which used for createDocumentTestHierarchy for example), but does it always have a valid URI/fileName? I'm asking because it kind of does not make sense to discover tests that are not saved, or saved outside of current workspace...
Regarding file watchers and refresh/restart test discovery. It seems that every test adapter will need to do pretty much the same - watch for all test files and trigger onDidChangeTest on file save. It's not a huge or complex part, but maybe it is worth considering having that in the VSCode testing. This is also connected to autorun or save-before-run features if they will be considered in the future.
One of the important things from Test Explorer UI is id property on tests. I don't see such property in TestItem of the proposed API. Should each test adapter extend this interface with its own specific properties?

connor4312 commented 3 years ago

I was always wondering, does some real-world scenario exist for the case of multiple tests to be run (not a single test/suite)?

Actions like "rerun failed tests" will request subsets of tests to be run. Actually today I added the option to group by status in the explorer; clicking "Arrays" here would rerun only the two failed tests and not the other passed tests.

I'm not really familiar with TextDocument interface (which used for createDocumentTestHierarchy for example), but does it always have a valid URI/fileName? I'm asking because it kind of does not make sense to discover tests that are not saved, or saved outside of current workspace...

It does have a URI, textDocument.uri. However, a purpose here is in fact to include untitled/unsaved changes. For example, Go tests are structured and very easy to identify by scanning the source code of a single file, so they can implement createDocumentTestHierarchy which will allow vscode to show code lenses and structure even for unsaved changes.

If this is something a test extension doesn't want to deal with, you can now return undefined from createDocumentTestHierarchy if you see one of these files 🙂 (or not implement it at all)

Also, I will (though this is not yet done) have the action of running tests save all file changes, like we do when debug sessions are started.

Regarding file watchers and refresh/restart test discovery. It seems that every test adapter will need to do pretty much the same - watch for all test files and trigger onDidChangeTest on file save. It's not a huge or complex part, but maybe it is worth considering having that in the VSCode testing. This is also connected to autorun or save-before-run features if they will be considered in the future.

The perspective that the current API takes is that the test provider is the sole controller of the state and information in tests. I think this is a convenient property that makes reasoning easier, but it means that there's not a way to have 'out of bounds' signalling to invalidate tests. It'd be easy to have something that just sets the "state" back to Unset on tests if their file change, but I'm not convinced that something that's beneficial to do. (and the file watcher api is not hard to implement)

With regard to auto run: that will come soon :) I think that will just be implemented by having VS Code issue runTests whenever a test or tests is created in or enters the Unset state.

One of the important things from Test Explorer UI is id property on tests. I don't see such property in TestItem of the proposed API. Should each test adapter extend this interface with its own specific properties?

Yea, I expect extensions to extend the test items with additional data and info. E.g. in the sample test provider I have separate classes for the root, file, heading, and test cases which implement the TestProvider.

Internally in the API we use object identity to differentiate and identify tests.

webczat commented 3 years ago

just make sure to consider accessibility of the test explorer, now or in the future before the feature leaves preview. None of the current solutions is accessible enough, for example as a screenreader user I cannot see which tests failed in the test tree.

kondratyev-nv commented 3 years ago

Actions like "rerun failed tests" will request subsets of tests to be run. Actually today I added the option to group by status in the explorer; clicking "Arrays" here would rerun only the two failed tests and not the other passed tests.

Got it, that's a good example, thank you!

For example, Go tests are structured and very easy to identify by scanning the source code of a single file, so they can implement createDocumentTestHierarchy which will allow vscode to show code lenses and structure even for unsaved changes.

Yeah, I think in general tests can be discovered for other languages as well. But I still think it won't work properly in most cases (as it will require module imports, and it will depend on the location of the test file). Also, from what I saw in other test adapters most use external tools (test runners) to discover and run tests, which will require a file to be present in the filesystem. So I can only imagine something like doc-tests or simple tests to build small reproducible examples.

Don't get me wrong, I'm not opposing having a createDocumentTestHierarchy method as test adapter may not implement it. Just curious whether this is even a valuable feature that will have any real-world use cases.

With regard to auto run: that will come soon :) I think that will just be implemented by having VS Code issue runTests whenever a test or tests is created in or enters the Unset state.

How it'll interact with save-before-run then? I mean as one part (save-before-run) being on the VSCode side and the other (run on save) being on the adapter side, won't it trigger multiple concurrent test discoveries/executions?

Thanks for the answers! Maybe I missed it but is there a way to enable the current version of the UI and API to try integrating with it?

hbenl commented 3 years ago

I was always wondering, does some real-world scenario exist for the case of multiple tests to be run (not a single test/suite)?

You can select multiple tests in the Test Explorer UI (using Ctrl-Click or Shift-Click) and then run them, I use that quite often. Some background: I have 2 projects where running all tests can take several minutes, so I often want to (re)run only some of them. Actually one of these projects (the Firefox debug adapter) was my original motivation for writing the Test Explorer UI: when running all tests is not an ideal option anymore you need a UI to be able to quickly select the tests that you want to run. There's another Test Explorer feature that was also motivated by slow tests: the "retired" test states (shown with faint icons in the UI) - when rerunning all tests takes a long time, you really want to be able to see the state that a test got on its last test run, even if that state is outdated. But you also want to see that the state is outdated - hence the "retired" states. @connor4312 I hope you will implement this in the new Test Explorer UI, it doesn't need to be reflected in the TestProvider API (it wasn't reflected in the TestAdapter API either), since this is just part of the UI state.

It seems that every test adapter will need to do pretty much the same - watch for all test files and trigger onDidChangeTest on file save. It's not a huge or complex part, but maybe it is worth considering having that in the VSCode testing.

I agree with @connor4312 that this should be left to the TestProvider because very often there are some other files to be watched for changes (like configuration files for the testing framework) as well as the extension's VSCode settings and only the TestProvider can know the full set of things to be watched for changes.

About createDocumentTestHierarchy: I think it depends on the testing framework whether that makes sense. For many (probably most) testing frameworks only createWorkspaceTestHierarchy makes sense, but I can imagine that there are some frameworks where createDocumentTestHierarchy can be very useful.

Maybe I missed it but is there a way to enable the current version of the UI and API to try integrating with it?

It's enabled in current insider builds, to interact with the API you just need to enable proposed APIs. My experimental converter extension may be a good example.

jdneo commented 3 years ago

@connor4312 Thank you for your reply

but the moment we want to do something like show the percentage of passing or failed tests, we would need to enumerate all tests. So I'm not convinced that's something to design around.

For some test framework like JUnit 5 in Java, it's using annotation to mark a method is a test method, and it supports meta-annotation. In such case, in order to accurately find out all the tests, we need to parse each Java files to AST and do some comparison with the annotation binding for each method. This is a time-consuming operation if the project has a large code base (parsing a large number of java files to AST). Thus:

it takes a long time before onDidDiscoverInitialTests event fired
The Java Language Server will at least use one thread searching the tests after activation, which may have some impact on the activation perf.

This is what I'm concerning right now.

connor4312 commented 3 years ago

Thanks for the feedback. Going more into scenarios, though, I do not think it's practical to have levels be async. For example:

We want to have failed tests be navigable via a diagnostic-like pattern, e.g. 'hit F8 to go to the next failed tests'. This requires us to know about all failed tests, as minimum.
When navigating through code, we would like to reveal the test in the test tree, like we do in the file explorer today. We would require some getTestAtPosition(p: Position method that would let us expand the tree to that node.
There's a filter box in the test tree. Test implementers would need some provideTestsMatching(query: string) method

...and other scenarios we decide to add in the future may require additionally methods, which becomes increasingly burdensome to implement. Particularly, implementing provideTestsMatching would almost certainly require workspace tests to be discovered anyway, and if other methods are also called the extension would need to handle deduplicating the tests and their watchers.

Update on implementation. I continue to iterate on the selfhost test provider which I and several team members are using for VS Code.

The test explorer is enhanced with a text filter. It can also display tests using a list or a tree, grouping by status or displaying a single tree by filename

Building the tree in vscode where we have many thousands of tests is somewhat laggy right now. It (and other vscode tree views) will be faster once https://github.com/microsoft/vscode/pull/114237 is merged.

Clicking on a failed test will show a peek view if there's a message with expectedOutput and actualOutput present:

On the right hand side, I'll add a tree view similar to the references peek which will show all failed tests.

I'm still mulling over how to reconcile test failures and diagnostics (those error squiggles in your editor). Diagnostics work well and have keybindings people are used to, but they aren't dismissable and I've received feedback from people wanting to separate their code errors from their unit test errors. Maybe this will be done just by adding new capabilities to diagnostics. (cc @sandy081)

Finally, test can currently only be run by the test runner who contributed them. I received some feedback that there may be a desire for test runners to be contributed independently of test providers. Feedback: please let me know if you have thoughts on this or, better, scenarios where you think this would be useful.

Discussion feedback from Sandeep: - [x] Include Collapse All action - [x] Show progress while tests are running (view and activity bar) - [x] Actions on viewlet title -> I expect they run/debug all tests. It was not expected that they run only selected tests - [x] Actions on viewlet title -> order of run, debug actions are not consistent - [x] I would have refresh action as primary and it will align with other views - [ ] default sort order is by name which means the list of tests does not match the order of tests in my file - I would expect the order same as it is in the file - [ ] diagnostics is probably not right place for test failures, but we can make the diagnostic keybindings in vscode also navigate test failures - [ ] test failures should be a decoration, e.g. in the gutter, some other action to open diff - [x] use existing workbench.view.extension.test container - [ ] a "show in active file" similar to what the problems view has Feedback from Isi: - [ ] Clicking on the test in viewlet focus the parent (the file). It should focus the test (where I clicked) - [ ] You do not distringuish between single vs double click - [ ] accessibility feedback #114653 - [x] debug and run icon one next to the other do not seem to use the same triangle. The debug one is using a larger one, so does not look very nice - [x] hover over item you get "hover title" - [x] there should be some feel good message to the user when all tests pass. Could we update the title from TESTING to TESTING 123/123 :heavy_check_mark:, first being the number of passed tests, the other total tests - [x] when a test fails consider putting a decoration number to the activity bar icon. Consider putting failing test to top (does not happen right now I just just re-run one test and it fails)

hbenl commented 3 years ago

I have updated my converter extension to return undefined from createWorkspaceTestHierarchy() instead of a dummy TestProvider - thanks for the API update! Then I realized that we still need the ability to have an empty TestHierarchy. Imagine the following scenario:

the user installs a TestProvider extension and opens a workspace folder without tests
VS Code calls TestProvider.createWorkspaceTestHierarchy(). Since there are no tests in the workspace folder, the TestProvider returns undefined.
The user starts adding tests.

Now there is no way for the TestProvider to send these tests to VS Code. Of course the TestProvider could have returned a TestHierarchy in step 2, but then you'd always see an empty TestHierarchy from that TestProvider (if you have several TestProvider extensions installed, this can quickly become annoying).

OmarTawfik commented 3 years ago

Thank you so much for the prototype and the discussion above! Such feature is sorely needed, and I think this is the right approach to tackle it at an API level. I work on test runners for codebases with millions of files, and there are a few concerns due to size that I wanted to highlight.

[P0] Importance of lazy loading tests

For repos where users often open a workspace containing tens of thousands of nested subfolders, it is often impossible to enumerate all tests at once. Enabling lazy loading will allow test adapters to partially scan only parts of the repo that are already active in the editor, while analyzing additional folders only when needed/expanded.

For a repository with thousands of projects, an adapter will take forever (additionally wasting valuable CPU cycles) computing results for projects that will never be used, but it can provide a list of top-level projects (test parent nodes) in a couple of seconds by simply doing a shallow disk scan. This goes hand in hand with the VS Code philosophy of always optimizing for the UI responsiveness.

This does not mean dropping support for eager loading. This will definitely save many round trips for cases where the results are already available at the adapter side.

[P0] Running multiple adapters together

Will the API ever support customizations to the test run, like specifying how many tests to run in parallel? In a workspace with multiple test adapters active (node, python, etc...), how will a “Run All”/“Rerun All” action work? would it invoke all of them at once? how would they manage to play nice and not attempt monopoly on precious CPU resources?

[P1] Hierarchy of the test tree

I wonder if we already thought about the hierarchy in case of deeply nested tests. Do we want to support custom icons (or perhaps another visual indicator) for parent-level nodes? Visually interpreting the tree is much easier when you can visually see a clear separation of (projects > modules > files > unit tests).

Without this, deeply nested trees would look like an infinite sea of green checkmarks and red crosses, and quickly looking up what exactly failed (and identifying a common root cause) in a tree with various collapsing/expanding children becomes much harder.

[P1] Code actions on failure

Can we provide custom actions on tree items (failures) and their inline failure viewer on the code editor itself? Is there a better supported way? this would be needed in cases like:

Automatically applying a fix for the failure, like updating jest screenshots.
Filing bug reports if failure is unexpected.
Disabling a flaky test.

[P2] Outdated metadata

When tests run and report results, they return metadata (line number, specific assertion strings) that are extracted from the document when the test started. If the user edits the file before test is done (a common practice to run existing tests while authoring new ones), the metadata would be invalid, or can be displayed in inaccurate positions. So the editor would need to remember which version of the document the test was running at, or at least indicate that the results are outdated.

[P2] Hiding unwanted tests

The search UI has a great feature to click a little button (X) to hide a search result. I suggest thinking about supporting this here as well. Users can quickly exclude an irrelevant/slow/flaky test from their local set as they are editing code. Like search, that preference will be reset when the tree updates.

jdneo commented 3 years ago

[P0] Importance of lazy loading tests

@OmarTawfik Echo with this! 😀

connor4312 commented 3 years ago

Thanks for the feedback, Omar. I'll continue to think on lazy loading trees and review your suggestions more thoroughly next week.

Running multiple adapters together

At the moment it will run all of them in parallel. A user/workspace setting would be a nice way to configure this.

Hierarchy of the test tree

This is a good idea. We could make a reasonably easy to use by allowing an extra icon property on test items, which if present would be shown alongside the status (with the status icon sort of like how I show a breakpoint icon here). Then you don't have to worry about providing 6x icons for all the states.

Code actions on failure

Definitely a good extension point. Currently in the proposed API, the notion of "test results" are not exposed, but I just added them in core this week so that is mostly wiring.

Related: dealing with results is pushing me towards having (probably optional) test IDs. I wanted to avoid these and just use object identity, and this works for most cases, but consider that you have a serialized result from a test run and no active test observers. If you go to a document and we ask for tests in that file, today we have no way to know how to correlate the test results with reported tests in the file.

Default behavior could be forming the ID from the 'path' of labels to each item. This should be a good default for most runners, and seems to be what a lot of you do today with the test explorer ui.

Hiding unwanted tests

Good idea. Would also require test IDs in order to be persistent 🙂

Friday eye candy: moving to decorations for test results. I will probably have glyph margin icons on instead of code lenses by default, with an option to toggle each on/off. I will also have an extra property on test messages to allow a separate message to be shown in the inline decoration, instead of its full (possibly detailed, multiline) message.

Note that the icons will be slightly prettier with https://github.com/microsoft/vscode-codicons/issues/34

matepek commented 3 years ago

Redundant events

I have a question related to TestHierarchy.onDidChangeTest:

        /**
         * An event that fires when an existing test under the `root` changes.
         * This can be a result of a state change in a test run, a property update,
         * or an update to its children. Changes made to tests will not be visible
         * to {@link TestObserver} instances until this event is fired.
         *
         * This will signal a change recursively to all children of the given node.
         * For example, firing the event with the {@link testRoot} will refresh
         * all tests.
         */
        readonly onDidChangeTest: Event<T>;

As far as I understand I should fire this every time when I change some value/state-value on the test, right? So for example let's say I have the following hierarchy:

- root
  - s1
    - s1t1
    - s1t2
  - s2 
    - s2t1
    - s2t2

User asks to run the s1t1 and s1t2.

Now I have to do the following:

1) onDidChangeTest to modify s1t1 to running 2) onDidChangeTest to modify s1t2 to running 3) onDidChangeTest to modify s1 to running
4) onDidChangeTest to modify s1t1 to passed 5) onDidChangeTest to modify s1t2 to passed 6) onDidChangeTest to modify s1 to passed

While 1 and 2 seems redundant due to the recursive update of 3, I have to be aware of the structure to make that decision which is pricy.

After 5 I have to update the s1 again thats why I call 6 but 6 updates s1t1 and s1t2 again which seems redundant.

From running perspective the update mechanism sounds more useful if we do update ancestors instead of children. If I'm not mistaken the problem here is the children. The observer cannot know that a child is removed or added or modified. And because of this you need this contract:

 This will signal a change recursively to all children of the given node.

Because the UI or other subscriber has to be notified and updated. This can cause a lot of extra unnecessary computation.

Unless we send some extra information with attached to the event. Let's consider the following API:

interface TestItemChange<T extends TestItem> {
  item: T;
  parentRecursively?: boolean;
  childrenRecursively?: boolean;
}
...
onDidChangeTest: Event<TestItemChange<T>>

Pros:

We don't have to call the onDidChangeTest on ancestors (vscode will).
Common logic would be moved to vscode, don't have to be implemented.
Saves computation (this actually can be an issue, some user of mine were already complaining about that they have issue because they have thousands of tests).

Cons:

more complex API

For finer control a more detailed version can be considered:

interface TestItemChange<T extends TestItem> {
  item: T;
  parentRecursively?: boolean;
  childrenAddedOrRemoved?: boolean; // This does not imply that one of the existing child has changed.
  childrenModified?: boolean; // This implies that at least 1 existing child has (may) changed.
}

It's not just interesting from extension developer side but maybe from the vscode side too. I'm not familiar with the vscode's code but if we can save thousands of node updates with this that could make a difference.

Edit

        /**
         * An event that fires when an existing test in the collection changes, or
         * null if a top-level test was added or removed. When fired, the consumer
         * should check the test item and all its children for changes.
         */
        readonly onDidChangeTest: Event<TestChangeEvent>;

TestChangeEvent.added and TestChangeEvent.removed can be easily computed so the "more detailed version" might be useless. Though it worth to mention that this logic don't care about the order of the elements of the children, right? The updated is a bit unclear for me. How will vscode tell that which test's were updated? If I send and event about s1 will all it's descendant part of it or vscode will do a compare and only the actually changed ones will be reported? Sound also computation-heavy. (I guess no.)

Okay, I'm getting confused. Is this TestChangeEvent will be send on the aggregation of multiple TestHierarchy.onDidChangeTest? If not, I don't see the current point of the TestChangeEvent.updated being an array. In that case I don't see the point of TestObserver.onDidChangeTest: When fired, the consumer should check the test item and all its children for changes..

One thing for sure: Without TestProvider developer providing the childrenRecursively?: boolean; information a lot of extra computation/event will happen.

Elapsed runtime

I can imagine a TestState.startedAt?: Date property. (It might can be calculated at construction time). With that the UI can show that for how long the test is in that state. It's mostly interesting in case running tests I assume.

matepek commented 3 years ago

[P0] Importance of lazy loading tests

Just an idea:

 children?:  Thenable<TestItem[]>

I would have

private  _children: undefined | TestItem[];
public get children() { 
  if(!this._children) this.initChildren();
  return this._children!;
}
private initChildren(): Promise<TestItem[]> { this._children = ... }

Okay this is not good enough, if it's a leaf we wanna know without triggering the init. :S hasChildren?:boolean.?

Also TestRunState.Loading = 7.

matepek commented 3 years ago

About streaming output:

I can see that with multiple onDidChangeTest(sametest) running events (TestRunState.Running) with more and more TestMessage it is possible to imitate the streaming experience. But.. updating the state: TestState; of the TestItem will be costly because it is not an interface but a class. What is the reason TestState being a class instead of an interface? (Okay I have an idea, like easily checking if it changed or not.)

Second thought: UI has to copy the content anyway but in that case at least the logic is there and not in my code so I can do a push on TestState.messages. (after readonly-ness has been removed)

Question:

TestMessage.message can be a multiline string right? I'm not convinced that this will serve the purpose well.

For example a typical GoogleTest output:

[ RUN      ] TestCas3.test2
some logline
other log line
cpp/gtest/gtest2.cpp:29: Failure
Value of: 1 == 2
  Actual: false
Expected: true
more loglines
[  FAILED  ] TestCas3.test2 (4007 ms)

It might happen that I wanna send 3 events: first:

new TestState(TestRunState.Running, [
        {message: "some logline\nother log line"}
    ])

then:

new TestState(TestRunState.Running, [
        {message: "some logline\nother log line"}, 
        {
            message: "Value of: 1 == 2\nActual: false\nExpected: true",
            severity: TestMessageSeverity.Error,
            expectedOutput: "true",
            actualOutput: "false",
            location: ...
            }
    ])

and at last when it is finished:

new TestState(TestRunState.Failed, [
        {message: "some logline\nother log line"}, 
        {
            message: "Value of: 1 == 2\nActual: false\nExpected: true",
            severity: TestMessageSeverity.Error,
            expectedOutput: "true",
            actualOutput: "false",
            location: ...
            },
        {message: "more loglines"}, 
    ])

Maybe this 3 event will fire in only 3 seconds. Note: Back to my previous comment: if there is a child the current API could imply that is might updated too which it probably wouldn't be the case. Performance again..

Edit:

I've realized that I can edit the array:

const testMessage = [];
const ts = new TestState(TestRunState.Failed,testMessage);
testMessage.push({message: "some logline\nother log line"});

works just fine. I still don't get the point that class not being an interface and having readonly. I thought maybe from the side of the vscode developer to prevent the modification of the user's structure but for that an interface is just fine.

connor4312 commented 3 years ago

Now I have to do the following:

I do not recommend changing the state of s1. I believe I mentioned this somewhere up in the thread, but I realized that the state of these suites is aesthetic in the explorer and can be computed based off their child state instead of having every extension implement 'inheritance'. So this is done automatically. You only need to update s1t1/2 when their corresponding states change.

This will signal a change recursively to all children of the given node.

I worded this badly in the documents. This only checks for added or removed children -- which we do quickly and efficiently by keeping a Set of the last seen children. We don't check existing children for updates or recurse further down (since adding and removing nodes will result in entirely new or entirely removed subtrees)

Child modifications require that those tests be individually fired in the event emitter.

What is the reason TestState being a class instead of an interface? (Okay I have an idea, like easily checking if it changed or not.)

Indeed :)

We could make that an interface, but then diffing logic becomes more complex and costly. Right now we can keep a shallow-clone of the last-seen object to compare against when a change comes.

TestMessage.message can be a multiline string right?

Yes. I'm not sure what what you've shown the purpose that it doesn't fulfill.

I've realized that I can edit the array... works just fine

This is surprising, I might have some diffing logic that's picking up changes incorrectly. It's not intentional that this works.

matepek commented 3 years ago

the state of these suites is aesthetic in the explorer and can be computed based off their child state

Ah, okay, nice.

Since then I had the chance to do some migration and my extension has a prototype version working with the proposed-api. I've noticed a couple of things.

State inheritance (the first sentence of this comment) is not perfect: If there is a child with TestRunState.Skipped then the parent will inherit that. The only scenario when I would like a parent to be in that state when all its children are skipped.

matepek commented 3 years ago

I've realized that I can edit the array... works just fine

This is surprising, I might have some diffing logic that's picking up changes incorrectly. It's not intentional that this works.

I take that back.

Florin-Popescu commented 3 years ago

Hello!

I developed the Unity Framework for C Test Explorer and just wanted to give my opinion here.

When I say diagnostic I mean the "Diagnostic" in the VS Code API that appears in the problems view (and can also have an error squiggle in the editor)

The downside there is that if they have a syntax error, having the problem be duplicated by both the language server and test extension will be some ugly clutter.

How should errors loading tests be handled? Emit diagnostics or have some test-specific code?

I'm still mulling over how to reconcile test failures and diagnostics (those error squiggles in your editor). Diagnostics work well and have keybindings people are used to, but they aren't dismissable and I've received feedback from people wanting to separate their code errors from their unit test errors. Maybe this will be done just by adding new capabilities to diagnostics. (cc @sandy081)

I personally use the Problems view a lot while coding and it would be very nice to have the test results show up there.

I would like to propose that TestMessageSeverity (and I also guess DiagnosticSeverity) should be extended with a value of type Test to have a distinct view for test failures due to actual assertions. Tests failing to build/run, syntax errors etc. would still show up as Errors, Warnings etc.

It could look something like this, with squiggles of a different type (and color) in the Editor to differentiate from other Diagnostic types: 68747470733a2f2f6d656d65732e706565742e696f2f696d672f32302d30392d35363434363665312d666135302d346465652d383838612d6136333134393862323037352e706e67

This way test extensions would provide failed results in a distinct manner to the Problems view and the Editor. It would be easier to distinguish failed tests from actual compiler warnings, syntax errors etc since (to me at least) these have a different severity.

Also it could allow for functionality like if test_foo_1() (test code) fails, extensions could have it trace back to the function foo() (code under test) and point with an URI to that as well. Something similar to the errors in the screenshot which mention 2 URIs: > test_foo_1() failed. Expected and actual... -> testSource.c [x, y] - URI of test_foo_1() Function under test foo() -> productionSource.c [x, y] - URI of foo() This would be optional for extensions which want to implement this, since it probably is expensive depending on language or test framework.

Also with regards to Diagnostics clutter (duplicated by both language server and test extension), I think that the extension could simply choose not to report syntax/compilation errors at all, since the language server already does that. Or it could parse existing Diagnostics via vscode.languages.getDiagnostics() and not duplicate those which are already present from the language server, but report other errors. But I think this choice should rather belong to the test extension, cause it also depends on language or test framework. For instance in one of my projects there are different compiler versions used for unit tests and production code, so some errors will be identical, some will only be specific to the unit tests.

Anyways thank you so much for this initiative! Looking forward to seeing where it goes.

matepek commented 3 years ago

About retired states.

This basically a property which is perpendicular to the TestRunState. If the binary which contains the tests has changed then all the test results are retired/old/expired/invalid.

Me personally really liked this feature of the "Test Explorer" by @hbenl. It's almost like setting the TestRunState to Unset back but still indicating what was the previous result.

Please consider the support of it 🙏

matepek commented 3 years ago

Just a summary of my wish/suggestion-list. (Hit the corresponding emojis if you have the same ^^)

retired events ❤️
new state: TestRunState.Loading 👀
lazy load of suites 🚀
output streaming / output window 🎉

connor4312 commented 3 years ago

Thanks! This thread is getting long and winding so I will open discussions on separate issues (linked).

API updates that people may be interested in in https://github.com/microsoft/vscode/issues/115101

OmarTawfik commented 3 years ago

@connor4312 can you please tell me if this has a timeline? when do we think this can be available in Stable?

firelizzard18 commented 3 years ago

One scenario that isn't covered is running a subset of tests as discovery is happening (e.g. if I wanted to run all tests with the word "foo" in them) without starting a new test run. This would take some finessing and will not be supported by all adapters, for instance Go could never support it without changes to its toolchain, so is probably not worth designing for...

@connor4312 Can you clarify your statement? I'm not sure how you would run a subset of tests without starting a new test run in any situation.

I am the developer of the Go Test Explorer. My extension discovers tests by executing the symbol provider for all *_test.go files and marking func Test* declarations as tests. It is not challenging to filter out tests containing "foo". On a lower level, tests containing "foo" can be executed by running go test -run 'foo'.

connor4312 commented 3 years ago

I'm interested in community input on this issue! https://github.com/microsoft/vscode/issues/115102#issuecomment-778490513

can you please tell me if this has a timeline? when do we think this can be available in Stable?

We generally do not have set timelines for VS Code features, but I would estimate sometime this spring.

Can you clarify your statement? I'm not sure how you would run a subset of tests without starting a new test run in any situation

Say for example the user asked to run all tests with "foo" in the name. In the UI, this would be done by filtering tests in the explorer view and then hitting "run" for them. This causes VS Code to call runTestswith every foo-containing test we know about at that instant -- if discovery is still ongoing, those will not be included in that initial call. I was thinking whether there would be some streaming way that VS Code could, as tests were being discovered, say "also run this test in your current run". Go, at least in the interface that the go run CLI provides, would not support that append mechanism.

This is lots of extra machinery and API for an edge case that would be painful or impossible for many test runners to support, so I don't want to do it.

firelizzard18 commented 3 years ago

I suggest either: A) partially or completely disable running tests while discovery is happening; B) retain the previous test set until discovery is complete and if the user triggers a filtered test run before discovery is complete, evaluate the filter against that previous test set; or C) allow the user to configure this behavior.

Down the line, an optional capability could be added, allowing test adapters to handle the filtering themselves. For Go, I could run go test -json -run '<filter>' ./..., parse the result, and notify observers of any tests discovered in the process.