Closed · orta closed this issue 4 years ago
I really like the idea of a workbench.
I think it's sufficient for the markup language to be able to show the setup behavior without prescribing the expected output; the author can describe in prose what they don't like about it. We've found that `emitShouldContain`/etc style tests are usually too sensitive to "spelling" effects (e.g. `Array<T>` vs `T[]`).
For something like a refactoring, showing the refactorings available (perhaps filtered by a search string?) and their post-render output would be sufficient.
Error message numbers only have "best effort" stability and I'd generally avoid trying to hardcode them. This is also subject to oversensitivity; when we add a new "Here's what we think is going on" error message on top of an existing error message, the observed top-level error code changes. Error positions are also subject to some movement; this is why I generally lean toward "Let people demonstrate the behavior" rather than "Let people prescribe the behavior". Showing version-to-version changes in the demonstrated behavior is also likely to be more instructive than pass/fail.
What I had in mind is something that is more focused on how to implement this, and specifically independent of the style of tests -- anything goes, as long as it has support in the TS repo.
With this in mind (and therefore nothing about the front end that you'd actually use), the functionality that would be implemented is best left for just the tests that are in the repo -- so in the below I'm talking about fourslash and baseline tests, but if the twoslash thing is added, then that would be another option. (IOW, I see this as independent of the style of tests.)
Take the whole repo, build it, and dump the result in a docker image (almost as is, removing unnecessary stuff, and possibly with some interface script described below).
There would be a classification of test files by their "style" -- baseline or fourslash, and some files would not be classified (tests that are more involved than a one-file thing (like the docker tests), or things that are not test cases by themselves).
The resulting image has two purposes:
`gulp runtests --t="someTest"`
When someone interacts with this, we run `docker run` with that one file mounted in the right place, and with a filename and an argument that makes only that one file run and the results rendered. At this point, the test case could be attached to a PR with some additional human explanation. That explanation could be empty (a test just failed, and it shouldn't have), or it could describe what's wrong with the output in the case of a baseline test (shouldn't get errors, something in the types file should be different, etc.). The bottom line is that after this there is a test case ready to add.
When the attachment to a PR is done, also add a "test case available" label.
Better to attach to an existing PR than to have it create a PR, since I see bugs where people have interest in helping, so this accommodates X submitting a bug and Y adding a test case.
(In the world I came from there would be concern about adding a label only after verifying some signature, but this is all just simplifying what people can write, and no tests would be added automatically anyway, so there's no need for any of that.)
Cool @elibarzilay - this looks like a case of creating a setup where we get working test cases which match our own in the compiler, and it gives us the freedom to cover crashing / OOM / infinite-loop tests.
I'm open to building the front-end to something like this, but I'm wary of building a backend to support this myself. My past experience building something similar was that this will be quite a lot of maintenance and work. If that's something you're up for building and owning, then I can think about how we teach and build out a UI for it in the site.
FWIW, I still prefer my original proposal for a few reasons:
Which is basically a "worse is better" argument. It does a lot of things worse:
But it covers most of the cases and is more convenient.
I have an (unpolished) bug workbench in the staging site:
Yesterday, I got the verification steps working with twoslash code samples as a GitHub Action: https://github.com/microsoft/TypeScript-Twoslash-Repro-Action
This action verifies any code samples in an issue body, or comments against the last 6 versions of TypeScript and the latest nightly build. I just need to work on the reporting aspects of it.
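As a rough illustration of the "find code samples in an issue body" step, here is a minimal sketch in TypeScript. This is not the action's actual code; the function name `extractTSCodeBlocks` and the regex-based approach are assumptions for illustration only.

```typescript
// Hypothetical sketch (not the real action's implementation): pull
// ```ts / ```typescript fenced blocks out of a GitHub issue body so each
// one could be compiled against several TypeScript versions.
function extractTSCodeBlocks(issueBody: string): string[] {
  const blocks: string[] = [];
  // Match a ```ts or ```typescript fence and capture its contents lazily.
  const fence = /```(?:ts|typescript)\r?\n([\s\S]*?)```/g;
  let match: RegExpExecArray | null;
  while ((match = fence.exec(issueBody)) !== null) {
    blocks.push(match[1].trimEnd());
  }
  return blocks;
}

const sampleIssue = [
  "Repro below:",
  "```ts",
  'const a: number = "oops"',
  "```",
  "and a second sample:",
  "```typescript",
  "let b = 1",
  "```",
].join("\n");

console.log(extractTSCodeBlocks(sampleIssue).length); // 2 blocks found
```

The real action presumably does more (deduplication, tracking which comment a block came from), but the extraction step is conceptually this simple.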
Each run keeps track of:
Suggestion: why not have the action bisect releases (if it fails on the most recent release) or nightlies (if it passes on the most recent release but not the most recent nightly), so it can identify whether the issue is a regression and when it was introduced?
Aye, it should support this OOTB today (going from green to red on a new nightly run is just another state change it would detect) - historical results are always kept around; for example, this comment shows some different results over time with different builds of TS.
As it runs every night, you'd get the list of commits between the two nightlies in the comment, which should make it easy to track down.
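For the bisect idea itself, a minimal sketch of the search step: given an ordered list of builds and a predicate that reports whether the repro fails on a given build, binary-search for the first failing one. The names (`firstFailingBuild`, `failsOn`) and the toy version strings are hypothetical, and this assumes behavior flips exactly once (passes, then fails) across the range.

```typescript
// Hypothetical bisect sketch: find the first build on which the repro
// fails, assuming builds are ordered oldest-to-newest and the repro
// passes on older builds and fails on newer ones.
function firstFailingBuild(
  builds: string[],
  failsOn: (build: string) => boolean
): string | undefined {
  let lo = 0;
  let hi = builds.length - 1;
  if (hi < 0 || !failsOn(builds[hi])) return undefined; // never regressed
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (failsOn(builds[mid])) hi = mid; // regression is at mid or earlier
    else lo = mid + 1;                  // regression is after mid
  }
  return builds[lo];
}

// Toy data: pretend the regression landed in nightly "4.1.0-dev.3".
const nightlies = ["4.1.0-dev.1", "4.1.0-dev.2", "4.1.0-dev.3", "4.1.0-dev.4"];
const broken = new Set(["4.1.0-dev.3", "4.1.0-dev.4"]);
console.log(firstFailingBuild(nightlies, (b) => broken.has(b))); // "4.1.0-dev.3"
```

In practice `failsOn` would install that nightly and compile the repro, so the O(log n) number of probes matters.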
We want to provide better tools for providing reproductions of TypeScript bugs in issues. In an issue we really only have text to work with, so we need some kind of text representation of a TypeScript project.
Classes of issues
Looking through the bugs label, there are a large number of cases we can probably address:
There are a few we probably can't:
I'm interested in expanding either of these lists! ^
Possible solution
Two parts:
- We use something like `arcanis/sherlock` to run any issues with repros on a nightly-ish schedule with different builds of TypeScript and store the results using something like this - this runs TypeScript code with markup which is built for repros
- A page on the website which helps you generate a single TypeScript codeblock which is a reproduction of a bug
Possible Markup
We've been working on a TypeScript markup language for the website and for this, called ts-twoslash, which adds commands inside comments. I'd recommend reading the README for that before expanding some of these examples.
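To make the directive style concrete before the examples: twoslash commands are comment lines of the form `// @name: value` (or bare `// @name` for a boolean flag). A minimal sketch of splitting those directives from the code follows; this is not the real twoslash parser (which, among other differences, only honors directives at the top of a sample), and `splitDirectives` is a hypothetical name.

```typescript
// Hypothetical sketch (not the real ts-twoslash parser): separate
// `// @name: value` directive comments from the ordinary TypeScript code.
function splitDirectives(sample: string): {
  options: Record<string, string>;
  code: string;
} {
  const options: Record<string, string> = {};
  const codeLines: string[] = [];
  for (const line of sample.split("\n")) {
    const m = /^\/\/\s*@(\w+)(?::\s*(.*))?$/.exec(line.trim());
    if (m) options[m[1]] = m[2] ?? "true"; // bare `// @showEmit` means true
    else codeLines.push(line);
  }
  return { options, code: codeLines.join("\n").trim() };
}

const parsed = splitDirectives(
  [
    "// @showEmit",
    "// @showEmittedFile: index.d.ts",
    'export const myVariable = "123123"',
  ].join("\n")
);
console.log(parsed.options.showEmittedFile); // "index.d.ts"
```

The point is just that the sample stays a single valid TypeScript code block, with the test configuration riding along in comments.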
Bad JS Emit
```ts
// @showEmit
// @emitShouldContain: this string
// @emitShouldNotContain: this other string
console.log("hello world")
```

Bad DTS Emit
```ts
// @showEmit
// @showEmittedFile: index.d.ts
// @emitShouldContain: 123
// @emitShouldNotContain: 456
export const myVariable = "123123"
```

This code should be this type
```ts
// @noImplicitAny: false
const a = 123
//    ^? = const a: 123
```

or

```ts
// @noImplicitAny: false
const a = 123
//    ^? != const a: any
```

This code is slow
```ts
// @slow
const a = 123
//    ^?
```

This annotation could indicate that it should fail if it takes over 1s.

This code crashes
```ts
const a = 123
a.l
```

In theory there shouldn't need to be a crashes annotation; no code should crash. I'm open to debate on why that might need to be different though.

This code should error
```ts
// @expectErrors: 4311
const a = "hello"
a * a
```

This code should not error
```ts
const a = "hello"
a * a
```

The lack of an error annotation should indicate this. If some errors are expected, then it can use `@errors: 3211`, which is already set up.

This refactor didn't do what I expected
I'm still a bit hazy on this; in fourslash there are a lot of options here - but to try to provide a really high-level example:

```ts
const a = "hello"
a * a
// ^^^^^! refactor
```

This could create a JSON dump of all the refactoring options, maybe?

Web Page
We have been working towards having the playground support a custom set of tabs in the sidebar, it's not a far reach to imagine that an editor for twoslash could be the playground but with tabs focused on running ts-twoslash and getting useful results.
Here's a rough idea of what that could feel like:
The blockers on that: