tc39 / test262

Official ECMAScript Conformance Test Suite
Other
2.3k stars 459 forks source link

Testing the Temporal proposal #3002

Open jugglinmike opened 3 years ago

jugglinmike commented 3 years ago

The Temporal proposal reached Stage 3 on 2021-03-10, and like all ECMA262 proposals, it needs coverage in Test262 to advance to Stage 4. In terms of normative text, it is easily the largest proposal we've seen. Testing it will definitely require significant time, effort, and coordination (even with the proposal's existing tests as a starting point). We'd like to document progress publicly for transparency and to encourage participation.

To get the ball rolling, I'm suggesting that we track progress on an per-API basis using this GitHub.com issue. Compared to alternatives like collaborative document editing services and an in-repository text file, this seems like it balances ease of hosting, maintainability, and proximity to the tests. This approach worked well for ES6's well-known Symbols--the most similar (though much smaller) effort that comes to mind.

Working mode

Anyone can comment here when they'd like to begin writing tests for a specific API. A maintainer will update the list to reflect the ongoing and completed work. When you submit a pull request, please mention this issue (just write "gh-3002") so GitHub displays the status of your work in this thread.

Tests for multiple APIs can be submitted in a single pull request, but patches should be small enough for another contributor to review them in a single session. For instance, it probably makes sense to submit tests for all of the @@toStringTag methods in one patch. There's some subjectivity here, but when in doubt, we should err on the side of smaller patches (and remain open to reviewers' requests to break up pull requests).

Status

(generated via the following script) ```js Array.from(document.querySelectorAll('[id^="sec-temporal."], [id^="sec-get-temporal."]')) .map((el) => { return el .querySelector('h1') .innerText .replace(/^[0-9\. ]*(?:get )?([^ ([]+)(\[[^\]]+])?.*/, '- [ ] $1$2'); }) .join('\n'); ``` I executed that in the browser's Developer Console on [the proposal's specification text](https://tc39.es/proposal-temporal/).

Notes about specific things that should have test coverage:

jugglinmike commented 3 years ago

@ptomato @ms2ger I know this work is near and dear to your hearts :)

ptomato commented 3 years ago

Thanks for getting this started! FWIW, I've been finding it easier to write tests by topic, rather than by API entrypoint — for example, here are a series of largely identical tests for everything that uses the GetOption abstract operation: https://github.com/tc39/proposal-temporal/pull/1531/files

One thing I'm planning to do is convert the Mocha-style tests of the research polyfill in that repo to test262-style tests, and delete the old-style tests as I go. So anyone picking one of these items can use those old-style tests as a starting point.

Ms2ger commented 3 years ago

Thanks, @jugglinmike! I think that for the moment it would be easiest to keep developing the tests in the proposal repo, so we can run them against the polyfill as a sanity check. Does that make sense to you?

ljharb commented 3 years ago

Hopefully the polyfill will be finished and deprecated soon tho, so the advantage to keeping the tests there shouldn’t last long?

jugglinmike commented 3 years ago

@Ms2ger I've had luck using Node.js and the test262-harness project to run Test262 against the polyfill:

$ test262-harness \
    --hostType node \
    --hostPath node \
    --prelude inject-temporal.js \
    'test/built-ins/Temporal/**/*'

Where inject-temporal.js looks like this:

const { Temporal } = require('./node_modules/proposal-temporal');

I'm using the version published to npm, but it ought to work for the proposal's repository, too. Does that help at all?

ptomato commented 3 years ago

@jugglinmike For one thing, we're planning to deprecate that NPM module... An advantage to keeping it in the proposal repository is that we have code coverage metrics for the test262 tests there, and it doesn't require synchronization between git repositories.

(And I can't remember the details off the top of my head, but using require() in the prelude breaks the code coverage metrics, so we generate an IIFE bundle and include that as the prelude.)

FrankYFTang commented 3 years ago

https://github.com/tc39/test262/pull/3048

ptomato commented 3 years ago

I left one comment in #3049 but it could simplify things in all of these pull requests:

Consider porting over the Temporal asserts from the proposal repo to temporalHelpers.js and using those instead of assert.sameValue("...", ....toJSON()). They should give better error messages and be more specific.

linusg commented 3 years ago

@jugglinmike I noticed that all the getters are missing from this list - here's a change to the code snipped you posted to include them in the list:

-Array.from(document.querySelectorAll('[id^="sec-temporal."]'))
+Array.from(document.querySelectorAll('[id^="sec-temporal."], [id^="sec-get-temporal."]'))
   .map((el) => {
     return el
       .querySelector('h1')
       .innerText
-      .replace(/^[0-9\. ]*([^ ([]+)(\[[^\]]+])?.*/, '- [ ] $1$2');
+      .replace(/^[0-9\. ]*(?:get )?([^ ([]+)(\[[^\]]+])?.*/, '- [ ] $1$2');
   })
   .join('\n');

You also might want to update now => Now :)

ptomato commented 3 years ago

@jugglinmike I do think it would've been more convenient to develop these in the proposal-temporal repo in order to get code coverage metrics, and then move them over to the test262 repo all at once, but I think that ship has sailed :smile: Just to keep you updated, I discussed it with @Ms2ger and our plan is to finish converting all of the legacy test suite into test262 tests and then move over what we have in the proposal-temporal repo.

rwaldron commented 3 years ago

@ptomato Mike is on vacation and may not be present in this thread for at least another two weeks.

FrankYFTang commented 3 years ago

I think what @ptomato stated above is perfectly reasonable for Temporal in stage 2 period. Now Temporal is in Stage 3 for about 5 months and in theory the spec is in a shape which is very stable (relative speaking, of course) and would be more benefitial for browser engines under development to sync with the tests right. If that is what the Temporal champion plan to do (move the tests inside the Temporatl repo to test262) then I will not add more Temporal tests PR to test262 until they complete that. Any ETA? To be honest, I am more interest to spend time to change v8 to fulfill the Temporal spec than spend time to add more tests to test262- I only did so because I see no much got added to test262 yet and not aware that Temporal champions plan to do that.

ptomato commented 2 years ago

@jugglinmike I do think it would've been more convenient to develop these in the proposal-temporal repo in order to get code coverage metrics, and then move them over to the test262 repo all at once, but I think that ship has sailed :smile: Just to keep you updated, I discussed it with @Ms2ger and our plan is to finish converting all of the legacy test suite into test262 tests and then move over what we have in the proposal-temporal repo.

To come back to this, any comments on the above plan? Since I acknowledge it is taking longer than I'd like, another option would be to copy over all the tests that currently live in tc39/proposal-temporal to tc39/test262 all at once, and at the same time delete them from tc39/proposal-temporal. However, I'd prefer if that happens that we merge it into tc39/test262 as quickly as possible and correct stylistic points afterward, otherwise they will keep getting out of sync between the two repos as we continue converting the legacy test suite into test262 tests. Would a plan like this be preferable?

ptomato commented 2 years ago

@jugglinmike Any comments on the above idea to copy over all the tests from tc39/proposal-temporal to tc39/test262 all at once? I can make this PR but I don't want it to get stalled and have the two repos get out of sync in the meantime, so I'd like to have your OK before I do it.

rwaldron commented 2 years ago

@ptomato I don't understand why you believe that you need to have two sets of the same tests? Here's what I would suggest:

  1. Create a fork of Test262
  2. Add all of the people that will work on your tests as collaborators
  3. Migrate all of your tests to that fork and delete them from the proposal repo.
  4. Use the Test262 fork as the source of truth while developing your tests and your polyfill.
  5. To test your polyfill, use --prelude, as shown here:
    
    # This is executed at the root of the test262 repo
    test262-harness --threads=16 --prelude=../proposal-temporal/polyfill/script.js --hostType=jsc --hostPath=`which jsc` 'test/built-ins/Temporal/**/*.js'
    FAIL test/built-ins/Temporal/Calendar/prototype/fields/long-input.js (strict mode)
    Expected no error, got RangeError: invalid field name garbage 1

FAIL test/built-ins/Temporal/Calendar/prototype/fields/long-input.js (default) Expected no error, got RangeError: invalid field name garbage 1

Ran 830 tests 828 passed 2 failed


The command itself is: 

test262-harness --threads=16 --prelude=../proposal-temporal/polyfill/script.js --hostType=jsc --hostPath=which jsc 'test/built-ins/Temporal/*/.js'



This is not a second-class way of working with your tests and Test262: this is exactly what I'm doing for `ShadowRealm` tests myself (and what I do for all proposals that I work on, which have polyfills before implementations), and also when I verify any Temporal tests that have been contributed so far. 
Ms2ger commented 2 years ago

What's the advantage of that setup over landing the tests in this repository?

rwaldron commented 2 years ago

@Ms2ger Philip is worried about having two sets of tests going out of sync during development. We're worried about a massive deluge of test material that does not have well understood coverage. There are substantial gaps in coverage in the material already contributed; that creates non-trivial auditing work for someone (whether that is a Test262 maintainer, proposal champion, or other contributor, it doesn't matter). When the tests in the fork have sufficient coverage, then we just merge them.

Ms2ger commented 2 years ago

It's clear that there's no full coverage yet, but it's not clear to me how it's helpful to do the work to review the coverage outside this repo. On the other hand, having them in this repo would be very helpful for the people who are implementing the proposal as we speak.

rwaldron commented 2 years ago

On the other hand, having them in this repo would be very helpful for the people who are implementing the proposal as we speak.

This is actually really compelling. Which engines are currently implementing?

ptomato commented 2 years ago

Hi @rwaldron! I think my concern is actually the opposite — I don't want two sets of tests, only one! We would like to delete the tests from tc39/proposal-temporal as soon as they are present in tc39/test262, and pretty much have a setup prepared just like the one you outlined.

I'm just concerned about the two sets of tests getting out of sync during review, if the PR would (understandably, because large) take a long time to merge, because that would cause a lot of extra work for me. That's why I asked if it could be merged swiftly and stylistic points cleaned up later.

I get your point about coverage, but maybe that could be mitigated using the checklist in the OP of this issue? As we check off each item in the checklist, we could consider the coverage for that item audited. The difference would be that someone working on a checklist item would start from incomplete coverage instead of from zero, which has disadvantages as well as advantages.

The proposal is currently being implemented by V8, SpiderMonkey, and JSC. One other implementation that I'm aware of is in SerenityOS's engine.

jugglinmike commented 2 years ago

Hey there, @ptomato. Let's do it!

@rwaldron and I talked this through today. Although we'd prefer to be more familiar with the test material before merging it, we think the opportunity to assist in-progress implementations, coupled with the correctness (if not completeness) of the tests merged so far, warrant expedition.

ptomato commented 2 years ago

Thanks, that's great news! I am on vacation tomorrow, so will make this pull request as soon as possible on Monday. If you would like to plan a Matrix chat session or video call next week to go through the test material so I can help you get that familiarity, I would certainly be available for that.

jugglinmike commented 2 years ago

Thanks, Philip, that's very kind!

Regarding coordination: I personally won't be available for reviews until Thursday of next week (that is, September 30). @rwaldron should be around on Monday, but we'll collectively have more bandwidth at the end of the week. You can send a pull request whenever you like, of course--I just know you're interested in limiting the review period.

ptomato commented 2 years ago

OK, thanks! In that case I won't prioritize the pull request today, but I'll make sure to send it by Wednesday. Does that sound good to you?

jugglinmike commented 2 years ago

Works for me!

justingrant commented 2 years ago

Are the checkboxes in the OP of this thread still accurate? Or have some additional methods been fully covered by Temporal's 3000+ tests?

Ms2ger commented 2 years ago

I don't think anyone has reviewed the coverage we gained by landing the polyfill tests - I'll see if someone on our side can do it

Bert66cool commented 2 years ago

Yes

Sent from Yahoo Mail on Android

On Tue, Nov 16, 2021 at 12:05 AM, @.***> wrote:

I don't think anyone has reviewed the coverage we gained by landing the polyfill tests - I'll see if someone on our side can do it

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

Bert66cool commented 2 years ago

Taken care of 

Sent from Yahoo Mail on Android

On Tue, Nov 16, 2021 at 12:11 AM, Brett @.***> wrote: Yes

Sent from Yahoo Mail on Android

On Tue, Nov 16, 2021 at 12:05 AM, @.***> wrote:

I don't think anyone has reviewed the coverage we gained by landing the polyfill tests - I'll see if someone on our side can do it

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

HKalbasi commented 2 years ago

Will there be tests for each different calendar? Calendars are not symmetric like timezones and it's easy to mistake in only one of them.

Ms2ger commented 2 years ago

We are currently focusing on the ISO calendar, so deep testing of the actual calendar operations is likely a while off. I imagine we would take tests if someone else had time to write them, though.

ptomato commented 2 years ago

I'm wondering what the policy on these should be, as well. Tests for time zones can be difficult because time zones can be adjusted by civil authorities. If that happened, tests could start failing whereas that would not indicate any standards noncompliance, just outdated time zone data.

Calendars are somewhat less subject to change, but still, I do know of a few cases that could cause tests to be invalidated. The Hijri calendar may get correction days depending on astronomical observations (see heading "Difficult to Predict" of this document) and any calendar that uses eras may decide to create a new era in the future.

HKalbasi commented 2 years ago

I feel lacking tests for time zones is fine since time zones are pure data, and implementation can be auto generated from IANA database. but calendars need some code as well, so at least there should be some tests for different classes of calendars. But with this logic, something like era doesn't need test since it is data-ish. Anyway this is not Temporal specific and DateTimeFormat.formatToParts also doesn't test for different calendars, so maybe not having tests for calendars is intended? The other case I can imagine is that formatToParts is implementation defined and can not be tested here (I'm not familiar with formatToParts spec, just know that it is the current way to get calendar data from browser)

I imagine we would take tests if someone else had time to write them, though.

I'm volunteer to write some tests if policy becomes that we need tests. Also I need specification of calendars for writing tests which I didn't find them in ecma-402. Where does it exists?

ptomato commented 2 years ago

@HKalbasi Sorry, I missed your comment back in February. There isn't any specification of calendar arithmetic because it's expected that ECMA-402 delegates it to a library like ICU4C or ICU4X as they do with locale-specific formatting. So, if you write tests for calendar arithmetic I'd recommend doing it on a best-effort / representative-sample basis.

FrankYFTang commented 2 years ago

Anyway this is not Temporal specific and DateTimeFormat.formatToParts also doesn't test for different calendars, so maybe not having tests for calendars is intended? The other case I can imagine is that formatToParts is implementation defined and can not be tested here (I'm not familiar with formatToParts spec, just know that it is the current way to get calendar data from browser)

There are a lot of error throwing behavior need to be tested in DateTimeFormat.formatToParts with the Temporal proposal.

for example step 6-a of 15.3.13 HandleDateTimeTemporalDate ( dateTimeFormat, temporalDate ) step 4-a of 15.3.14 HandleDateTimeTemporalYearMonth ( dateTimeFormat, temporalYearMonth ) and 15.3.15 HandleDateTimeTemporalMonthDay ( dateTimeFormat, temporalMonthDay ) 15.3.17 HandleDateTimeTemporalDateTime ( dateTimeFormat, dateTime ) 15.3.19 HandleDateTimeTemporalZonedDateTime ( dateTimeFormat, zonedDateTime )

and step 6-a of 15.3.19 HandleDateTimeTemporalZonedDateTime ( dateTimeFormat, zonedDateTime )

also Step 5-a of 15.3.7 PartitionDateTimeRangePattern ( dateTimeFormat, x, y ) but that is for formatRange and formatRangeToParts