web-platform-tests RFCs

RFC 110: support virtual test suites in wptrunner #110

Closed WeizhongX closed 1 year ago

WeizhongX commented 2 years ago

Hi James, thanks for the review. I will update the document, but before that, here is some background:

Luke from the previous Google ecosystem team discussed this with you in this email thread: https://groups.google.com/a/google.com/g/ecosystem-infra-internal/c/U4ho9P3LVtk

I later took over his work. Our goal is to use wptrunner to run the tests we currently have in Chromium, and virtual tests are part of that. Why do we need to run virtual tests? To improve test coverage.

Previously I was focused on the performance side of wptrunner, and had been in discussion with you through code review and in a meeting. Now we think performance won't be an issue, as the majority of the time is spent in the browser, not in the test runner. As a next step, we want to close the feature gaps between our current runner and wptrunner.

Thanks again, will update the document shortly.

jgraham commented 2 years ago

https://groups.google.com/a/google.com/g/ecosystem-infra-internal/c/U4ho9P3LVtk

Note that that seems to be a link to an internal mailing list.

Looking at my copies of the mails, I'm pleased to find I said more or less the same thing at the time* : changing test identity is hard, so you probably want to look for ways to make this about configuration rather than identity. The exact design then depends on the exact tradeoffs you want e.g. whether it's acceptable to require multiple invocations of wptrunner for each configuration.

* And therefore might not be going entirely mad ;)

WeizhongX commented 2 years ago

James, let's discuss this some more before I start updating the document; I'm worried about making the wrong updates.

foolip commented 2 years ago

@jgraham I guess in Gecko CI you invoke wptrunner multiple times to get the equivalent setup? How many such suites do you have, and do you have a problem with the overhead of very small suites that just run a handful of tests with some special config?

I don't know if that'll be a problem in Chromium, but it's the potential issue that comes to mind.

jgraham commented 2 years ago

Yeah, in Gecko CI we tend to have different tasks for each configuration. We don't tend to have tiny suites that just run a handful of tests, but only use this for big configuration differences like "is fission (site isolation) enabled". Being able to use it for smaller sets of tests might indeed be an improvement for us as well :)

Irrespective of Mozilla's requirements, I think it's reasonable to support multiple configurations in a single run, with a loop over each of the configuration settings (the loop would be inside the environment setup but outside the repetition of runs). With the logging setup, that would mean that each run corresponds to a different suite-start/suite-end pair, which would have its own log output with a single set of RunInfo. Doing this might require some changes to how/when we load tests (since we'd ideally read the manifest once before loading the servers but then filter down to the relevant paths once per configuration), but I'm sure it would be tractable.

WeizhongX commented 2 years ago

James, thanks for the feedback.

The problem on my side is that we have an old runner running wpt tests. If we want to switch to wptrunner, it should perform similarly to the old runner; this is effectively a requirement from my lead. If we use a loop for each virtual test suite, I am afraid a lot of time will be spent waiting for the slowest worker. Even now we run a separate loop for each test type, and I am thinking of making a code change for this and sending it out for review. (wptrunner is currently slower than rwt, and I am trying to squeeze every second out of it.)

Is there any particular reason we want to run tests this way? In Chromium we have something called the results viewer, which loads test results into a web page; developers don't really use the log much. Maybe we should have something similar for wptrunner?

jgraham commented 2 years ago

It's important to realise that in the wptrunner setup the logging is integral to the way that we collect test results. It's not just about output to stdout, but also directly responsible for recording test results and creating all the artifacts we use to record results for other systems (e.g. wptreport.json file). If we had browser-based result viewing it would also be driven by the logging. So we have to do something that allows the logger to understand which result comes from which run.
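
As a rough sketch of what that means in practice (test names, handler wiring and the exact mozlog signatures here are from memory and only illustrative), result collection is driven by calls on a structured logger, and the attached handlers/formatters turn those events into artifacts like the raw log or wptreport.json:

from mozlog.structuredlog import StructuredLogger
from mozlog.handlers import StreamHandler
from mozlog.formatters import JSONFormatter

# Each call emits a structured event; handlers turn the events into output artifacts.
logger = StructuredLogger("web-platform-tests")
logger.add_handler(StreamHandler(open("raw.log", "w"), JSONFormatter()))

logger.suite_start(["/css/css-pseudo/first-letter-001.html"],
                   run_info={"product": "firefox", "os": "linux", "debug": False})
logger.test_start("/css/css-pseudo/first-letter-001.html")
logger.test_status("/css/css-pseudo/first-letter-001.html",
                   subtest="initial letter", status="PASS")
logger.test_end("/css/css-pseudo/first-letter-001.html", status="OK")
logger.suite_end()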

Being able to run tests of multiple types before tests of the previous type finished seems pretty reasonable in this model. They're all running with the same configuration and it would indeed provide some performance benefits in the wptrunner model. Gecko runs each test type on different workers so it wouldn't really help us, but it certainly seems like a reasonable change that probably isn't too complex.

Running each configuration of tests without waiting for the previous configuration to complete seems harder to get right because of implementation details in the logger. It currently assumes that there's a single "suite" running at a time per logger. Since each "virtual test suite" (i.e. configuration) would be a separate suite from the point of view of the logger (because each suite has a single set of RunInfo) you'd need to find a way to have a single logger per configuration/virtual suite, and then combine the results later. That sounds pretty hairy, but I can understand why you want it if the plan is to have lots of small suites and lots of parallelism.

Maybe we could find some backwards-compatible way to tag results with a specific suite/configuration name and make the mozlog handlers/formatters deal with the additional complexity where necessary. That also sounds tricky, but not impossible.

I still think this is going to be orders of magnitude easier than anything which involves manipulating test ids, given how much of the system depends on shared assumptions about which tests exist and how they relate to items in the manifest.

WeizhongX commented 2 years ago

Folks, 'wpt run' has several command line arguments to control which tests to run. What would be the desired behavior if we pass one of those arguments together with a virtual test suite definition file to wptrunner?

Here is a list of such arguments: [--include INCLUDE], [--include-file INCLUDE_FILE], [--exclude EXCLUDE], [--include-manifest INCLUDE_MANIFEST], [--test-groups TEST_GROUPS_FILE]

jgraham commented 2 years ago

You can see how most of the existing options interact starting at https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/wptrunner.py#67-79. Basically everything ends up building a manifest (like you can provide via --include-manifest) that acts as a filter when reading the MANIFEST.json file.

Generally we consider the command line options to be additive (e.g. --include and --include-file both add to the list of things to include). However --test-groups is a bit different; tests to include must be in the group, but the other options allow selecting a subset of the group.
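
For example (the paths and file names are purely illustrative), a run combining the additive options could look like:

./wpt run --include css/css-pseudo --include-file extra-tests.txt firefox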

In practice I don't think it's at all common to use these options together, but a "virtual test suite" (at least the way I think about it) is basically a test group plus some additional configuration, so I think you want to follow that model.

WeizhongX commented 2 years ago

Thanks James.

In Chromium we do not add virtual tests to the manifest, and I had been thinking we would do the same this time, but your comments made me reconsider. If we add virtual tests to the manifest, maybe everything else will just follow suit? Is it true that we would not need changes anywhere else? One concern is that this could bloat MANIFEST.json: taking the example in the RFC, we would now have virtual/css-highlight/css/css-pseudo/... and virtual/css-highlight/css/css-highlight-api/... entries for all tests under css/css-pseudo and css/css-highlight-api. Maybe we could have a new section in MANIFEST.json at the same level as 'crashtest' or 'reftest' (though it is not a test type), with some mechanism to link back to the real tests, for example using '*' to denote all tests under the same path as the real tests. WDYT?

After we add virtual tests, will we allow virtual tests in '--include', '--include-file' or the groups defined by '--test-groups'? Or will those parameters still only allow real tests, and will we also run all the virtual tests that can be mapped from those real tests? It looks like going with the former will make things easier.

WeizhongX commented 2 years ago

Thinking about this again, I believe virtual test suites are ephemeral, so they do not need to be persisted to the filesystem. I think we should update ManifestLoader and testfiles.load_manifest. (If the latter is only used upstream, and we don't plan to run virtual tests upstream yet, we can treat that as low priority.)

jgraham commented 2 years ago

To be clear there's a bit of a terminology clash here. The manifest in --include-manifest is not the same as the manifest in MANIFEST.json. The latter is the canonical list of all tests in JSON format, whereas the former is actually a re-use of the ini-like format and associated code we usually use to store per-test-file metadata to instead store whether a given test file is included in the current run. In retrospect we should have called that argument --include-metadata and consistently used "metadata" to refer to anything using the ini-like format and "manifest" only to refer to the complete list of tests. I think that would also have been confusing, but less confusing in this situation.
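
For reference, an include manifest has roughly the following shape as I understand it: nested sections mirror URL path components and skip keys switch inclusion on or off (the paths here are only illustrative):

skip: True
[css]
  [css-pseudo]
    skip: False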

I still don't think you want to tie "virtual test suites" into the manifest. It's configuration and can be treated as such without affecting test identity. That also solves the problems with includes: a virtual test suite is the same as a test group, but with attached configuration.

WeizhongX commented 2 years ago

James, thanks for the prompt response.

If virtual tests are treated as not affecting test identity, how shall we report test results? We could have multiple virtual tests for the same real test, and run them in one launch of wptrunner.

We can treat a virtual test suite similarly to a test group. One difference I can think of is that when we pass in --virtual-test-suites, we mean to run those additional virtual tests together with the real tests.

For the metadata, our plan is to allow virtual tests to have their own metadata, because a virtual test could behave differently from the real test. We plan to put it under virtual/prefix/path/to/real/tests. If a virtual test behaves the same as its real version, we can allow it to reuse the metadata of the real version. If virtual tests do not have test ids, we need to find a way to associate a virtual test with its metadata.

WeizhongX commented 2 years ago

James, I pushed a new version of this RFC. It captures what I am thinking right now and should convey some additional information.

Let's continue our discussion here, and see if we can find a better solution.

jgraham commented 2 years ago

So it feels like we haven't really reached much common ground here yet.

The proposal still seems to be very much centred around the idea of "virtual test suites" as additional tests which are added in some layer above the test manifest. I am still concerned that this violates the invariant that the set of all tests can be computed directly from the filesystem (and is stored in a manifest that's computed directly from the filesystem).

There are a few data-model level disadvantages of this approach:

  • Anything that's currently using the manifest as a source of truth about what tests exist, and how they relate to the filesystem, needs to be updated to know about "virtual" tests. The metadata update is a clear example: the mapping from test ids to metadata files is not trivial, so the metadata update code uses the manifest to perform this mapping (and also to prune metadata for deleted tests). With this change it needs to be aware of "virtual" tests.

  • It requires any other tooling to understand virtual tests to make sense of the data. For example one would expect wpt.fyi to be able to compare a test result in configurations A and B. When the configuration is represented in the test id you need special hardcoded handling of the encoding to work out which different tests are comparable.

  • The proposal is different to any of the precedent in wptrunner. Fundamentally a virtual test suite is just a tuple of (configuration, test list). We already have ways to represent browser configuration (notably RunInfo data) and test lists (test groups / include manifests). Adding another method that's totally different to what already exists is confusing and will make the code harder to maintain in the long run (ideally I'd like to remove some of the ways we represent test lists, not add a new one). It also makes the representation in output different to existing configuration (e.g. we don't represent os configuration as /virtual/linux or virtual/windows prefixes on the test ids, even though different platforms also sometimes have different applicable tests).

As far as I can tell the main driver for this approach is:

  • It avoids a synchronisation point in wptrunner where we need one full configuration to complete before we can start running the next configuration. In highly parallel environments like the one Chrome CI uses that is a noticeable bottleneck.

(I feel like there's also an unspoken reason which is "it's the design Blink engineers are already used to", which is fair enough as a starting point, but doesn't hold much weight for people outside of that context when there are other tradeoffs to consider).

At the very least the RFC needs to acknowledge these issues and make the case that the design being proposed is really the best given all the constraints. If parallel performance is a big concern there are probably a number of things we can do to improve it in general (e.g. queue more of the work upfront, and then just run through the queue, rather than having loops per test type etc.). For parallelizing multiple configurations the fundamental problem is that we have a single logger and mozlog assumes each logger is running exactly one suite (with a given configuration) at a time. We could relax that constraint in mozlog (although it would involve updating consumers, which would again be somewhat tricky), or we could investigate having one logger per suite and coalescing the output. Or somewhat equivalently we could look at higher level approaches like making it possible to setup the test environment and then start multiple wptrunner processes, so that the scheduling could happen at a higher level but with a shared environment. There is also plenty of Chrome-specific lower hanging fruit for performance e.g. preloading Chrome instances so that each new test group doesn't have to wait for the binary to start (assuming you're using test groups at all).

So while I agree that "get maximum performance when multiple configurations of the same tests are run in parallel on the same machine" isn't a design constraint that previously applied to wptrunner, and that therefore it will take some work to optimise for it, I don't think that there's enough evidence presented to conclude that we therefore need to add data-model level hacks to avoid doing that work.

WeizhongX commented 2 years ago

So it feels like we haven't really reached much common ground here yet.

The proposal still seems to be very much centred around the idea of "virtual test suites" as additional tests which are added in some layer above the test manifest. I am still concerned that this violates the invariant that the set of all tests can be computed directly from the filesystem (and is stored in a manifest that's computed directly from the filesystem).

While we are not proposing to persist virtual tests to the filesystem, we could add virtual tests to the manifest immediately after the tests are loaded. This is a fairly centralized change, so it would be easier to manage?

There are a few data-model level disadvantages of this approach:

  • Anything that's currently using the manifest as a source of truth about what tests exist, and how they relate to the filesystem, needs to be updated to know about "virtual" tests. The metadata update is a clear example: the mapping from test ids to metadata files is not trivial, so the metadata update code uses the manifest to perform this mapping (and also to prune metadata for deleted tests). With this change it needs to be aware of "virtual" tests.

The reason we want to let virtual metadata fall back to the non-virtual version is to save space, and also the time needed to download metadata to the bots. If this is a big concern, we can drop this requirement for now; then the metadata part won't need to understand "virtual" tests.

  • It requires any other tooling to understand virtual tests to make sense of the data. For example one would expect wpt.fyi to be able to compare a test result in configurations A and B. When the configuration is represented in the test id you need special hardcoded handling of the encoding to work out which different tests are comparable.

I don't think wpt.fyi needs to understand virtual tests unless we have standardized some prefixes; that means we should only compare configuration A to configuration A, even though different organizations can have different args or prefs for that configuration.

  • The proposal is different to any of the precedent in wptrunner. Fundamentally a virtual test suite is just a tuple of (configuration, test list). We already have ways to represent browser configuration (notably RunInfo data) and test lists (test groups / include manifests). Adding another method that's totally different to what already exists is confusing and will make the code harder to maintain in the long run (ideally I'd like to remove some of the ways we represent test lists, not add a new one). It also makes the representation in output different to existing configuration (e.g. we don't represent os configuration as /virtual/linux or virtual/windows prefixes on the test ids, even though different platforms also sometimes have different applicable tests).

I agree. We analyzed this previously and there are different ways to implement it; one is to implement it outside of wptrunner, but the problem with that is the performance penalty it causes. We do support running tests on different platforms in our old runner, and we plan to support that outside of wptrunner this time, as there is no way we can parallelize running tests on Linux and Windows.

As far as I can tell the main driver for this approach is:

  • It avoids a synchronisation point in wptrunner where we need one full configuration to complete before we can start running the next configuration. In highly parallel environments like the one Chrome CI uses that is a noticeable bottleneck.

True!

(I feel like there's also an unspoken reason which is "it's the design Blink engineers are already used to", which is fair enough as a starting point, but doesn't hold much weight for people outside of that context when there are other tradeoffs to consider).

This is not the case. I would say I will take whichever approach is easier for me, on the condition that it meets our performance requirement.

At the very least the RFC needs to acknowledge these issues and make the case that the design being proposed is really the best given all the constraints.

Sure, I can add that.

If parallel performance is a big concern there are probably a number of things we can do to improve it in general (e.g. queue more of the work upfront, and then just run through the queue,

Regarding "queue more of the work upfront", do you any details?

rather than having loops per test type etc.).

This is in our plan.

For parallelizing multiple configurations the fundamental problem is that we have a single logger and mozlog assumes each logger is running exactly one suite (with a given configuration) at a time. We could relax that constraint in mozlog (although it would involve updating consumers, which would again be somewhat tricky),

I was not aware of this before. Aren't we already using multiple workers to run tests at the same time? If you think this is a problem, we can investigate it and see how much improvement it can bring.

or we could investigate having one logger per suite and coalescing the output. Or somewhat equivalently we could look at higher level approaches like making it possible to setup the test environment and then start multiple wptrunner processes, so that the scheduling could happen at a higher level but with a shared environment.

I would say coordinating between multiple wptrunner processes would be even more complicated.

There is also plenty of Chrome-specific lower hanging fruit for performance e.g. preloading Chrome instances so that each new test group doesn't have to wait for the binary to start (assuming you're using test groups at all).

Thanks for mentioning this. I did think about this previously. This could be a mitigation if we restart Chrome a lot.

So while I agree that "get maximum performance when multiple configurations of the same tests are run in parallel on the same machine" isn't a design constraint that previously applied to wptrunner, and that therefore it will take some work to optimise for it, I don't think that there's enough evidence presented to conclude that we therefore need to add data-model level hacks to avoid doing that work.

I wouldn't say this is a hack of the data model. What if we rebuild the manifest to include virtual tests at the beginning (or even before we start wptrunner)? In that case I think the only changes we need to make in wptrunner are 1) to strip 'virtual/prefix/' from the URL before we send it to the browser, and 2) to restart the browser if it needs to be started with different parameters or prefs.

WeizhongX commented 2 years ago

Hi James, I am looking into RunInfo to understand how it can be used here. It looks to me like we are using it to compute the metadata. But can we use it to pass in additional command line parameters, or prefs?

I'm not sure if you have any additional comments on my earlier replies. Thanks a lot!

jgraham commented 2 years ago

Sorry, I've been meaning to reply here for a while, but haven't got around to it. Apologies for the delay.

But can we use it to pass in additional command line parameters, or prefs?

RunInfo is output, not input. For example in Firefox we add additional data to distinguish between different configurations we run in CI, like swr (Software WebRender) or sessionHistoryInParent. Currently we actually set that configuration directly using command line options, but being able to express it as a set of "virtual test suites" with the mapping between configuration, prefs and tests stored in an external data file rather than in a combination of command line options and code would be an improvement.
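
As a trimmed illustration of RunInfo being output rather than input, a wptreport.json carries a run_info block alongside the results (the values here are made up, and the exact set of keys depends on the browser and configuration):

{
  "run_info": {"product": "firefox", "os": "linux", "debug": false, "swgl": true},
  "results": [
    {"test": "/css/css-pseudo/first-letter-001.html", "status": "OK", "subtests": []}
  ]
}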

For parallelizing multiple configurations the fundamental problem is that we have a single logger and mozlog assumes each logger is running exactly one suite (with a given configuration) at a time. We could relax that constraint in mozlog (although it would involve updating consumers, which would again be somewhat tricky)

I was not aware of this before.

So I think the mozlog setup is really the fundamental problem here.

Currently wptrunner in pseudo-code looks like

create logger
start test environment
load tests
for each repeat:
    log suite_start w/configuration
    for each test type:
        start worker pool (manager group)
        run each test using pool
        stop worker pool
    log suite_end

The obvious way to add multiple configurations is as a new outer loop:

create logger
start test environment
load tests
for each repeat:
    for each configuration:
        log suite_start w/configuration
        for each test type:
            start worker pool (manager group)
            run each test using pool
            stop worker pool
        log suite_end

But for maximally parallel performance you want to create a generic worker pool upfront and distribute tasks to the workers as soon as one is free, rather than having a synchronisation point at the end of the inner loop where we need to wait for all workers to complete. For test types it's conceptually easy, we just refactor like:

create logger
start test environment
load tests
for each repeat:
    for each configuration:
        log suite_start w/configuration
        start worker pool (manager group)
        for each test type:
            run each test using pool
        stop worker pool
        log suite_end

What stops us just repeating the trick is the log suite_start w/configuration part. The way the logging works each test is assigned to a suite just based on ordering; if the test log events are after a given suite_start and before a given suite_end they're part of that suite. mozlog also enforces that all tests have to be part of a suite, so we can't just drop the suite_start calls.
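
Roughly, in the raw structured log the association looks like this (unrelated fields omitted): any test events between a suite_start and the next suite_end belong to that suite.

{"action": "suite_start", "run_info": {"fission": true}, "tests": ["/css/css-pseudo/first-letter-001.html"]}
{"action": "test_start", "test": "/css/css-pseudo/first-letter-001.html"}
{"action": "test_end", "test": "/css/css-pseudo/first-letter-001.html", "status": "OK"}
{"action": "suite_end"}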

The obvious solution here is to just make a suite bigger; let's just call all configurations and repeats of web-platform-tests a single "suite". But the problem with that is that suites are 1:1 with configuration data. So doing that in a naive way requires you to encode the per-test configuration in some other way (e.g. your proposal to make the configuration part of the test id). Now you don't have a consistent model around which things are the same test but running under different configurations.

The mozlog setup already causes a few problems. For example we've had the problem that if you do two repeats of the suite they end up creating two top-level JSON objects in the wptreport.json file. That's not impossible to work with but it's unexpected for consumers.

So a proposal for fixing this is that we have one logger per repetition/suite, each with a different output filename. Then we also have a single logger on the main thread that just queues up all the log results, but only writes to stdout. That requires the top-level logger to relax the invariant that everything is inside a single suite_start/suite_end, but as long as we aren't using the output of that for anything which depends on knowing that mapping we should be OK. That of course requires some changes to mozlog, but I can help with that.

With that kind of change the overall architecture could become:

create top-level logger
start test environment
create test_queue
load tests
for each repeat:
    for each configuration:
        create suite_logger
        suite_logger.suite_start w/configuration
        for each selected test:
            enqueue suite_logger, test in test_queue
start worker pool
run each test in test_queue using pool
stop worker pool
for each repeat:
    for each configuration:
        log suite_end for suite_logger

Of course there are some details to sort out here, and it's a more complex change than just putting configuration into the test id. But I think it's basically the right architecture if you're trying to get maximum parallel performance, and having each repeat/configuration output separate logfiles seems like it's going to solve some other problems (like the multiple top-level JSON objects in wptreport.json when we run >1 repeat).

Does that make any sense?

WeizhongX commented 2 years ago

Thank you, James, for the detailed explanation, and no worries. I know I have been taking a lot of your time.

For the overall architecture you proposed, I think we can at least use that as a first step, then move ahead from there.

Maybe my question is: why do we need to pass configuration to logger.suite_start? How is it going to be used? If we need to pass configuration in some cases, do we also need to pass in the additional configurations for virtual tests to logger.suite_start? In Chromium's case, the additional command line arguments will only be used to decide whether we need to restart the browser, and if so, to pass those arguments to the browser.

I also want to confirm whether we have agreed that there will be an optional command line argument to pass in a JSON configuration file for virtual test suites. I was hoping this feature would also be useful for other vendors, but if it is not, I think it's the Blink team's responsibility to make sure it won't negatively impact other vendors.

Thanks!

jgraham commented 2 years ago

Maybe my question is: why do we need to pass configuration to logger.suite_start? How is it going to be used? If we need to pass configuration in some cases, do we also need to pass in the additional configurations for virtual tests to logger.suite_start?

Because the consumers of the logging expect/need to be able to access configuration data in order to know what browser setup produced the given results. For example this is written to wptreport.json files which are then used by the automatic metadata update, which is capable of taking the results from multiple configurations and writing a single set of metadata files that covers all of them.
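
For example, the update step can be fed several reports from different configurations at once; modulo the exact flags, the invocation is along the lines of:

./wpt update-expectations wptreport-linux.json wptreport-android.json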

The fact that the configuration is written to suite_start is not really the problem; it's that individual test events are associated with a given configuration by ordering rather than by an explicit property in the logging. Even if we had a different way of recording the configuration in the logs (e.g. a config event), having multiple configurations running in parallel and all outputting to the same logger would require some explicit association between the configuration event and the test* events. Since we don't have that at the moment, all existing consumers would need to be updated to handle the change. It's not impossible to do, but like all breaking changes it's a lot of work. Changing wptrunner (and maybe also mozlog a bit) so that we have a 1:1 relationship between output artifacts and configuration seems like it will likely be easier.

I also want to confirm whether we have agreed that there will be an optional command line argument to pass in a JSON configuration file for virtual test suites.

Yes, a command line argument to pass a file which contains named configurations and tests seems fine.
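
As a purely hypothetical sketch of such a file (all field names here are illustrative, not a settled schema), it might look like:

[
  {
    "name": "css-highlight",
    "args": ["--enable-features=SomeFeature"],
    "tests": ["css/css-highlight-api", "css/css-pseudo"]
  }
]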

WeizhongX commented 2 years ago

Hi James, thanks for the discussion in the meeting. One question we had afterwards is how to manage metadata if virtual tests do not have a test id. Does it all go into the same file, or does each config get a separate root so that each virtual test has separate metadata? With the first option, the metadata would become too complicated to manage, and if multiple teams are working on a file at the same time, it increases the chance of a merge conflict. With the latter, while virtual tests sometimes produce different results from their base tests, most of the time they should produce the same results, so this would create a lot of duplicated files and increase the time needed to download the files to the testing bots.

jgraham commented 2 years ago

Metadata works the same as for any other configuration. The built-in metadata support is based on conditional values that can switch on configuration. For example, this test metadata from mozilla-central has a specific result expectation for debug Android builds, and also fuzzy metadata that depends on platform and configuration. In particular, swgl there means "using the software rendering backend", which is equivalent to a "virtual test suite".

In practice managing the metadata isn't that complex. The format is essentially line-based so it's possible to write configuration-specific metadata on its own lines and still make changes to the metadata for other configurations if required.
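
As an illustration (the test name and values are made up), conditional metadata for a single test might look like the following, with the swgl condition playing the same role as a "virtual test suite":

[backdrop-filter-blur-001.html]
  expected:
    if os == "android" and debug: FAIL
  fuzzy:
    if swgl: maxDifference=0-2;totalPixels=0-200
    maxDifference=0-1;totalPixels=0-40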

WeizhongX commented 1 year ago

The link to the virtual test suite config file in Chromium: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/web_tests/VirtualTestSuites

jgraham commented 1 year ago

OK, I've produced a more-or-less functional prototype indicating the kind of design I think makes sense here.

Basically the difference is that instead of changing the test manifest, the "virtual" suite (which is called "subsuite" in the patch) is an explicit concept in the test runner and results. This means that there is still a 1:1 relationship between metadata files and test files, and we're not relying on parsing data out of the test id to understand the configuration.

It depends on some changes in mozlog, so it's a little hard to share on github. I've pushed the changes to Mozilla's phabricator (there are six commits in total, look under "stack" to see the others) and try server, which should allow you to see what's going on (I've tried to put a reasonable amount of documentation in the commit messages). But if that's too confusing let me know and we can try and push the wptrunner part of the changes to github, and provide a diff to apply to a local mozlog to allow testing.
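
Illustratively (the exact event shape is defined by the prototype and the mozlog patches, so this is only a guess at the flavour), the subsuite becomes an explicit property on results rather than part of the test id, e.g.:

{"action": "test_start", "test": "/css/css-pseudo/first-letter-001.html", "subsuite": "css-highlight"}
{"action": "test_end", "test": "/css/css-pseudo/first-letter-001.html", "subsuite": "css-highlight", "status": "OK"}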

WeizhongX commented 1 year ago

@jgraham, we reviewed your code change, and we are generally fine with going that route. We will make some changes on our side to make it also work for us.

Some minor things I want to bring up:

  1. In the configuration file, can we have two more fields: "platforms" and "expires"? People can easily add a lot of tests with subsuites; "platforms" allows us to run tests on fewer platforms for heavy suites, and "expires" automatically stops running a subsuite after a certain date, in case people forget to remove it.
  2. wptrunner sometimes caches test results (for reftests?). The cache should now be keyed by (subsuite name, test id).
  3. read_include_from_file should also be updated. I assume the items in this file can be "subsuite:test-id"? (2 and 3 are code-related; we can also discuss them in code review.)

Can you help update the RFC, as this now mostly reflects your idea?

jgraham commented 1 year ago

In the configuration file, can we have two more fields: "platforms" and "expires"? People can easily add a lot of tests with subsuites; "platforms" allows us to run tests on fewer platforms for heavy suites, and "expires" automatically stops running a subsuite after a certain date, in case people forget to remove it.

This feels like weird layering to me, but I understand it's what your setup already has. In particular, making the subsuite configuration just based on platform rather than allowing anything in the base RunInfo to be used for configuration seems unlike anything else in wptrunner.

The expiry time I guess I can live with, but the harness having to verify that your CI configuration is still valid seems like an odd setup; putting high level concerns very deep in the implementation.

wptrunner sometimes caches test results (for reftests?). The cache should now be keyed by (subsuite name, test id).

Yep, good catch.

read_include_from_file should also be updated. I assume the items in this file can be "subsuite:test-id"?

Well, not with the way it works at the moment, because those options end up affecting the include argument, which runs as a pre-filter before we actually load the tests. That also means it currently can't have different behaviour for each subsuite; the tests which are run will be the intersection of the include settings and what's actually defined for the subsuite. That's not impossible to fix, but it isn't necessary to change if you don't have a specific use case for running different subsets of tests per subsuite.

WeizhongX commented 1 year ago

This feels like weird layering to me, but I understand it's what your setup already has.

Our experience with virtual tests is that they can easily cause test bloat; that is the reason we have the platform and expiry time settings. I would imagine this would also be useful on your side. My plan is to have a separate tool that converts our existing format to the format used by wptrunner every time there is an update. I guess this is doable as long as we have an expiry time on the wptrunner side; we can have a different file for each platform if platforms are not supported natively.

Well, not with the way it works at the moment.

We do have such use cases. We have something called flag-specific suites that allow us to run any combination of tests/subsuite tests. Devs can also run any individual test from any subsuite for debugging purposes, right?

Maybe I can update the RFC first, then you can do a final pass? Do you think we can go through the review process offline, so that we don't need to wait until the next meeting?

jgraham commented 1 year ago

It was easier to create a new RFC than update this one, so I've created https://github.com/web-platform-tests/rfcs/pull/138. I'm going to assume it's OK to close this for now, but let me know if that's a problem.

WeizhongX commented 1 year ago

That's fine, thanks!