oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0
715 stars 112 forks source link

[tests] Split sycl_iterator tests to separate tests #1660

Open SergeyKopienko opened 1 week ago

SergeyKopienko commented 1 week ago

In this PR we split sycl_iterator tests to separate tests: one test in one file. The goal - to avoid timeout errors in CI system for these tests.

Configuration Label Time Summary (main branch) Label Time Summary (this branch)
CPU,bknd=dpcpp,cmplr=icpx,ubuntu-20.04,std=с++17,cfg=release 209.17 sec*proc (7 tests) 224.53 sec*proc (78 tests)
FPGA_EMU,bknd=dpcpp,cmplr=icpx,ubuntu-20.04,std=с++17,cfg=release 58.60 sec*proc (7 tests) 63.56 sec*proc (78 tests)
HOST,bknd=tbb,cmplr=icpx,ubuntu-20.04,std=с++17,cfg=release Total Test time (real) = 56.33 sec Total Test time (real) = 56.64 sec
HOST,bknd=tbb,cmplr=icpx,ubuntu-20.04,std=с++17,cfg=release Total Test time (real) = 55.94 sec Total Test time (real) = 56.23 sec
HOST,bknd=omp,cmplr=icpx,ubuntu-20.04,std=с++17,cfg=release Total Test time (real) = 59.84 sec Total Test time (real) = 59.84 sec Total Test time (real) = 60.39 sec
... ... ...
CPU,bknd=dpcpp,cmplr=icx,windows-2019,std=c++17,cfg=release 254.18 sec*proc (7 tests) 317.38 sec*proc (78 tests)

This info has been collected from https://github.com/oneapi-src/oneDPL/actions/runs/9764889560 and from https://github.com/oneapi-src/oneDPL/actions/runs/9778104870?pr=1660

UPD: after applying the PR https://github.com/oneapi-src/oneDPL/pull/1667 from Julian, probably this PR isn't required anymore. But may be somebody know another reasons to apply it?

julianmi commented 1 week ago

Couldn't we simply increase the timeout in our CI? One downside I noticed in this approach is the increased CI time (13 vs. 10 minutes for DPCPP backend on Linux and 18 vs. 13 minutes on Windows).

SergeyKopienko commented 1 week ago

Couldn't we simply increase the timeout in our CI? One downside I noticed in this approach is the increased CI time (13 vs. 10 minutes for DPCPP backend on Linux and 18 vs. 13 minutes on Windows).

For me - it's out of my scope (to increased CI time). But when I had deal with tests I understood that the best situation - the we have one test for one algorithm. So I prefer to split tests and never write so big tests like this set.

@dmitriy-sobolev what do you think about increate test timeout in CI system?

dmitriy-sobolev commented 1 week ago

Honestly, the amount of changes is terrifying, and because of that I need to take a closer look at the PR, even though these are mechanical changes.

@SergeyKopienko, I have no strong preference between yours or Julian's suggestions. Julian's one is focused in solving the issue with the timeout, and would be really good if it was enough to raise the timeout value not too much, e.g. from 6 to 9 minutes per test and agree what is the upper bound for the timeout to avoid future unjustified increases. However, you've already done a lot of work and you've also addressed a design issue with many cases in one single test file, so it might be better to continue.

To add more context, the timeout issue started occurring since https://github.com/oneapi-src/oneDPL/pull/1653.

I have some questions:

  1. The necessity to fix timeout issues on FPGA emulator might be a symptom of increased compilation/execution time even on other platforms. Have you checked how the time of tests changed on after applying #1653 (on FPGA emulator, CPU and GPU devices)? We previously had issues with large JIT time, so it should be checked.
  2. What is a motivation to preserve the backend type names in the tests? I suppose it might be removed as non-necessary now.
  3. zip_iterator test has the same structure. Perhaps, the cases from this large file may moved the the newly created per-algorithm files.

Meanwhile, @MikeDvorskiy, thought about refactoring the whole test system and this issue was a motivation. Perhaps, he might also comment and suggest a better solution if he knows any.

danhoeflinger commented 1 week ago

2. What is a motivation to preserve the backend type names in the tests? I suppose it might be removed as non-necessary now.

I too am confused by our redundancy here of sycl_iterator_* vs other tests for the same algorithms (sycl_iterator_sort.pass vs sort.pass for example).

I believe sycl_iterator_ is meant to describe the data type rather than the backend type (although in this case one implies the other inherently). These tests use sycl::buffer and oneapi::dpl::begin and oneapi::dpl::end to test sycl_iterator inputs to these algorithms. It should be mentioned that they also test USM data inputs both shared and device. I know we use sycl_iterator_* tests to be a skeleton crew test for dpcpp backends in per-commit CI and validation, to provide decent coverage without calling all tests. If we were to reorganize these tests, we would need another system to provide that. It could be done, but would require work.

Now that we have the input_data_sweep_* tests which isolate our input data processing for combinations of input types including sycl_iterator, usm, etc. and different combinations of those base types wrapped in our iterator types, I do wonder how important it is to have coverage for each algorithm to be tested with multiple input data types. I'm not necessarily arguing to remove large chunks of coverage, but it is worth considering as we think about such large changes to the tests.

Edit: In theory, the input_data_sweep_ tests check read and write ability for a large sweep of possible inputs to sycl kernels. That set of tests should allow us to be more confident that if data of one input type functions in a pattern / API, then other valid input types to oneDPL would function equivalently, and we don't need to test every pattern with every base data type.

SergeyKopienko commented 1 week ago

2. What is a motivation to preserve the backend type names in the tests? I suppose it might be removed as non-necessary now.

@dmitriy-sobolev C

2. What is a motivation to preserve the backend type names in the tests? I suppose it might be removed as non-necessary now.

All new test files has been renamed: now their names are shortly.

julianmi commented 3 days ago

I think the main issue motivating this PR was the test timeouts observed in https://github.com/oneapi-src/oneDPL/pull/1653. These had nothing to do with the test setup but was a bug introduced in the patch. This was addressed in https://github.com/oneapi-src/oneDPL/pull/1667. Thus, I think we can close this PR.

I agree with @danhoeflinger that we might want an offline discussion about restructuring our per-commit CI tests.

SergeyKopienko commented 3 days ago

@timmiesmith is this PR not required anymore for us?