sbi-benchmark / sbibm

Simulation-based inference benchmark
https://sbi-benchmark.github.io
MIT License
89 stars 35 forks source link

Tests for task's API #29

Closed psteinb closed 2 years ago

psteinb commented 2 years ago

As discussed in #19 this PR isolated the unit test for two_moons to serve as a starting point establishing tests for all tasks.

jan-matthis commented 2 years ago

This is great, cheers!

I'd suggest that we parametrizing those tests such that they run for all tasks, i.e., something along the lines of:

@pytest.mark.parametrize(
    "task_name",
    [
        (task_name,) for task_name in sbibm.get_available_tasks()
    ],
)

What do you think?

psteinb commented 2 years ago

I think that this makes a lot of sense. I can add that if you want. I suggest to be a bit more selective though in terms of what is tested, in order not to extend the runtime of the tests. One way to go would perhaps be, to do these automated tests in /repo/tests/task as they mostly test the interface of the tests which is settled in /repo/sbibm/tasks/task.py. For tests potentially running longer, we could mirror the same structure as in /repo/sbibm/tasks/ and allow for specialized tests (e.g. see the.demo.test cases in this PR) in the/repo/tests/tasks/two_moons/test_special.py`. I think this keeps the balance between a "god test suite" and more specialized tests if they are needed. And I'd hope, it makes contributing easier.

jan-matthis commented 2 years ago

I think that this makes a lot of sense. I can add that if you want.

That would be great!

I suggest to be a bit more selective though in terms of what is tested, in order not to extend the runtime of the tests. One way to go would perhaps be, to do these automated tests in /repo/tests/task as they mostly test the interface of the tests which is settled in /repo/sbibm/tasks/task.py. For tests potentially running longer, we could mirror the same structure as in /repo/sbibm/tasks/ and allow for specialized tests (e.g. see the .demo.test cases in this PR) in the/repo/tests/tasks/two_moons/test_special.py`. I think this keeps the balance between a "god test suite" and more specialized tests if they are needed. And I'd hope, it makes contributing easier.

That's an excellent suggestion. I agree that, depending on the task, we will probably want specialized tests as well -- probably mostly marked as slow for CI execution.

psteinb commented 2 years ago

@jan-matthis done for now. I am happy to adapt #18 if this can be merged earlier. Please review whenever you can find the time.

psteinb commented 2 years ago

Thanks for the review. I hope I implemented those comments alright.

jan-matthis commented 2 years ago

Cheers!