silx-kit / h5grove

h5grove is a Python package that provides utilities to design backends serving HDF5 file content: attributes, metadata and data.
https://silx-kit.github.io/h5grove/
MIT License
14 stars 7 forks source link

Consider splitting/refactoring `base_test.py` #100

Open axelboc opened 2 months ago

axelboc commented 2 months ago

Very low priority, but I find that base_test.py is becoming quite long and is, generally speaking, difficult to read. I feel like we could make things clearer at least by:

Moreover, there are a lot of things to test:

That's a lot of features and a lot of permutations if we really want to be thorough. It might be worth creating a few big HDF5 files like I did in H5Web (sample.h5) to make sure that we test /data/ and /meta/ with as many permutations of dtype, shape, format, etc., as we can.

loichuder commented 2 months ago

I do agree that there is room for improvement. Notably, I always find a bit misleading that the real tests are in base_test while most test_... files are in fact fixtures.

And I always need to take 5mn to understand how the fixtures are fed in the base tests.

Perhaps levering pytest's parametrize could allow us to make things clearer.

splitting the HDF5 files creation logic out of the "base" test files.

Not sure about this one. I like that I can check what HDF5 file is tested with one glance on the test. Writing tests with big HDF5 files requires external knowledge about these files.

axelboc commented 2 months ago

Notably, I always find a bit misleading that the real tests are in basetest while most test... files are in fact fixtures.

Haha true.

Not sure about this one. I like that I can check what HDF5 file is tested with one glance on the test. Writing tests with big HDF5 files requires external knowledge about these files.

It's just a lot of repetition: we initialise a dataset path and numpy value, create a file, create the dataset, access it, decode it, and assert that the returned value matches the initial value exactly. If we had one big dictionary of initial values, we could just loop through them to make sure h5grove supports them all and reads them all correctly. This could even be a single test, to be honest - i.e. a smoke test.

axelboc commented 2 months ago

This smoke test would work particularly well for the /data/ endpoint with format=bin.

Generally speaking, I would prefer to have one test per format and for each test to loop through multiple datasets, rather than one test per dataset and for each test to loop through multiple formats (what we have currently).