pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License
16.27k stars 6.96k forks source link

Restructuring torchvision tests #6020

Open oke-aditya opened 2 years ago

oke-aditya commented 2 years ago

Motivation

Our tests are getting a bit lengthy and considering we are adding new features, we are actually placing multiple tests in single file. This leads to problems as the tests become harder to refactor and harder to read, manage. Standalone files for tests or grouped according to purpose are easier to anaylse and debug as well.

Our longest test files

Copy paste-able command

find test -name "*.py" | xargs -I {} wc -l {} | sort -nr | head -n 10
oke@psycon:~/Aditya/PyTorch/vision$ find test -name "*.py" | xargs -I {} wc -l {} | sort -nr | head -n 10
2675 test/test_datasets.py
2242 test/test_transforms.py
1535 test/builtin_dataset_mocks.py
1528 test/test_ops.py
1343 test/test_functional_tensor.py
1339 test/test_prototype_transforms_functional.py
1254 test/test_video_reader.py
982 test/test_transforms_tensor.py
954 test/datasets_utils.py
942 test/test_models.py

Many splits are possible that can make the tests smaller and easier to manage

E.g. 

test_models.py
-----  test_classification_models.py
----- test_detection_models.py
----- test_video_models.py

test_ops.py
------ test_post_processors.py
------ test_losses.py
------ test_layers.py

Possibilities are many, we can have multiple files with better grouping.

Pitch

We should think of grouping tests in logically similar way together. or consider creating test folder a package. See https://github.com/pytorch/vision/pull/4436 . Earlier we have faced problems doing the same.

From discussion with Phillip offline Considering we don't Pin pytest. We can make use of https://github.com/pytest-dev/pytest/pull/9134

With the python path param, we can probably make use of common_utils.py file as well. So that we can test files without hurting other scenarios which were mentioned in the comments.

Alternatives

This is quite tricky but very helpful as we don't want tests to hurt us later.

Reference

https://github.com/pytorch/vision/pull/4436#issuecomment-923131237

https://github.com/pytorch/vision/pull/4436#issuecomment-924005657

https://github.com/pytorch/vision/pull/4436#issuecomment-924250093

cc @NicolasHug @datumbox @pmeier

NicolasHug commented 2 years ago

Thanks for opening this issue @oke-aditya

There are a few more things to consider, as locs don't give a full picture of a file's complexity.

Some of our tests files are divided into test classes, so long files are less of a problem in these ones. It's also worth noting that having all ops-related tests in a single test_ops.py file makes (human) test discovery very, very easy. Splitting tests across multiple files might change that, especially considering we don't have a proper sub-folder structure in our test dir. On top of that, most current "common utils" can be inlined in a single test file for now, but if we were to split files, we'd have to dispatch those into specific files - this too can be a bit messy without a decent folder structure.

oke-aditya commented 2 years ago

Yes the idea is to get a proper sub folder structure in tests. Also the tests for now can be placed in sub folders but under a single file. We could have ops folder with single test file test_ops.py (Eventually when time arises splitting it becomes easy)

tests
---- datasets
---- ops
------- test_ops.py

There are a few more things to consider, as locs don't give a full picture of a file's complexity.

Agreed. But sadly our editors and GitHub indexing etc won't agree :( I think the limit is 10k lines though, so we are pretty far.

NicolasHug commented 2 years ago

Another problem of moving tests is breaking git blame

oke-aditya commented 2 years ago

You actually had solved it :smile: https://github.com/pytorch/vision/blob/main/.git-blame-ignore-revs to the rescue?

NicolasHug commented 2 years ago

I don't know if ignore-revs can be used for this? Git will have to know somehow that line a on file F corresponds to line b on file G. It can do that with entire file renamings/moving but I doubt it's possible for arbitrary content?

oke-aditya commented 2 years ago

Maybe, this is experimental. My suggestion is let's actually try. Maybe on experimenting we will realise if this is really beneficial. Otherwise we would have a strong point to not go for this. As always there are advantages and disadvantages with anything we try :)

pmeier commented 2 years ago

I agree with at @NicolasHug here. I don't think git blame will pick up on this.

My suggestion is let's actually try. Maybe on experimenting we will realise if this is really beneficial.

Give it a try in your fork let us know here so we can have a look.