iterative / dvc

🦉 ML Experiments and Data Management with Git
https://dvc.org
Apache License 2.0

test_artifacts_download_studio pollutes the test environment #10460

Open georgegeddes opened 2 weeks ago

georgegeddes commented 2 weeks ago

Bug Report

Description

When I cloned the repo and ran the tests in my local environment, the following four tests failed, even though they pass in CI:

FAILED tests/integration/test_studio_live_experiments.py::test_post_to_studio[None-False-True] - AssertionError: assert 'mytoken' == 'STUDIO_TOKEN'
FAILED tests/integration/test_studio_live_experiments.py::test_post_to_studio[None-False-False] - AssertionError: assert 'mytoken' == 'STUDIO_TOKEN'
FAILED tests/integration/test_studio_live_experiments.py::test_post_to_studio[DVC_EXP_GIT_REMOTE-False-True] - AssertionError: assert 'mytoken' == 'STUDIO_TOKEN'
FAILED tests/integration/test_studio_live_experiments.py::test_post_to_studio[DVC_EXP_GIT_REMOTE-False-False] - AssertionError: assert 'mytoken' == 'STUDIO_TOKEN'

I ran detect-test-pollution, which identified tests/func/artifacts/test_artifacts.py::test_artifacts_download_studio as the test that causes these failures.

https://github.com/iterative/dvc/blob/93c3100badbe1e7d41588872b1d5cb8834d04415/tests/func/artifacts/test_artifacts.py#L286-L292
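For readers unfamiliar with test pollution, here is a minimal sketch of the general pattern, not the actual DVC code. `SHARED_CONFIG`, `polluting_test` and `victim_test` are hypothetical stand-ins; the real mechanism behind this issue is not confirmed in this thread.

```python
# Hypothetical stand-in for state that outlives a single test
# (for example a config file or a process-wide setting).
SHARED_CONFIG = {}


def polluting_test():
    # Writes a token and never cleans up after itself.
    SHARED_CONFIG["token"] = "mytoken"
    return True


def victim_test():
    # Expects the default token, but a leftover value takes precedence.
    token = SHARED_CONFIG.get("token", "STUDIO_TOKEN")
    return token == "STUDIO_TOKEN"


def run(order):
    """Run the given tests in order, starting from clean shared state."""
    SHARED_CONFIG.clear()
    return [test() for test in order]
```

Calling `run([victim_test, polluting_test])` lets the victim pass, while `run([polluting_test, victim_test])` makes it fail. That is the same kind of order dependence as the `'mytoken' == 'STUDIO_TOKEN'` assertion errors above.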

Reproduce

I'm not sure how to reproduce the pytest failure directly, since it depends on the order in which the tests happen to run, but here is how to find the polluting test:

$ detect-test-pollution --failing-test tests/integration/test_studio_live_experiments.py::test_post_to_studio[None-False-True] --tests ./tests/
discovering all tests...
-> discovered 2792 tests!
ensuring test passes by itself...
-> OK!
ensuring test fails with test group...
-> OK!
running step 1:
- 2791 tests remaining (about 12 steps)
running step 2:
- 1395 tests remaining (about 11 steps)
running step 3:
- 697 tests remaining (about 10 steps)
running step 4:
- 348 tests remaining (about 9 steps)
running step 5:
- 174 tests remaining (about 8 steps)
running step 6:
- 87 tests remaining (about 7 steps)
running step 7:
- 43 tests remaining (about 6 steps)
running step 8:
- 21 tests remaining (about 5 steps)
running step 9:
- 10 tests remaining (about 4 steps)
running step 10:
- 5 tests remaining (about 3 steps)
running step 11:
- 3 tests remaining (about 2 steps)
running step 12:
- 2 tests remaining (about 1 steps)
double checking we found it...
-> the polluting test is: tests/func/artifacts/test_artifacts.py::test_artifacts_download_studio
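
The halving of the remaining test count in the log above is what a bisection search looks like. Below is a minimal sketch of that idea; it is not detect-test-pollution's actual implementation. `find_polluter` and the `fails_with` callback are hypothetical names, and the sketch assumes exactly one polluting test.

```python
def find_polluter(candidates, fails_with):
    """Bisect `candidates` down to the single test that breaks the victim.

    `fails_with(subset)` is a hypothetical callback: it runs `subset`
    followed by the failing test and returns True if that test fails.
    """
    remaining = list(candidates)
    while len(remaining) > 1:
        middle = len(remaining) // 2
        first_half = remaining[:middle]
        # Keep whichever half still reproduces the failure.
        if fails_with(first_half):
            remaining = first_half
        else:
            remaining = remaining[middle:]
    return remaining[0]
```

With 2791 candidates, halving needs at most 12 rounds (2^12 = 4096 >= 2791), which is consistent with the "about 12 steps" estimate in the log.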

Expected

All tests should pass regardless of test order.
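
One common way to get order independence is to have each test restore whatever process-wide state it touches. Here is a minimal sketch using only the standard library. `run_isolated`, `leaky_test` and `EXAMPLE_TOKEN` are hypothetical names, and this is not a claim about how the DVC test suite handles isolation.

```python
import os
from unittest import mock


def run_isolated(test_fn):
    """Run test_fn, then restore os.environ to its state before the call."""
    # patch.dict snapshots the mapping on entry and restores it on exit,
    # so keys added or changed inside test_fn do not leak out.
    with mock.patch.dict(os.environ):
        return test_fn()


def leaky_test():
    # Hypothetical test body that sets an environment variable
    # and never removes it.
    os.environ["EXAMPLE_TOKEN"] = "mytoken"
    return os.environ["EXAMPLE_TOKEN"]
```

In a pytest suite, the built-in `monkeypatch` fixture (`monkeypatch.setenv`) gives the same restore-on-teardown behaviour for environment variables.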

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.51.3.dev2+g93c3100ba
-----------------------------------
Platform: Python 3.10.12 on Linux-6.5.0-1022-oem-x86_64-with-glibc2.35
Subprojects:
    dvc_data = 3.15.1
    dvc_objects = 5.1.0
    dvc_render = 1.0.2
    dvc_task = 0.4.0
    scmrepo = 3.3.5
Supports:
    azure (adlfs = 2024.4.1, knack = 0.11.0, azure-identity = 1.16.1),
    gdrive (pydrive2 = 1.19.0),
    gs (gcsfs = 2024.6.0),
    hdfs (fsspec = 2024.6.0, pyarrow = 16.1.0),
    http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
    oss (ossfs = 2023.12.0),
    s3 (s3fs = 2024.6.0, boto3 = 1.34.106),
    ssh (sshfs = 2024.6.0),
    webdav (webdav4 = 0.9.8),
    webdavs (webdav4 = 0.9.8),
    webhdfs (fsspec = 2024.6.0)
Config:
    Global: /home/george/.config/dvc
    System: /etc/xdg/xdg-ubuntu/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/vgubuntu-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/5d7d637e82d7dc3f80170a9d6085c556


shcheklein commented 2 weeks ago

Could it be reading the token from your local environment? That would mean the tests are not properly isolated from DVC config changes.
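
A quick way to check that hypothesis is to look for a token in the places a local run could pick one up. The helper below is a rough diagnostic sketch, not part of DVC. `find_token_sources` is a hypothetical name, and the assumption that the global config is a file named `config` inside the `Global:` directory from `dvc doctor` should be verified locally.

```python
import os
from pathlib import Path


def find_token_sources(config_paths, environ=None):
    """List env vars and config files that appear to define a Studio token.

    Only names and paths are reported, never the token values themselves.
    """
    environ = os.environ if environ is None else environ
    hits = []
    for name in environ:
        upper = name.upper()
        if "STUDIO" in upper and "TOKEN" in upper:
            hits.append(f"env:{name}")
    for path in map(Path, config_paths):
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            # Matches lines such as "token = ..." inside any config section.
            if line.strip().lower().startswith("token"):
                hits.append(f"file:{path}")
                break
    return hits
```

For example, `find_token_sources([Path.home() / ".config" / "dvc" / "config"])` would report a leftover token in the assumed global config path or in the current shell environment. An empty list suggests the token comes from somewhere else.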