plenoptic-org / plenoptic

Visualize/test models for visual representation by synthesizing images.
https://plenoptic.readthedocs.io/en/latest/

TST: Test fails locally [pyOpenSci review] #238

Open NickleDave opened 8 months ago

NickleDave commented 8 months ago

This test failed for me locally: tests/test_metamers.py::TestMetamers::test_stop_criterion

I need to reboot and re-run so I can give you the full log from the pytest run; I'll do so after I finish the comment with my entire review :slightly_smiling_face:

billbrod commented 8 months ago

When you do so, can you please also include your system info (as requested in our bug report issue template)? (These tests are passing in our CI, so I'll need to try to track this down.)

System (please complete the following information):

NickleDave commented 8 months ago

OS: Linux, Pop!_OS 22.04 LTS (Ubuntu-like)
Python version: 3.10.12
PyTorch version: 2.1.2+cu121
Plenoptic version: 1.0.3.dev18

I could also reply with a dump of the virtual environment to a requirements.txt if that would help.

=========================================================== FAILURES ============================================================
____________________________________ TestMetamers.test_stop_criterion[frontend.OnOff.nograd] ____________________________________

self = <test_metamers.TestMetamers object at 0x7effcf2684c0>
einstein_img = tensor([[[[0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039],
          [0.0039, 0.1961, 0.1961,  ..., 0.6039, 0.6..., 0.1725, 0.1922, 0.0039],
          [0.0039, 0.0039, 0.0039,  ..., 0.0039, 0.0039, 0.0039]]]],
       device='cuda:0')
model = OnOff(
  (center_surround): CenterSurround()
  (luminance): Gaussian()
  (contrast): Gaussian()
)

    @pytest.mark.parametrize('model', ['frontend.OnOff.nograd'], indirect=True)
    def test_stop_criterion(self, einstein_img, model):
        # checking that this hits the criterion and stops early, so set seed
        # for reproducibility
        po.tools.set_seed(0)
        met = po.synth.Metamer(einstein_img, model)
        # takes different numbers of iter to converge on GPU and CPU
        if DEVICE.type == 'cuda':
            max_iter = 30
            check_iter = 25
        else:
            max_iter = 10
            check_iter = 8
        met.synthesize(max_iter=max_iter, stop_criterion=1e-5, stop_iters_to_check=5)
>       assert len(met.losses) == check_iter, "Didn't stop when hit criterion! (or optimization changed)"
E       AssertionError: Didn't stop when hit criterion! (or optimization changed)
E       assert 14 == 25
E        +  where 14 = len(tensor([0.0002, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002,\n        0.0002, 0.0002, 0.0002, 0.0002, 0.0002]))
E        +    where tensor([0.0002, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002,\n        0.0002, 0.0002, 0.0002, 0.0002, 0.0002]) = <plenoptic.synthesize.metamer.Metamer object at 0x7eff807fe890>.losses
tests/test_metamers.py:235: AssertionError
----------------------------------------------------- Captured stderr call ------------------------------------------------------
  0%|          | 0/30 [00:00<?, ?it/s, loss=1.5279e-04, learning_rate=0.01, gradient_norm=6.8560e-06, pixel_change_norm=1.5941e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=3.5352e-04, learning_rate=0.01, gradient_norm=9.1781e-03, pixel_change_norm=1.5810e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=3.4398e-04, learning_rate=0.01, gradient_norm=9.1742e-03, pixel_change_norm=1.5639e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=3.4316e-04, learning_rate=0.01, gradient_norm=9.3476e-03, pixel_change_norm=1.5441e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=2.9748e-04, learning_rate=0.01, gradient_norm=8.5114e-03, pixel_change_norm=1.5221e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=3.1232e-04, learning_rate=0.01, gradient_norm=9.0283e-03, pixel_change_norm=1.4975e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=2.7997e-04, learning_rate=0.01, gradient_norm=8.4553e-03, pixel_change_norm=1.4708e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=2.4384e-04, learning_rate=0.01, gradient_norm=7.7305e-03, pixel_change_norm=1.4420e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=2.1998e-04, learning_rate=0.01, gradient_norm=7.2615e-03, pixel_change_norm=1.4114e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=1.9966e-04, learning_rate=0.01, gradient_norm=6.8502e-03, pixel_change_norm=1.3789e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=2.0742e-04, learning_rate=0.01, gradient_norm=7.2220e-03, pixel_change_norm=1.3449e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=2.1823e-04, learning_rate=0.01, gradient_norm=7.6441e-03, pixel_change_norm=1.3096e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=1.9260e-04, learning_rate=0.01, gradient_norm=7.0683e-03, pixel_change_norm=1.2733e+0
  0%|          | 0/30 [00:00<?, ?it/s, loss=1.9586e-04, learning_rate=0.01, gradient_norm=7.2721e-03, pixel_change_norm=1.2363e+0
 43%|████▎     | 13/30 [00:00<00:00, 151.32it/s, loss=1.9586e-04, learning_rate=0.01, gradient_norm=7.2721e-03, pixel_change_norm=1.2363e+00]
billbrod commented 7 months ago

Hi @NickleDave, just getting around to this now. Can you give me a full dump of the virtual environment?

However, I think this is an issue with setting the seed, which determines the initial conditions for the optimization (most importantly, the patch of white noise that we use as the initial image). Given that, can you run:

import plenoptic as po
import torch
import imageio
import numpy as np

po.tools.set_seed(0)
# from the data directory for now
einstein = po.load_images('data/256/einstein.png')
mdl = po.simul.OnOff((31, 31), pretrained=True, cache_filt=True)
po.tools.remove_grad(mdl)
met = po.synth.Metamer(einstein, mdl)
# save the initial image (the white-noise starting point for synthesis)
imageio.imsave('init.png', (255 * po.to_numpy(met.metamer).squeeze()).astype(np.uint8))

and then upload the resulting init.png file to this issue? That way I can double-check whether it's the same as the one I get.

If they're different, then I can try uploading the version I get and seeing if using that in the tests fixes the problem.

I'd also be open to alternatives for this test -- basically, I'm checking that, for a given set of initial conditions, metamer.synthesize hits the convergence condition at the right time.

billbrod commented 7 months ago

Alright, based on testing with @BalzaniEdoardo's Mac, I'm 99% sure this is a result of getting a different sample of white noise during initialization.

My proposed fix is to actually check against the stop criterion, which makes more sense anyway.
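
Something like the following is what I have in mind -- just a sketch, reusing the variables from the failing test above; the exact assertions I end up committing may differ:

met.synthesize(max_iter=max_iter, stop_criterion=1e-5, stop_iters_to_check=5)
# don't hard-code the iteration at which synthesis stops; just check
# that it stopped before max_iter...
assert len(met.losses) < max_iter, "Didn't stop early!"
# ...and that the loss change over the last stop_iters_to_check
# iterations actually fell below the criterion
assert abs(met.losses[-5] - met.losses[-1]) < 1e-5, "Stop criterion not satisfied!"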

NickleDave commented 7 months ago

Sorry for the slow reply.

Not sure I follow all the logic of the test, but I agree that actually checking against the stop criterion makes more sense.

billbrod commented 7 months ago

Can you check with main? I just merged #251, which should solve this issue.

billbrod commented 1 month ago

@NickleDave can you try running the tests again? (With the main branch; I haven't released the fix to pip yet.)

NickleDave commented 5 days ago

Hi @billbrod, sorry for the really slow reply on this.

I don't have access to the same Ubuntu machine anymore.
I set up for dev on an MBP M3 and I was able to get this single test to pass. I think we can call it good.

Not relevant for this issue, but note that I had to comment out the line

addopts = "--cov=plenoptic -n auto"

otherwise I got the error:

ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --cov=plenoptic -n
  inifile: /Users/davidnicholson/Documents/repos/opensci/plenoptic/pyproject.toml
  rootdir: /Users/davidnicholson/Documents/repos/opensci/plenoptic

Not sure if that's operator error.

Also not a big deal, but you might want to change the snippet in the dev installation instructions here to say pip install -e ".[dev]", since stuff like pytest won't work without it. I know you do explain this below in "optional dependencies", but it may be worth making clear in this snippet that you're using the "extras" mechanism to declare your dev dependencies.

billbrod commented 4 days ago

Thanks!

Did you run pip install -e ".[dev]" before getting that error? This is the error you get if pytest is installed but pytest-cov is not. Can you check whether that's installed and, if not, install it and run again?
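
(For reference: --cov comes from the pytest-cov plugin and -n from pytest-xdist, so both plugins need to be importable for that addopts line to parse. If they're missing for some reason, something like

pip install pytest-cov pytest-xdist

should fix it; both should also come along with pip install -e ".[dev]".)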

(And that's a good point about the dev instructions, I'll make that change)

NickleDave commented 4 days ago

Did you run pip install -e ".[dev]" before getting that error? This is the error you get if pytest is installed but pytest-cov is not. Can you check whether that's installed and, if not, install it and run again?

Ah, I see, I had run pip install -e .[test] -- force of habit from my own stuff. So, yes, operator error.

Not to complicate your life any further, but just in case you care: you can recursively define optional dependencies, so that pip install -e .[dev] gives you doc as well. I usually just lump what you have in nb into doc, because the notebooks are part of the docs (IIUC), and have tests as another set of dependencies; then I make both of those "dev", since I usually want to both run tests and build docs in my dev environment.
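
For example, something like this in pyproject.toml -- just a sketch, where the package lists are placeholders rather than your actual dependencies:

[project.optional-dependencies]
# the notebooks are part of the docs, so their dependencies live in doc
doc = ["sphinx", "nbsphinx", "jupyter"]
tests = ["pytest", "pytest-cov", "pytest-xdist"]
# dev recursively pulls in both of the above
dev = ["plenoptic[doc,tests]"]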

billbrod commented 4 days ago

Oh interesting, it might make sense to rename my current dev to tests and then add a new dev bundle that includes all the others.

If I understand that link correctly, I'd add something like:

dev = [
    "plenoptic[docs,nb,test]",
]

which will grab the version currently on PyPI, right? Ideally, it would grab the contents of those bundles from the same file, in case main adds an additional requirement that isn't yet included in a release. I'm about to change how my docs are built, so this will be relevant soon, but it might not matter long-term, as things stabilize again.

(docs doesn't require nb, actually; nbsphinx can handle the conversion of the ipynb files without jupyter. readthedocs just uses the docs bundle, not nb.)

billbrod commented 4 days ago

Oh wait, I spoke too soon -- it looks like it is grabbing the bundle from the current file (tested by incrementing the required version of a package in docs). Alright, that's good to know.

NickleDave commented 4 days ago

Yeah, it's a bit confusing. IIUC, since you will already have (the local version of) your package installed, the dependency resolver won't go get it off PyPI; instead, it will just install the optional dependencies that are declared recursively.

See https://discuss.python.org/t/where-is-nested-recursive-optional-dependencies-documented/35648/7 and the GitHub issue linked therein.

billbrod commented 1 day ago

I think for now I won't use "recursive optional dependencies" like this, since that GitHub issue hasn't been closed. If I understand correctly, that's because they're not testing this behavior right now, so it seems like a bad idea to rely on it. If many folks have this confusion in the future, I may revisit this decision.

In that case, I'm going to close this with the merging of #294. (The actual test fix happened in #251.)