Open antoche opened 1 year ago
Hey @antoche,
Thanks for spotting this, most of those should be fixed by now. I think all of them were caused by precision and OOM problems.
I've just tried on the 0.14.0 tag and I am still getting the same failures. Note they're not OOM errors.
Yes, with PyTorch 2.0 being released we got a couple of new failures - I'll need to spend a day fixing all of those soon! But it seems like they are all minor precision errors.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Just keeping this alive, as this issue is still relevant and the suggestions are still valid IMO. Having a view into and access to a CI system would already be a great help.
Hey @antoche,
Could you open a PR with your suggested improvements?
Note:
1.) All tests run on GPU, but the latents are created on CPU because it improves precision.
2.) We can only do so much regarding small precision errors, see: https://huggingface.co/docs/diffusers/main/en/using-diffusers/reproducibility#create-reproducible-pipelines
3.)
> Those tests' output is hard to read. I would recommend moving to torch.testing.assert_allclose or np.testing.assert_allclose, which are precisely designed for these types of tests.

TBH I wouldn't necessarily agree here, I like it when error messages are very detailed.
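For reference, a minimal sketch of what point 1 refers to (the shapes and seed here are illustrative, not taken from the test suite): the initial latents are drawn with a CPU generator and only then moved to the GPU, so the random noise is identical across different GPU models.

```python
import torch

# Illustrative sketch: create latents on CPU with a seeded generator for
# reproducibility, then move them to the GPU where the model actually runs.
device = "cuda"
generator = torch.Generator(device="cpu").manual_seed(0)  # CPU RNG is stable across GPU models
latents = torch.randn((1, 4, 64, 64), generator=generator)  # created on CPU
latents = latents.to(device)  # moved to GPU for the forward pass
```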
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
We are going to try to review the tests soon, so removing the stale label.
Hi, just wanted to update this ticket to mention that the test_stable_diffusion_depth tests are now passing, but test_models_unet_2d_condition.py::UNet2DConditionModelTests::test_lora_xformers_on_off is still failing, and now I'm also seeing a similar failure with test_models_unet_3d_condition.py::UNet3DConditionModelTests::test_lora_xformers_on_off, with a very large error:
> assert (sample - on_sample).abs().max() < 1e-4
E AssertionError: assert tensor(0.5566, device='cuda:0') < 0.0001
E + where tensor(0.5566, device='cuda:0') = <built-in method max of Tensor object at 0x7f619cf12720>()
E + where <built-in method max of Tensor object at 0x7f619cf12720> = tensor([[[[[2.3253e-04, 1.5353e-03, 6.9697e-04, ..., 1.9233e-04,\n 6.2592e-05, 1.4339e-04],\n [1.0... [1.1253e-04, 1.3334e-04, 4.8172e-04, ..., 1.6327e-03,\n 4.1591e-04, 1.6979e-03]]]]], device='cuda:0').max
E + where tensor([[[[[2.3253e-04, 1.5353e-03, 6.9697e-04, ..., 1.9233e-04,\n 6.2592e-05, 1.4339e-04],\n [1.0... [1.1253e-04, 1.3334e-04, 4.8172e-04, ..., 1.6327e-03,\n 4.1591e-04, 1.6979e-03]]]]], device='cuda:0') = <built-in method abs of Tensor object at 0x7f619f7b5d60>()
E + where <built-in method abs of Tensor object at 0x7f619f7b5d60> = (tensor([[[[[ 1.1231e-01, 4.4056e-02, -1.6904e-02, ..., 5.2370e-02,\n 2.0205e-02, 1.8632e-01],\n ... [-3.4701e-01, 2.9261e-01, -5.1616e-01, ..., 6.4207e-02,\n -8.3352e-02, -3.3661e-01]]]]], device='cuda:0') - tensor([[[[[ 1.1208e-01, 4.5591e-02, -1.7601e-02, ..., 5.2562e-02,\n 2.0268e-02, 1.8646e-01],\n ... [-3.4712e-01, 2.9247e-01, -5.1568e-01, ..., 6.2574e-02,\n -8.3768e-02, -3.3491e-01]]]]], device='cuda:0')).abs
This was with diffusers-0.16.1 and xformers-0.0.20.
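For context, here is a rough standalone sketch of the kind of comparison that test makes: running the same UNet with xformers attention enabled and disabled and asserting the outputs match. The config values are illustrative and the LoRA layers are omitted, so this is not the actual test code:

```python
import torch
from diffusers import UNet2DConditionModel

torch.manual_seed(0)

# Tiny illustrative UNet config (not the one used in the diffusers test suite).
unet = UNet2DConditionModel(
    sample_size=32,
    in_channels=4,
    out_channels=4,
    layers_per_block=2,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=32,
).to("cuda").eval()

sample = torch.randn(1, 4, 32, 32, device="cuda")
encoder_hidden_states = torch.randn(1, 4, 32, device="cuda")
timestep = 1

with torch.no_grad():
    unet.enable_xformers_memory_efficient_attention()
    on_sample = unet(sample, timestep, encoder_hidden_states).sample

    unet.disable_xformers_memory_efficient_attention()
    off_sample = unet(sample, timestep, encoder_hidden_states).sample

# Mirrors the assertion in the traceback above; on some hardware/setups this is
# exactly the check that fails.
assert (off_sample - on_sample).abs().max() < 1e-4
```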
Regarding the third point, my comment is not about removing detail, but about increasing readability (and therefore maintainability and ease of contribution).
For example, when replacing the assertion above with torch.testing.assert_close, here's the resulting failure message:
> torch.testing.assert_close(sample, on_sample, rtol=0, atol=1e-4)
E AssertionError: Tensor-likes are not close!
E
E Mismatched elements: 58082 / 65536 (88.6%)
E Greatest absolute difference: 0.2881329655647278 at index (1, 2, 3, 17, 28) (up to 0.0001 allowed)
E Greatest relative difference: 789.9036144578313 at index (2, 3, 3, 12, 31) (up to 0 allowed)
It is not only much more readable, but actually gives more information than the original check.
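For anyone who wants to see the difference locally, here is a tiny self-contained illustration with made-up tensors (not the UNet outputs above); running it raises a failure message in the same detailed format as shown above.

```python
import torch

torch.manual_seed(0)
sample = torch.randn(2, 4, 8, 8)
on_sample = sample + 1e-3 * torch.randn_like(sample)  # simulate a small numerical drift

# Old style: only the maximum absolute difference ends up in the error message.
# assert (sample - on_sample).abs().max() < 1e-4

# assert_close reports the mismatch count and the greatest absolute/relative
# differences together with their indices.
torch.testing.assert_close(sample, on_sample, rtol=0, atol=1e-4)
```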
torch.testing.assert_close indeed looks nice!
Describe the bug
The following tests are failing on my system due to pipeline output differences:
Example output:
(here the error is right on the threshold)
Another example:
(here the error is way above the threshold)
Those failures are hard to troubleshoot because there is no clear reason as to what might be causing them. I have tried running those tests on different systems to find out whether they might be coming from hardware differences, but this is time consuming and not necessarily solving anything.
Some things I am noticing which I think could be improved:
Those tests' output is hard to read. I would recommend moving to torch.testing.assert_allclose or np.testing.assert_allclose, which are precisely designed for these types of tests.

Reproduction
Run tests as normal.
Logs
System Info
Tried on multiple linux machines with various Nvidia GPUs.
Running from branch v0.13.1
diffusers version: 0.13.1
Platform: Linux-4.14.240-weta-20210804-x86_64-with-glibc2.27
Python version: 3.9.10
PyTorch version (GPU?): 1.12.0a0+git664058f (True)
Huggingface_hub version: 0.11.1
Transformers version: 4.26.0
Accelerate version: 0.13.1
xFormers version: 0.0.14.dev