Closed: jacobmerson closed this issue 2 years ago.
Hi @jacobmerson,
Many thanks for reporting this. I will look into it and propose a fix soon.
Hi @jacobmerson,
Can you please share more details about your setup, e.g. your PyTorch version, Python version, and any other details that may help reproduce this?
I have included macos-12 in my build workflow, and it seems to pass the unit tests there: https://github.com/masadcv/FastGeodis/runs/7418137766?check_suite_focus=true#step:7:1
Please note: the skipped tests are only the ones that run on GPU. They are the same as the CPU tests, but additionally check the GPU implementation when a GPU is available.
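For context on the "s" markers in unittest output: a GPU-only variant of a test is typically skipped with one of unittest's skip decorators when no GPU is present. The sketch below assumes a hypothetical HAS_GPU flag (a real suite would more likely compute it with torch.cuda.is_available()); the class and test names are illustrative and are not FastGeodis's actual tests.

```python
import unittest

# Hypothetical flag for illustration; a real test suite would more likely
# derive this from torch.cuda.is_available().
HAS_GPU = False


class ExampleDeviceTests(unittest.TestCase):
    def test_on_cpu(self):
        # The CPU variant always runs.
        self.assertEqual(sum([1, 2, 3]), 6)

    @unittest.skipUnless(HAS_GPU, "GPU not available")
    def test_on_gpu(self):
        # Same check as the CPU test; it would exercise the GPU path if one existed.
        self.assertEqual(sum([1, 2, 3]), 6)
```

Running this with python -m unittest on a CPU-only machine reports one pass and one skip, and skipped tests still count towards the "Ran N tests" total without making the run fail.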
@masadcv I'm not too worried about the skipped tests because they are for the GPU.
Here are the packages installed in my venv:
(geodis) ➜ FastGeodis git:(master) pip freeze
-e git+https://github.com/masadcv/FastGeodis@5914d7e93b1422c0bc9c9a6b69bd4c048ce7576a#egg=FastGeodis
kiwisolver==1.4.4
numpy==1.23.1
parameterized==0.8.1
Pillow==9.2.0
pyparsing==3.0.9
SimpleITK==2.1.1.2
six==1.16.0
torch==1.12.0
typing_extensions==4.3.0
This is with Python 3.9.13.
Many thanks @jacobmerson for providing further details about your setup.
Just to update you on this, I have added multiple regression runs to the workflows to cover macOS 12 with Python 3.x (including Python 3.9.13). These all pass without any issue, e.g. this one
I am now trying to get hold of a machine where I can replicate your setup and find out what is causing the unit test errors. Please bear with me; I will report back here once I have done this.
@masadcv thanks for adding these. The only thing I can think of is maybe that it's an arm machine? I have a linux box that I will also test on.
Many thanks @jacobmerson !
The only thing I can think of is maybe that it's an arm machine?
That might be it. The failing unit tests all expect the code to raise a ValueError (due to ill-shaped inputs). Possibly the ARM build raises a different error here, which the Python interpreter does not pick up as a ValueError.
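To illustrate what "expecting a ValueError for ill-shaped inputs" looks like in a test, here is a minimal sketch. The function check_matching_spatial_shape is a hypothetical pure-Python stand-in for the input validation done by the compiled extension, not FastGeodis's API; the shape pairs mirror the ones printed in the test log further down.

```python
import unittest


def check_matching_spatial_shape(image_shape, mask_shape):
    """Hypothetical stand-in for the extension's input validation.

    Shapes are (batch, channel, *spatial). Raise ValueError when the
    number of dimensions or the spatial sizes of image and mask differ.
    """
    if len(image_shape) != len(mask_shape):
        raise ValueError("image and mask must have the same number of dimensions")
    if tuple(image_shape[2:]) != tuple(mask_shape[2:]):
        raise ValueError(
            f"spatial shape mismatch: {tuple(image_shape)} vs {tuple(mask_shape)}"
        )


class IllShapedInputTests(unittest.TestCase):
    def test_mismatched_2d_shapes_raise_value_error(self):
        # e.g. Tensor1 Shape: (1, 1, 32, 12) vs Tensor2 Shape: (1, 1, 32, 32)
        with self.assertRaises(ValueError):
            check_matching_spatial_shape((1, 1, 32, 12), (1, 1, 32, 32))

    def test_mismatched_3d_shapes_raise_value_error(self):
        # e.g. Tensor1 Shape: (1, 1, 16, 16, 12) vs Tensor2 Shape: (1, 1, 16, 16, 16)
        with self.assertRaises(ValueError):
            check_matching_spatial_shape((1, 1, 16, 16, 12), (1, 1, 16, 16, 16))
```

Such a test only passes if the exception that reaches Python is a ValueError (or a subclass); any other exception type makes it error out.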
I have also got hold of a Mac (with an Intel CPU) and ran the unit tests successfully:
$ sw_vers
ProductName: macOS
ProductVersion: 12.3.1
BuildVersion: 21E258
$ pip3 list | grep FastGeodis
FastGeodis 1.0.0rc6 /Users/masad-mac/FastGeodis
$ python3 --version
Python 3.9.13
$ python3 -m unittest
cpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.sssscpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.ssscpu
2
Tensor1 Shape: (1, 1, 32, 12)
Tensor2 Shape: (1, 1, 32, 32)
.cpu
2
Tensor1 Shape: (1, 1, 128, 12)
Tensor2 Shape: (1, 1, 128, 128)
.cpu
2
Tensor1 Shape: (1, 1, 256, 12)
Tensor2 Shape: (1, 1, 256, 256)
.ssscpu
3
Tensor1 Shape: (1, 1, 16, 16, 12)
Tensor2 Shape: (1, 1, 16, 16, 16)
.cpu
3
Tensor1 Shape: (1, 1, 64, 64, 12)
Tensor2 Shape: (1, 1, 64, 64, 64)
.cpu
3
Tensor1 Shape: (1, 1, 128, 128, 12)
Tensor2 Shape: (1, 1, 128, 128, 128)
.ssscpu
3
.cpu
3
.cpu
3
.ssscpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.ssscpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.ssscpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.sssscpu
2
Tensor1 Shape: (1, 1, 32, 12)
Tensor2 Shape: (1, 1, 32, 32)
.cpu
2
Tensor1 Shape: (1, 1, 128, 12)
Tensor2 Shape: (1, 1, 128, 128)
.cpu
2
Tensor1 Shape: (1, 1, 256, 12)
Tensor2 Shape: (1, 1, 256, 256)
.ssscpu
3
Tensor1 Shape: (1, 1, 16, 16, 12)
Tensor2 Shape: (1, 1, 16, 16, 16)
.cpu
3
Tensor1 Shape: (1, 1, 64, 64, 12)
Tensor2 Shape: (1, 1, 64, 64, 64)
.cpu
3
Tensor1 Shape: (1, 1, 128, 128, 12)
Tensor2 Shape: (1, 1, 128, 128, 128)
.ssscpu
3
.cpu
3
.cpu
3
.ssscpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.ssscpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.ssscpu
2
.cpu
2
.cpu
2
.ssscpu
3
.cpu
3
.cpu
3
.sss
----------------------------------------------------------------------
Ran 134 tests in 29.178s
OK (skipped=68)
I have slightly tweaked the unit tests to have a lower memory requirement in PR #16 and added macos-12 regression runs, as mentioned above. Please feel free to reopen this in case the problem persists. Many thanks!
Your changes didn't seem to fix the issue. I think I have the problem more or less tracked down, but I have some gaps in my knowledge...
I am still getting the same test failures with RuntimeError: Caught an unknown exception!. By putting try/catch blocks around the code in generalised_geodesic2d, I can see that the exception being thrown is a std::invalid_argument exception, and that seems to cause the unknown exception error on the Python side.
Based on the error message on the Python side, it looks like pybind11 doesn't recognise this exception type; see: https://github.com/pybind/pybind11/blob/b07975f492c2eed0409a18353fa23c9969e83e42/tests/test_exceptions.py#L152
The weird thing is that the pybind11 docs state that they can deal with std::invalid_argument. See here.
Since I didn't see any explicit use of pybind11, I'm guessing the binding must be happening inside PyTorch and possibly something is going wrong there. So is this error less to do with the operating system/CPU architecture and more related to the PyTorch version?
@masadcv I can't seem to reopen the issue. Please see more details on the error above.
Many thanks @jacobmerson, I have reopened this. I will have a closer look at it and get back to you.
The pybind11 binding is indeed handled by the PyTorch C++ APIs, and in this case the tests expect the code to raise a ValueError, which should correspond to std::invalid_argument, as you pointed out and as indicated here: https://pybind11.readthedocs.io/en/stable/advanced/exceptions.html
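The default C++ to Python exception mapping described on that pybind11 page can be summarised with a small pure-Python model. This is a simplified sketch of the documented behaviour, not pybind11's actual code; the translate function and its recognised_as_std_exception parameter are hypothetical names used only to show the two fallback paths: an unlisted std::exception becomes a RuntimeError carrying what(), while an exception that cannot be caught as std::exception at all ends up as RuntimeError("Caught an unknown exception!").

```python
# Simplified model of pybind11's documented default exception translation.
CPP_TO_PYTHON = {
    "std::bad_alloc": MemoryError,
    "std::domain_error": ValueError,
    "std::invalid_argument": ValueError,
    "std::length_error": ValueError,
    "std::out_of_range": IndexError,
    "std::range_error": ValueError,
    "std::overflow_error": OverflowError,
}


def translate(cpp_type, what, recognised_as_std_exception=True):
    """Return the Python exception this model predicts for a C++ exception.

    cpp_type: name of the thrown C++ type, e.g. "std::invalid_argument".
    what: the exception's what() message.
    recognised_as_std_exception: hypothetical switch for the case where the
        thrown object is not caught as a std::exception by the translator.
    """
    if not recognised_as_std_exception:
        # Catch-all path: the original message is lost.
        return RuntimeError("Caught an unknown exception!")
    # Listed types map to specific exceptions; other std::exceptions
    # become RuntimeError with the what() message.
    exc_type = CPP_TO_PYTHON.get(cpp_type, RuntimeError)
    return exc_type(what)
```

In this model, translate("std::invalid_argument", "bad shape") gives ValueError("bad shape"), which is what the tests expect, whereas the catch-all path yields exactly the RuntimeError message reported in this issue.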
I am still unsure why this gives a different exception on ARM-based macOS, while all other OSes pass and catch the correct ValueError exception for these tests. I need to look into this a bit more to see if I can fix it for all cases.
I am closing this, as it has been fixed and tests have been added for the relevant macOS builds. Please feel free to reopen if you think otherwise.
Describe the bug
Many tests are failing or skipped. I suspect the skipped tests are GPU tests, since I'm running on a CPU-only system at the moment.
To Reproduce
Expected behavior
Passing tests.
Desktop (please complete the following information):