Assertion error in example.py

HeitorBoschirolli commented 5 years ago

Hi. As the title says, I got an assertion error when running example.py. I printed out the value of

(features.cpu() - expected_features).abs().max()

and got 0.0010.

Versions of the required libraries:

matplotlib: 3.0.2
torch: 1.0.1
torchvision: 0.2.1
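For reports like this one, the installed versions can also be collected programmatically. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name report_versions and the chosen package list are illustrative and are not part of example.py.

```python
from importlib import metadata


def report_versions(packages):
    """Return {distribution name: installed version, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI (skimage is "scikit-image").
    wanted = ["matplotlib", "torch", "torchvision", "scikit-image", "numpy"]
    for name, version in report_versions(wanted).items():
        print(f"{name}: {version}")
```

Including scikit-image and numpy in the list is a judgment call; they are the packages most likely to affect image loading and pre-processing.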

luizgh commented 5 years ago

Hi,

While the difference is not very big, this is quite strange. Did you run it on CPU or GPU? I obtained the "expected features" on a GTX 1080 GPU, but running on my CPU I get a difference of only 5e-06.

Can you please check which version of skimage you are using? My first guess is that the image loading pipeline is different. To help troubleshoot the problem, please compare the variable "input" that you get with the one I obtain: add a line torch.save(input, 'input.pth') after the preprocessing (after line 30), and compare the saved tensor to my result: input.zip

HeitorBoschirolli commented 5 years ago

Hi,

I ran it on a GTX 1050 Ti GPU. Running on the CPU gave me the same result (0.0010).

I'm using skimage version 0.14.1

Here are some things I got by comparing your (expected) input with my input (current). comp

HeitorBoschirolli commented 5 years ago

Will leave my input here in case you want to check it.

input.zip

luizgh commented 5 years ago

Ok, so there is a small difference in the input, meaning that the data pre-processing is slightly different. I am using skimage 0.14.2, by the way.

If I understood it correctly, you also get different results running forward-prop with the input I gave you, is this right? In this case, can you please try running it on CPU to check? You can run export CUDA_VISIBLE_DEVICES= so that torch doesn't use any GPU.
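The same effect can be obtained from inside Python. This is a rough sketch, not something taken from example.py: the helper name force_cpu is hypothetical, and it assumes the variable is set before torch is imported, since CUDA reads it when it initializes.

```python
import os


def force_cpu(environ=os.environ):
    """Hide all CUDA devices by setting CUDA_VISIBLE_DEVICES to an empty string."""
    environ["CUDA_VISIBLE_DEVICES"] = ""
    return environ


# Call this before `import torch`: the variable is read when CUDA is
# initialized, so changing it afterwards may have no effect.
force_cpu()
# import torch
# torch.cuda.is_available() is then expected to report False
```

Setting the variable in the shell, as suggested above, avoids any dependence on import order and is the simpler option when running a script once.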

Lastly, please note that such small differences should not make a big impact in performance, as long as you don't mix feature vectors obtained with different pipelines. However, I am still curious to find out what exactly is causing the different results.

HeitorBoschirolli commented 5 years ago

"If I understood it correctly, you also get different results running forward-prop with the input I gave you, is this right?"

No, using your input I got the correct results using both the CPU and the GPU. Therefore the forward propagation seems to be working correctly.

After updating scikit-image to 0.14.2 and printing out the value of

(features.cpu() - expected_features).abs().max()

I still got 0.0010

HeitorBoschirolli commented 5 years ago

Maybe the pytorch version affects

input = torch.from_numpy(processed).view(1, 1, 150, 220)

and

input = input.float().div(255).to(device)

What do you think? What version are you using?

luizgh commented 5 years ago

I am using torch version "1.0.1.post2". I think it is unlikely that torch is the problem. Most likely, skimage depends on another package (e.g. to load or resize the image) whose version we did not match. To confirm, I saved all the intermediate steps: 1) the original image loaded as a numpy array; 2) the preprocessed image; 3) the image converted to a torch tensor; 4) the tensor after dividing by 255. The updated "example code" that saves them is also attached:

intermediate_steps.zip
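The idea of comparing the saved steps one by one can be sketched as follows: walk through the stages in order and report the first one where the two runs disagree by more than a tolerance. This is a dependency-free illustration on toy values. The stage names, the tolerance, and both helper functions are assumptions, not the contents of the attached archive.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergent_stage(mine, reference, stage_order, tol=1e-5):
    """Return (stage, diff) for the first stage whose diff exceeds tol, else None."""
    for stage in stage_order:
        diff = max_abs_diff(mine[stage], reference[stage])
        if diff > tol:
            return stage, diff
    return None


# Toy flattened values standing in for the four saved steps (not real data).
stages = ["original", "processed", "tensor", "normalized"]
reference = {
    "original": [255, 0, 128],
    "processed": [255, 10, 120],
    "tensor": [255, 10, 120],
    "normalized": [1.0, 10 / 255, 120 / 255],
}
mine = {
    "original": [255, 0, 128],
    "processed": [255, 11, 120],  # the pre-processing differs here
    "tensor": [255, 11, 120],
    "normalized": [1.0, 11 / 255, 120 / 255],
}
print(first_divergent_stage(mine, reference, stages))
```

With real data, the saved arrays would first need to be flattened into plain sequences (for NumPy arrays, for example, via array.ravel().tolist()). If the first divergent stage is the pre-processed image, the difference comes from the pre-processing rather than from torch.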

HeitorBoschirolli commented 5 years ago

Here's what I got: comparison

HeitorBoschirolli commented 5 years ago

I updated torch and the results remained the same.

luizgh commented 5 years ago

@HeitorBoschirolli, I just realized that the example above was not saving the preprocessed results properly (it was saving the original image again as "processed.npy"). Please compare the "processed.npy" results again.

example_new.zip

I don't think the conversion to torch should make any difference (it does not even copy the data, it just creates a view of the same data that numpy is using). It is more likely that the preprocessing is giving different results (e.g. because a dependency of skimage is not the same). I am attaching below the list of all packages installed in my conda environment, so you can double-check the versions.

packages.txt

HeitorBoschirolli commented 5 years ago

I created a new environment and installed the packages one by one to match the ones on your environment. Here are all the packages installed in my conda environment.

my_packages.txt

I still get the assertion error.

The packages are almost the same. Here is the diff between them:

diff
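A comparison like this can also be scripted. Below is a minimal sketch, assuming each list has one package per line in the form "name version ..." with optional "#" comment lines; the function names are hypothetical, and the sample entries are illustrative values, not the contents of the attached files.

```python
def parse_package_list(text):
    """Parse lines of 'name version ...' into {name: version}; skips comments."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def version_mismatches(mine, reference):
    """Return {name: (my_version, reference_version)} where the two differ.

    A package missing from one side is reported with None on that side.
    """
    names = sorted(set(mine) | set(reference))
    return {
        name: (mine.get(name), reference.get(name))
        for name in names
        if mine.get(name) != reference.get(name)
    }


# Illustrative excerpts; the build strings are placeholders.
reference_text = """
# Name        Version  Build
numpy         1.15.4   build_a
scikit-image  0.14.2   build_b
"""
my_text = """
numpy         1.16.2   build_c
scikit-image  0.14.2   build_b
"""
print(version_mismatches(parse_package_list(my_text),
                         parse_package_list(reference_text)))
```

Only the version column is compared here. Build strings are ignored, which is a simplification: two builds of the same version can in principle behave differently.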

HeitorBoschirolli commented 5 years ago

So, after restarting the terminal the problem was gone. Maybe some updates were not applied in the first test I did.

I could not identify which package was causing the problem. I will try to reproduce the error and run "example.py" after each update. As soon as I do that, I will post here which package was the problematic one.

Thanks a lot for the help btw c:

luizgh commented 5 years ago

Ok, thanks for checking Heitor.

Again, please note that even if the pre-processing is not exactly the same, this should not be a problem in practice, as long as you always use the same pre-processing (e.g. for training and generalization). Anyway, if you find out which package gives you different results (depending on its version), please let me know - other people may run into the same problem.

HeitorBoschirolli commented 5 years ago

I could not reproduce the error after reinstalling Anaconda. However, with scikit-image 0.14.1 and numpy 1.16.2 the program would not execute. Updating scikit-image to 0.14.2 or downgrading numpy to 1.15.4 (or both) solved the problem.

I will close the issue because everything works with the package versions you provided.

Once again, thanks for the help

variux commented 5 years ago

I'm getting the same error

Using NVIDIA Tesla K80

When I print the value it prints tensor(4.5668)

torch: 1.3.0
matplotlib: 3.1.1
torchvision: 0.4.1

luizgh / sigver

Assertion error in example.py #2