layumi / Person_reID_baseline_pytorch

:bouncing_ball_person: Pytorch ReID: A tiny, friendly, strong pytorch implement of person re-id / vehicle re-id baseline. Tutorial 👉https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial

Tiled Image input to the Neural Network #321

Closed · Varghese-Kuruvilla closed this issue 2 years ago

Varghese-Kuruvilla commented 2 years ago

Hi @layumi, thanks a lot for the effort you have put into this repository; it has been really helpful for me. I have a couple of issues I wanted to clarify:

  1. I tried to retrain ft_net with the Market-1501 dataset. When I visualise each image from the input tensor passed to the neural network, I get a tiled image of the same person. I have attached a screenshot below. Is this correct? I am sure I am missing something; I just cannot figure out what.

  2. The codebase seems to compute the classification loss plus the triplet loss. However, the paper 'In Defense of the Triplet Loss for Person Re-Identification' seems to advocate using the triplet loss alone. Could you comment on this?

To Reproduce
Steps to reproduce the behavior:

  1. Download Market-1501 dataset.
  2. Data preparation with prepare.py
  3. Run train.py with triplet Loss.
  4. Visualize input tensor with the following code snippet:

                if phase == 'val':
                    with torch.no_grad():
                        # print("inputs.size():", inputs.size())
                        outputs = model(inputs)
                        # print("outputs.size():", outputs.size())
                        breakpoint()
                else:
                    # print("inputs.size():", inputs.size())
                    # Visualize the images that are fed to the network
                    for input in inputs:
                        input = np.array(input.cpu())
                        # NB: this reshapes the (C, H, W) tensor instead of transposing it --
                        # see the discussion below
                        input = input.reshape(input.shape[1], input.shape[2], input.shape[0])
                        display_image("Image", input)

                    outputs = model(inputs)

Screenshots
[screenshot: Market-1501_Input_Image — the tiled input image described above]

Thanks a lot for your help in advance!

layumi commented 2 years ago

Hi @Varghese-Kuruvilla After normalization (in the data augmentation part), the values of the input tensor lie roughly in [-2, 2]. If you want to visualise the tensor, you need to reverse the normalisation first.
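For example, a minimal sketch of reversing the normalisation before plotting, assuming the standard ImageNet mean/std used by transforms.Normalize in train.py (check your own transform for the exact values):

    import torch
    import matplotlib.pyplot as plt

    # Assumed ImageNet statistics -- verify against the transforms.Normalize call you use
    MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

    def show_tensor(img_chw):
        """Undo Normalize and display a single (C, H, W) image tensor."""
        img = img_chw.detach().cpu() * STD + MEAN       # reverse the normalisation
        img = img.clamp(0, 1).permute(1, 2, 0).numpy()  # CHW -> HWC for plotting
        plt.imshow(img)
        plt.axis('off')
        plt.show()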

Varghese-Kuruvilla commented 2 years ago

Hi @layumi, thanks for your quick response. However, I wanted to know why a single image contains 9 tiled images of the same person (as shown in the screenshot). I would expect it to be a single image, right? Thanks in advance!

Varghese-Kuruvilla commented 2 years ago

Hi @layumi, sorry, my bad. It looks like numpy.reshape isn't the right way to rearrange the tensor for visualisation; using PyTorch's permute function gave me the correct result. Would you mind commenting on the second point in my initial post? Thanks a lot!
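For anyone who runs into the same thing: reshape only reinterprets the values in flat memory order, while permute actually swaps the axes, which is why the reshaped (C, H, W) tensor shows up as a scrambled, tiled-looking image. A small illustrative sketch (the tensor sizes here are arbitrary):

    import torch

    chw = torch.randn(3, 256, 128)              # a (C, H, W) image tensor

    wrong = chw.reshape(256, 128, 3)            # reinterprets the flat buffer -> scrambled image
    right = chw.permute(1, 2, 0).contiguous()   # swaps axes -> proper (H, W, C) layout

    print(wrong.shape, right.shape)             # both torch.Size([256, 128, 3])
    print(torch.equal(wrong, right))            # False in general: only permute keeps pixels intact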

layumi commented 2 years ago

Hi @Varghese-Kuruvilla

Actually, I have found that the cross-entropy loss is stable and easy to train in most cases, so I generally keep cross-entropy in most of my attempts.

You may refer to another paper of mine on this phenomenon: https://arxiv.org/abs/1711.05535
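Roughly, the idea is to add the triplet term on top of the identity classification term. A minimal sketch using the standard PyTorch losses (the margin value here is an illustration, not necessarily what train.py uses):

    import torch.nn as nn

    ce_loss = nn.CrossEntropyLoss()
    triplet_loss = nn.TripletMarginLoss(margin=0.3)   # margin chosen for illustration

    def combined_loss(logits, labels, anchor, positive, negative):
        """Identity classification (cross-entropy) plus metric-learning (triplet) loss."""
        return ce_loss(logits, labels) + triplet_loss(anchor, positive, negative)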

Varghese-Kuruvilla commented 2 years ago

Thanks @layumi. Closing the issue.