qijiezhao / pseudo-3d-pytorch

PyTorch version of pseudo-3d-residual-networks (P3D); a pretrained model is supported
MIT License

Pre-trained model giving wrong results on Kinetics #14

Closed BKHMSI closed 6 years ago

BKHMSI commented 6 years ago

@qijiezhao I am using the RGB pre-trained model you provided, p3d_rgb_199.checkpoint.pth.tar, but it is giving me wrong results for my input frames. I even tried testing it on a video from the training set of the Kinetics dataset, but it also got a wrong label. I am preprocessing the data as shown below, and I am using the labels in ascending order.

For instance, I tried a video of a person playing tennis and got the following as my top-5 results with their corresponding scores:

7.55878  dancing ballet
5.82084  stretching arm
4.84113  side kick
4.80862  playing squash or racquetball
4.80691  exercising arm

Can you please confirm that the pre-trained model you provided is valid?

from glob import glob
from os.path import join

import torch
from torch.autograd import Variable
from torchvision import transforms
from PIL import Image

from p3d_model import P3D199  # P3D model definition from this repo


def get_clip(clip_name):
    # Load all frames of the clip, sorted so the temporal order is preserved.
    clip = sorted(glob(join('data', clip_name, '*.png')))

    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    preprocess = transforms.Compose([
        transforms.Resize((160, 160)),
        transforms.ToTensor(),
        normalize
    ])

    frames = []
    for frame in clip:
        image = Image.open(frame)
        image = preprocess(image)
        frames.append(image.unsqueeze(0))

    frames = torch.cat(frames, 0)

    clip = frames.permute(1, 0, 2, 3)  # ch, fr, h, w
    clip = clip.unsqueeze(0)
    return clip

def read_labels_from_file(filepath):
    with open(filepath, 'r') as f:
        labels = [line.strip() for line in f.readlines()]
    return labels

if __name__ == '__main__':

    model = P3D199(pretrained=True, num_classes=400)
    X = get_clip('tennis')
    X = Variable(X)
    X = X.cuda()

    model.cuda()
    model.eval()
    prediction = model(X)
    prediction = prediction.data.cpu().numpy()

    # read labels
    labels = read_labels_from_file('kinetics_labels.txt')

    # print top predictions
    top_inds = prediction[0].argsort()[::-1][:5]  # reverse sort and take five largest items
    print('\nTop 5:')
    for i in top_inds:
        print('{:.5f} {} {}'.format(prediction[0][i], i, labels[i]))
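
For reference, the scores printed above are unnormalized class scores; below is a minimal sketch for turning them into probabilities before ranking, assuming prediction and labels come from the snippet above:

import numpy as np

scores = prediction[0]                  # raw class scores for the 400 Kinetics classes
probs = np.exp(scores - scores.max())   # subtract the max for numerical stability
probs /= probs.sum()                    # softmax
top_inds = probs.argsort()[::-1][:5]    # five most probable classes
for i in top_inds:
    print('{:.5f} {} {}'.format(probs[i], i, labels[i]))
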
SampsonKwan commented 6 years ago

@BKHMSI Have you solved this problem? I am running into it as well: I use video clips from different classes, but most of them are predicted as the same label. @qijiezhao Could you provide the preprocessing code, if you don't mind?

BKHMSI commented 6 years ago

@SampsonKwan Sadly, I still haven't been able to solve the problem. Please update here if you find anything.

zzy123abc commented 6 years ago

https://github.com/qijiezhao/pseudo-3d-pytorch/issues/11 If you use the weights directly, you may get wrong answers, because the Caffe model is different from the PyTorch model. So you can finetune it on your own dataset first, and then it works when you test it.
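
A minimal finetuning sketch along those lines, assuming the P3D199 constructor used above, that the final classifier is exposed as model.fc (ResNet-style), and that train_loader is a DataLoader you supply yielding (clips, targets) batches; the class count and hyperparameters are placeholders:

import torch
import torch.nn as nn
from torch.autograd import Variable
from p3d_model import P3D199  # P3D model definition from this repo

NUM_OWN_CLASSES = 101  # placeholder: number of classes in your own dataset

# Start from the Kinetics-pretrained weights, then swap the classifier head.
model = P3D199(pretrained=True, num_classes=400)
model.fc = nn.Linear(model.fc.in_features, NUM_OWN_CLASSES)  # assumes a ResNet-style fc head
model = model.cuda()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
for epoch in range(5):                   # placeholder epoch count
    for clips, targets in train_loader:  # train_loader: your own DataLoader of (clip, label) pairs
        clips = Variable(clips.cuda())
        targets = Variable(targets.cuda())

        optimizer.zero_grad()
        outputs = model(clips)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
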

BKHMSI commented 6 years ago

@zzy123abc Thank you for the clarification

qijiezhao commented 6 years ago

@BKHMSI @SampsonKwan Please check the updated code. In addition, the new weights will be provided later.