gsig / temporal-fields

Code for training temporal fully-connected CRF models in Torch

PyTorch - Localization option is not available? #8

Open mrao opened 5 years ago

mrao commented 5 years ago

Hi Gunnar,

I've gone through your charades repo and am able to run the RGB & Flow validation to recognize the top-5 activities. Now I've moved to the temporal-fields repo and wish to identify the start and end of a particular action.

I'm using the PyTorch version in my experiments. Unfortunately, I did not find an option that generates the localize*.txt files in the PyTorch codebase. Could you please guide me on this?

Thanks, Mohana Rao.

gsig commented 5 years ago

Hi Mohana,

For simplicity, this was omitted from the PyTorch codebase. It should be straightforward to generate the localize files by editing the validate_video function in train.py as follows (untested):

    def submission_file_localize(ids, outputs, filename):
        """ write list of ids and outputs to filename"""
        with open(filename, 'w') as f:
            for vid, output_localize in zip(ids, outputs):
                for i, output in enumerate(output_localize):
                    scores = ['{:g}'.format(x) for x in output]
                    f.write('{} {} {}\n'.format(vid, i, ' '.join(scores)))

    def validate_video(self, loader, model, criterion, epoch, args):
        """ Run video-level validation on the Charades test set"""
        with torch.no_grad():
            batch_time = AverageMeter()
            outputs = []
            outputs_localize = []
            gts = []
            ids = []

            model.eval()
            criterion.eval()

            end = time.time()
            for i, (input, target, meta) in enumerate(loader):
                gc.collect()
                meta['epoch'] = epoch
                target = target.long().cuda(async=True)
                assert target[0,:].eq(target[1,:]).all(), "val_video not synced"
                input_var = torch.autograd.Variable(input.cuda(), volatile=True)
                target_var = torch.autograd.Variable(target, volatile=True)
                output = model(input_var)
                output, loss = criterion(*(output + (target_var, meta)), synchronous=True)

                # store predictions
                #output_video = output.mean(dim=0)
                output_video = output.max(dim=0)[0]
                outputs.append(output_video.data.cpu().numpy())
                outputs_localize.append(output.data.cpu().numpy())  # per-timepoint scores for localization
                gts.append(target[0,:])
                ids.append(meta['id'][0])
                batch_time.update(time.time() - end)
                end = time.time()

                if i % args.print_freq == 0:
                    print('Test2: [{0}/{1}]\t'
                          'Time {batch_time.val:.3f} ({batch_time.avg:.3f})'.format(
                              i, len(loader), batch_time=batch_time))
            #mAP, _, ap = map.map(np.vstack(outputs), np.vstack(gts))
            mAP, _, ap = map.charades_map(np.vstack(outputs), np.vstack(gts))
            prec1, prec5 = accuracy(torch.Tensor(np.vstack(outputs)), torch.Tensor(np.vstack(gts)), topk=(1, 5))
            print(ap)
            print(' * mAP {:.3f}'.format(mAP))
            print(' * prec1 {:.3f} * prec5 {:.3f}'.format(prec1[0], prec5[0]))
            submission_file(
                ids, outputs, '{}/epoch_{:03d}.txt'.format(args.cache, epoch+1))
            submission_file_localize(
                ids, outputs_localize, '{}/localize_{:03d}.txt'.format(args.cache, epoch+1))
            return mAP, prec1[0], prec5[0]

Hope that helps!

mrao commented 5 years ago

Thank you very much Gunnar, I'll try this and let you know!

mrao commented 5 years ago

Sorry for the delay, Gunnar. The code works perfectly; I'm able to generate localize*.txt files with 50 entries per video. I haven't understood how to interpret them yet. Maybe we can close this issue, and if I have additional questions, I'll open another ticket.

gsig commented 5 years ago

I think the default setting for Charades_v1_localize.m is 25 frames per video, so you would have to edit that code to expect 50 frames, or somehow reduce the 50 frames to 25. The Charades README contains some more information about how localization is evaluated: https://allenai.org/plato/charades/README.txt

###########################################################
Charades_v1_localize.m
###########################################################
Evaluation code for frame-level classification (localization). Each frame in a video has zero or more actions. This script takes in a "submission file", which is a csv file of the form:

id framenumber vector

where 'id' is a video id for a given video, 'framenumber' is the number of frame described below, and 'vector' is a whitespace delimited list of 157 floating point numbers representing the scores of each action in a frame. An example submission file is provided in test_submission_localize.txt (download this file with get_test_submission_localize.sh).

To avoid extremely large submission files, the evaluation script evaluates mAP on 25 equally spaced frames throughout each video. The frames are chosen as follows

    for j=1:frames_per_video
        timepoint(j) = (j-1)*time/frames_per_video;

That is: 0, time/25, 2*time/25, ..., 24*time/25.

The baseline performance was generated by calculating the action scores at 75 equally spaced frames in the video (our batchsize) and picking every third prediction.
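
For example, an untested sketch of reducing a 50-frames-per-video localize file to those 25 frames by keeping every second frame and renumbering (the helper name is just a placeholder):

    # Untested sketch: keep every 2nd frame of a 50-frames-per-video localize
    # file so it matches the 25 evenly spaced frames Charades_v1_localize.m expects.
    # Assumes each line has the form "<video_id> <frame_number> <157 scores>".
    def subsample_localize(in_path, out_path, src_frames=50, dst_frames=25):
        step = src_frames // dst_frames  # 2 when going from 50 to 25
        with open(in_path) as fin, open(out_path, 'w') as fout:
            for line in fin:
                vid, frame, scores = line.split(' ', 2)
                if int(frame) % step == 0:
                    fout.write('{} {} {}'.format(vid, int(frame) // step, scores))

Assuming the 50 frames are evenly spaced starting at 0, the even-numbered ones land on the same 25 timepoints the evaluation uses.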

Let me know if you have any questions!

mrao commented 5 years ago

Thank you very much Gunnar. As I don't have a MATLAB-compatible setup, I converted part of the logic from Charades_v1_localize.m into a Python script.

Currently, my script generates the output below based on the 50-frame predictions. The value is the maximum probability out of the 157 classes for that particular frame. Is this output in line with the Charades_v1_localize.m concept?

I haven't implemented the validation part yet, so the mAP computation is not available.

    Video - 6IOV0, Frame - 0 @ 0.0 sec, Action - 156, value - 0.358724
    Video - 6IOV0, Frame - 1 @ 0.61 sec, Action - 156, value - 0.365727
    Video - 6IOV0, Frame - 2 @ 1.22 sec, Action - 156, value - 0.393653
    Video - 6IOV0, Frame - 3 @ 1.83 sec, Action - 156, value - 0.380545
    Video - 6IOV0, Frame - 4 @ 2.44 sec, Action - 156, value - 0.385359
    Video - 6IOV0, Frame - 5 @ 3.05 sec, Action - 156, value - 0.361539
    Video - 6IOV0, Frame - 6 @ 3.66 sec, Action - 54, value - 0.299614
    Video - 6IOV0, Frame - 7 @ 4.27 sec, Action - 54, value - 0.29228
    Video - 6IOV0, Frame - 8 @ 4.88 sec, Action - 54, value - 0.312875
    Video - 6IOV0, Frame - 9 @ 5.49 sec, Action - 156, value - 0.378728
    Video - 6IOV0, Frame - 10 @ 6.1 sec, Action - 156, value - 0.404691
    Video - 6IOV0, Frame - 11 @ 6.71 sec, Action - 156, value - 0.407318
    Video - 6IOV0, Frame - 12 @ 7.32 sec, Action - 156, value - 0.412708
    Video - 6IOV0, Frame - 13 @ 7.93 sec, Action - 156, value - 0.424009
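
Roughly, the per-frame step of the script does something like this (simplified sketch; the helper name and the `durations` lookup are placeholders, not the actual code):

    # Simplified sketch of the per-frame step: argmax over the 157 class scores
    # plus an approximate timestamp (frame_index * duration / 50).
    # Assumes localize lines of the form "<video_id> <frame_number> <157 scores>"
    # and a `durations` dict mapping video id -> length in seconds (placeholder).
    import numpy as np

    def print_frame_predictions(localize_path, durations, frames_per_video=50):
        with open(localize_path) as f:
            for line in f:
                vid, frame, scores = line.split(' ', 2)
                frame = int(frame)
                scores = np.array([float(x) for x in scores.split()])
                t = round(frame * durations[vid] / frames_per_video, 2)
                print('Video - {}, Frame - {} @ {} sec, Action - {}, value - {:g}'.format(
                    vid, frame, t, scores.argmax(), scores.max()))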

gsig commented 5 years ago

This looks reasonable to me. Charades_v1_localize.m just creates 25 ground-truth labels for each video. A quick way of doing this in the codebase is to just do:

                # inside the per-video loop in validate_video, collect one
                # prediction and one label per timepoint:
                for timepoint in range(50):
                    outputs.append(output[timepoint, :].data.cpu().numpy())
                    gts.append(target[timepoint,:])

You might want to create new variables for outputs and gts so it doesn't break the later code. Also, in the actions-for-actions codebase, there is a Python version of the Charades_v1_localize code: https://github.com/gsig/actions-for-actions/tree/master/tool
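
If you collect those into new per-timepoint lists (say outputs_frames and gts_frames, to keep the video-level outputs and gts intact), a frame-level mAP could then be computed the same way as the video-level one in validate_video, roughly (untested):

    # Untested sketch: frame-level (localization) mAP, mirroring the video-level
    # computation in validate_video. outputs_frames / gts_frames are the new
    # per-timepoint lists (placeholder names, one entry per evaluated frame).
    mAP_loc, _, ap_loc = map.charades_map(np.vstack(outputs_frames),
                                          np.vstack(gts_frames))
    print(' * localization mAP {:.3f}'.format(mAP_loc))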

Also, I wanted to let you know that I just made PyVideoResearch public, a new and improved codebase for video algorithms, including an improved version of temporal-fields (with an I3D backbone): https://github.com/gsig/PyVideoResearch

mrao commented 5 years ago

Thanks, Gunnar, for checking my output. I forked the PyVideoResearch repo and will explore the code and try to understand it (I started on deep learning recently; honestly speaking, I haven't understood CRFs & message passing yet). Thanks for the reference to actions-for-actions. I'll experiment with temporal-fields for a few more days, working along the lines you have suggested.