xiaobai1217 / RepetitionCounting

Code for "Repetitive Activity Counting by Sight and Sound"

Reproducing results on visual checkpoint #23

Open DWhettam opened 1 year ago

DWhettam commented 1 year ago

Hi, I'm trying to reproduce the results in Table 7 of the paper (visual checkpoint only, with a manually specified sample rate) and I'm running into some issues. Following the suggestions in #13, I built a test script based on run_demo.py: it loops over the files in the test CSV file and copies the exact process of run_demo.py, but with a fixed sample rate. Currently my results are totally wrong (MAE 9.96). Do you have any suggestions for what I could be doing differently? Any help would really be appreciated!


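import numpy as np
import pandas as pd
import torch
from torchvision import models
from tqdm import tqdm

# convert_model, read_video and get_visual_count are not defined here; they are
# assumed to be imported from the repository's helper code, as in run_demo.py.
# args.sample_rate is assumed to come from an argparse parser defined elsewhere.

test_df = pd.read_csv('countix_test_examples_clean.csv')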

np.random.seed(0)
torch.manual_seed(0)
torch.backends.cudnn.enabled = False # 0.811

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#-----------------------------------------------load models trained on Countix-AV-------------------------------------------
model = models.video.r2plus1d_18(pretrained=True)
model.fc = torch.nn.Linear(512,34*41)
model.fc2 = torch.nn.Linear(512,41)
model = convert_model(model)

if device.type == "cuda":
    model = torch.nn.DataParallel(model)

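# Use the device selected above instead of calling .cuda() unconditionally
model = model.to(device)
checkpoint = torch.load('visual_checkpoint.pt', map_location=device)
model.load_state_dict(checkpoint['state_dict'])
model.eval()

# Values 2..35 with shape (1, 34), passed to get_visual_count
tensor = torch.arange(2, 36, dtype=torch.float32, device=device).unsqueeze(0)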

data_path = 'path/my_path'

outputs_list = []
groundtruth_lists = []

for idx, row in tqdm(test_df.iterrows(), total=test_df.shape[0]):
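    # Positional access via .iloc; plain row[0] / row[-1] on a label-indexed
    # Series relies on a deprecated positional fallback in recent pandas
    video_id = row.iloc[0]
    gt = row.iloc[-1]
    countix_start = float(row.iloc[3])
    countix_end = float(row.iloc[4])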
    video_path = f"{data_path}/{video_id}.mp4"
    video, fps = read_video(video_path)
    video = video.astype(np.float16)
    video = video/255.0
    video = (video - np.array([0.485, 0.456, 0.406], dtype=np.float16).reshape((1, 1, 1, 3))) / np.array([0.229, 0.224, 0.225],dtype=np.float16).reshape(
        (1, 1, 1, 3))
    start = int(countix_start * fps)
    end = int(countix_end * fps)
    video = video[start:end]
    outputs = get_visual_count(video, args.sample_rate, model, tensor)
    outputs_list.append(outputs)
    groundtruth_lists.append(gt)

obo = sum(
        [1 if (pred >= gt - 1) and (pred <= gt + 1) else 0 for pred, gt in zip(outputs_list, groundtruth_lists)]) / float(len(groundtruth_lists))

np_outputs = np.array(outputs_list)
np_labels = np.array(groundtruth_lists)
mae_err = np.mean(np.abs(np_labels - np_outputs))

print(f"OBO ACCURACY: {obo}")
print(f"MAE: {mae_err}")
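
The two metrics at the end of the script can be checked in isolation on toy data, which helps separate metric problems from model problems. The sketch below is a minimal stand-alone version; `counting_metrics` and its `normalize` flag are hypothetical names introduced here, not part of the repository. The `normalize` option reflects the convention used in some repetition-counting work (for example RepNet) of dividing each absolute error by the ground-truth count. Whether Table 7 uses that convention is an assumption to verify against the paper; if it does, an unnormalized MAE would not be directly comparable.

```python
import numpy as np


def counting_metrics(predictions, labels, normalize=False):
    """Return (MAE, off-by-one accuracy) for predicted vs. ground-truth counts.

    `normalize=True` divides each absolute error by the ground-truth count.
    This is an assumed convention, not confirmed for Table 7 of the paper.
    """
    preds = np.asarray(predictions, dtype=np.float64)
    gts = np.asarray(labels, dtype=np.float64)

    abs_err = np.abs(gts - preds)
    if normalize:
        abs_err = abs_err / gts  # assumes every ground-truth count is > 0

    mae = float(np.mean(abs_err))
    # Off-by-one accuracy: prediction within +/- 1 of the ground truth
    obo = float(np.mean(np.abs(gts - preds) <= 1))
    return mae, obo


if __name__ == "__main__":
    toy_preds = [3, 5, 10]
    toy_labels = [3, 7, 9]
    print(counting_metrics(toy_preds, toy_labels))
    print(counting_metrics(toy_preds, toy_labels, normalize=True))
```

Calling it with `outputs_list` and `groundtruth_lists` from the script should give the same OBO value as the list comprehension above, and the same MAE when `normalize=False`.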

guoxigan commented 1 year ago

Specifying a fixed sample rate is not a good idea!

DWhettam commented 1 year ago

@guoxigan For sure, although I'm trying to replicate Table 7, which uses a fixed sample rate. I wanted to be able to test just the visual model to avoid training the full thing first. I have now got it working by implementing a testing script from scratch without using the utility functions provided, although I'd be interested to know what the issue is with this version of the script.

guoxigan commented 1 year ago

Your script looks correct, and I suspect you might be using a very small sampling rate. The smaller the sampling rate, the larger the mean absolute error (MAE).

DWhettam commented 1 year ago

As I said, I'm trying to reproduce Table 7, so I used the sampling rates there. Do you have your testing code available online anywhere @guoxigan? I'd be interested to see it, thanks!

guoxigan commented 1 year ago

I'm sorry, but after I failed to reproduce Table 7, I deleted the previous project files. I suggest you first assume that the stride selection model can predict the optimal stride with 100% accuracy. In other words, use the following code to calculate the stride: "sample_rate = int(max((video.shape[0]/count+2)/32,1))"
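
The one-line formula above can be wrapped in a small function to make it easier to test. This is only a sketch: `oracle_sample_rate` is a hypothetical name, `num_frames` stands in for `video.shape[0]`, and `count` is assumed to be the ground-truth repetition count of the clip (the thread does not say so explicitly). Operator precedence is kept exactly as written, so the `+2` is added to `num_frames / count` before dividing by 32.

```python
def oracle_sample_rate(num_frames: int, count: float) -> int:
    """Stride computed from a known count, following the quoted formula.

    num_frames: number of frames in the clip (video.shape[0] in the script).
    count: assumed to be the ground-truth repetition count; must be > 0.
    """
    # Same arithmetic as: int(max((video.shape[0]/count+2)/32, 1))
    return int(max((num_frames / count + 2) / 32, 1))


if __name__ == "__main__":
    # (frames, count) pairs chosen only for illustration
    for num_frames, count in [(640, 10), (100, 10), (300, 2)]:
        print(num_frames, count, oracle_sample_rate(num_frames, count))
```

In the test loop, the result would replace `args.sample_rate` in the call to `get_visual_count`, with `video.shape[0]` taken after the clip has been cut to the Countix start and end times.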

DWhettam commented 1 year ago

OK I'll have a play with it, thanks!