Open vineetparikh opened 5 months ago
For context, I'm using the same checkpoints for HOI and STARK as listed in the README, so I don't know whether any additional training is needed, or whether a different checkpoint gives the results in the paper.
Hi @vineetparikh,
that's strange. I tested the repo multiple times and always got the correct results. No additional training or checkpoints other than those posted in the README are needed. Maybe something is wrong with the frames and annotations? Did you try to run the report method on the precomputed results we provide?
Yup, I pre-extracted the frames with the same ffmpeg version and visualized them to make sure the annotations looked good (I'd actually opened another issue at https://github.com/matteo-dunnhofer/TREK-150-toolkit/issues/5 before fixing it). Where could I find the precomputed results? I basically ran everything from scratch and got my results that way.
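For anyone checking the same thing, a quick sanity check is to compare the number of extracted frames against the number of annotation lines per sequence. This is a minimal sketch; `check_sequence`, the `img`/`groundtruth_rect.txt` layout, and the file naming are assumptions in the style of GOT-10k-like toolkits, not necessarily TREK-150's exact structure:

```python
import os
import tempfile

def check_sequence(seq_dir, ann_file):
    """Return (n_frames, n_annotations) so count mismatches are easy to spot."""
    frames = [f for f in os.listdir(seq_dir) if f.endswith(('.jpg', '.png'))]
    with open(ann_file) as f:
        annotations = [line for line in f if line.strip()]
    return len(frames), len(annotations)

# Demo on synthetic data: 3 frames and 3 annotation lines -> counts match.
with tempfile.TemporaryDirectory() as tmp:
    seq = os.path.join(tmp, 'img')
    os.makedirs(seq)
    for i in range(3):
        open(os.path.join(seq, f'{i:06d}.jpg'), 'w').close()
    ann = os.path.join(tmp, 'groundtruth_rect.txt')
    with open(ann, 'w') as f:
        f.write('10,10,50,50\n20,20,50,50\n30,30,50,50\n')
    n_frames, n_anns = check_sequence(seq, ann)
    print(n_frames == n_anns)  # -> True
```

If the two counts disagree for any sequence, the ffmpeg extraction (frame rate, start offset) is the first thing to re-check.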
Hi Matteo, so I pulled the results and specifically focused on evaluating for LTMU-H. Here's the code:
```python
import sys
sys.path.append('./TREK-150-toolkit')

from ltmuh import LTMUH
from toolkit.experiments import ExperimentTREK150

tracker = LTMUH()

root_dir = './TREK-150-toolkit/TREK-150'  # set the path to TREK-150's root folder
exp = ExperimentTREK150(root_dir, result_dir='./TREK-150-Dunnhofer-Results', report_dir='./TREK-150-Dunnhofer-Report')
prot = 'ope'

# Run an experiment with the protocol of interest and save results
# exp.run(tracker, protocol=prot, visualize=False)

# Generate a report for the protocol of interest
exp.report([tracker.name], protocol=prot)
```
I still get results for LTMU-H that are lower than the reported ones. Here are the success plot, NP plot, and GSR plot.
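One thing worth ruling out when every score comes out uniformly lower is a box-format mismatch between results and annotations. Below is a minimal sketch in plain Python (the function name is mine; I'm assuming `[x, y, w, h]` boxes, so a file misparsed as `[x1, y1, x2, y2]` would systematically depress IoU):

```python
def iou_xywh(a, b):
    """IoU of two boxes given as [x, y, w, h]."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

gt = [10, 10, 40, 40]                   # x, y, w, h
print(iou_xywh(gt, gt))                 # -> 1.0
print(iou_xywh(gt, [10, 10, 30, 30]))   # same corner misread as x2, y2: IoU drops
```

Spot-checking a few frames this way (ground truth vs. your tracker's output) quickly shows whether the gap is a parsing issue or a genuine tracking difference.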
For some reason I can't attach the YAML file for my conda env, so I'll post it as plaintext here, but this should be importable:
```yaml
name: ltmuh
channels:
  - conda-forge
  - huggingface
  - iopath
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - ca-certificates=2022.4.26=h06a4308_0
  - certifi=2021.5.30=py36h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - ncurses=6.3=h7f8727e_2
  - openssl=1.1.1o=h7f8727e_0
  - pip=21.2.2=py36h06a4308_0
  - python=3.6.13=h12debd9_1
  - readline=8.1.2=h7f8727e_1
  - setuptools=58.0.4=py36h06a4308_0
  - sqlite=3.38.3=hc218d9a_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.5=h7f8727e_1
  - zlib=1.2.12=h7f8727e_2
  - pip:
    - cffi==1.15.0
    - cycler==0.11.0
    - cython==0.29.30
    - dataclasses==0.8
    - easydict==1.9
    - fire==0.4.0
    - future==0.18.2
    - got10k==0.1.3
    - importlib-resources==5.4.0
    - jinja2==3.0.3
    - joblib==1.1.0
    - jpeg4py==0.1.4
    - kiwisolver==1.3.1
    - lmdb==1.3.0
    - markupsafe==2.0.1
    - matplotlib==3.3.4
    - msgpack==1.0.4
    - numpy==1.19.5
    - opencv-python==4.6.0.66
    - pascal-voc-writer==0.1.4
    - pillow==8.4.0
    - protobuf==3.19.4
    - pycparser==2.21
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pyyaml==5.3.1
    - scikit-learn==0.24.2
    - scipy==1.2.1
    - shapely==1.8.4
    - six==1.16.0
    - sklearn==0.0
    - tensorboardx==2.5.1
    - termcolor==1.1.0
    - threadpoolctl==3.1.0
    - timm==0.3.2
    - torch==1.4.0
    - torchvision==0.5.0
    - tqdm==4.19.9
    - typing-extensions==4.1.1
    - wget==3.2
    - yacs==0.1.8
    - zipp==3.6.0
```
Any idea as to what's going on?
I tried again but I still obtain the correct results. The YAML looks good. There might be something wrong with the annotation files. Send me an e-mail at matteo.dunnhofer@uniud.it and I will share a different version.
Email sent! I'm still confused about why my reproduced results differ from the ones in the link, but I guess we can take this discussion offline and update this thread with the outcome.
I'm also willing to find time to hop on a call to debug!
I replied to your e-mail. It's quite a busy period for me; let's try to solve the issue offline first.
I just re-did everything from scratch from the repo and got these results.
This is the expected behaviour. Thanks for pointing out @relh!
Thanks @relh for reproducing and confirming it's a setup issue on my end! Will follow up with you on fixing inconsistencies with my setup.
(I'll leave this issue open until I figure this out and post the fix below, but will work on this offline: thanks to Matteo for all the help as well!)
Hi there, thanks so much for the great work and toolkit for future benchmarks!
I'm running the LTMU-H baseline for TREK-150 under the OPE protocol to get an initial understanding of quantitative performance, and I'm finding that SS, NPS, and GSR are significantly lower than what's reported in the paper. I've posted my values below.
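For context on what I'm comparing: as I understand it, the Success Score in OTB-style toolkits is the area under the success plot, i.e. the mean fraction of frames whose overlap exceeds each threshold in [0, 1]. A minimal sketch of that computation (the helper name and the 21-threshold sampling are my assumptions; the toolkit's exact implementation may differ):

```python
def success_score(ious, n_thresholds=21):
    """Area under the success plot: for each overlap threshold in [0, 1],
    take the fraction of frames whose IoU exceeds it, then average."""
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    rates = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)

ious = [0.9, 0.8, 0.0, 0.7]         # toy per-frame overlaps; 0.0 = lost target
print(round(success_score(ious), 3))  # -> 0.571
```

Note how a single lost-target frame (IoU 0.0) pulls the score down across all thresholds, which is why tracking failures on a handful of sequences can noticeably depress the aggregate SS.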
I followed the initial guidelines, so my initial thought is that there's something different between my setup and the setup used to run evaluation. Any idea as to what's going on?