Open vineetparikh opened 5 months ago
For context, I'm using the same checkpoints for HOI and STARK as listed in the README, so I don't know whether any additional training is needed, or whether a different checkpoint gives the results in the paper.
Hi @vineetparikh,
that's strange. I tested the repo multiple times and always got the correct results. No additional training or checkpoints other than those posted in the README are needed. Maybe something is wrong with the frames and annotations? Did you try to run the report method on the precomputed results we provide?
Yup, I pre-extracted the frames with the same ffmpeg version and visualized them to make sure the annotations looked good (I'd actually opened another issue at https://github.com/matteo-dunnhofer/TREK-150-toolkit/issues/5 before fixing it). Where could I find the precomputed results? I basically ran everything from scratch and got my results that way.
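For anyone checking the same thing, a quick sanity check is to compare the number of extracted frames against the number of annotation lines per sequence. This is a minimal sketch; `check_sequence`, the `img`/`groundtruth_rect.txt` layout, and the file naming are assumptions in the style of GOT-10k-like toolkits, not necessarily TREK-150's exact structure:

```python
import os
import tempfile

def check_sequence(seq_dir, ann_file):
    """Return (n_frames, n_annotations) so count mismatches are easy to spot."""
    frames = [f for f in os.listdir(seq_dir) if f.endswith(('.jpg', '.png'))]
    with open(ann_file) as f:
        annotations = [line for line in f if line.strip()]
    return len(frames), len(annotations)

# Demo on synthetic data: 3 frames and 3 annotation lines -> counts match.
with tempfile.TemporaryDirectory() as tmp:
    seq = os.path.join(tmp, 'img')
    os.makedirs(seq)
    for i in range(3):
        open(os.path.join(seq, f'{i:06d}.jpg'), 'w').close()
    ann = os.path.join(tmp, 'groundtruth_rect.txt')
    with open(ann, 'w') as f:
        f.write('10,10,50,50\n20,20,50,50\n30,30,50,50\n')
    n_frames, n_anns = check_sequence(seq, ann)
    print(n_frames == n_anns)  # -> True
```

If the two counts disagree for any sequence, the ffmpeg extraction (frame rate, start offset) is the first thing to re-check.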
Hi Matteo, so I pulled the results and specifically focused on evaluating for LTMU-H. Here's the code:
```python
import sys
sys.path.append('./TREK-150-toolkit')

from ltmuh import LTMUH
from toolkit.experiments import ExperimentTREK150

tracker = LTMUH()

root_dir = './TREK-150-toolkit/TREK-150'  # set the path to TREK-150's root folder
exp = ExperimentTREK150(root_dir, result_dir='./TREK-150-Dunnhofer-Results', report_dir='./TREK-150-Dunnhofer-Report')
prot = 'ope'

# Run an experiment with the protocol of interest and save results
# exp.run(tracker, protocol=prot, visualize=False)

# Generate a report for the protocol of interest
exp.report([tracker.name], protocol=prot)
```
I still get results for LTMU-H that are lower than the reported ones. Here are the success plot, NP plot, and GSR plot.
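One thing worth ruling out when every score comes out uniformly lower is a box-format mismatch between results and annotations. Below is a minimal sketch in plain Python (the function name is mine; I'm assuming `[x, y, w, h]` boxes, so a file misparsed as `[x1, y1, x2, y2]` would systematically depress IoU):

```python
def iou_xywh(a, b):
    """IoU of two boxes given as [x, y, w, h]."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

gt = [10, 10, 40, 40]                   # x, y, w, h
print(iou_xywh(gt, gt))                 # -> 1.0
print(iou_xywh(gt, [10, 10, 30, 30]))   # same corner misread as x2, y2: IoU drops
```

Spot-checking a few frames this way (ground truth vs. your tracker's output) quickly shows whether the gap is a parsing issue or a genuine tracking difference.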
For some reason I can't attach the YAML file for my conda env, so I'll post it as plaintext here, but this should be importable:
```yaml
name: ltmuh
channels:
  - conda-forge
  - huggingface
  - iopath
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - ca-certificates=2022.4.26=h06a4308_0
  - certifi=2021.5.30=py36h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - ncurses=6.3=h7f8727e_2
  - openssl=1.1.1o=h7f8727e_0
  - pip=21.2.2=py36h06a4308_0
  - python=3.6.13=h12debd9_1
  - readline=8.1.2=h7f8727e_1
  - setuptools=58.0.4=py36h06a4308_0
  - sqlite=3.38.3=hc218d9a_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.5=h7f8727e_1
  - zlib=1.2.12=h7f8727e_2
  - pip:
    - cffi==1.15.0
    - cycler==0.11.0
    - cython==0.29.30
    - dataclasses==0.8
    - easydict==1.9
    - fire==0.4.0
    - future==0.18.2
    - got10k==0.1.3
    - importlib-resources==5.4.0
    - jinja2==3.0.3
    - joblib==1.1.0
    - jpeg4py==0.1.4
    - kiwisolver==1.3.1
    - lmdb==1.3.0
    - markupsafe==2.0.1
    - matplotlib==3.3.4
    - msgpack==1.0.4
    - numpy==1.19.5
    - opencv-python==4.6.0.66
    - pascal-voc-writer==0.1.4
    - pillow==8.4.0
    - protobuf==3.19.4
    - pycparser==2.21
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pyyaml==5.3.1
    - scikit-learn==0.24.2
    - scipy==1.2.1
    - shapely==1.8.4
    - six==1.16.0
    - sklearn==0.0
    - tensorboardx==2.5.1
    - termcolor==1.1.0
    - threadpoolctl==3.1.0
    - timm==0.3.2
    - torch==1.4.0
    - torchvision==0.5.0
    - tqdm==4.19.9
    - typing-extensions==4.1.1
    - wget==3.2
    - yacs==0.1.8
    - zipp==3.6.0
```
Any idea as to what's going on?
I tried again but I still obtain the correct results. The YAML looks good. There might be something wrong with the annotation files. Send me an e-mail at matteo.dunnhofer@uniud.it and I will share a different version.
Email sent! I'm still confused about why my reproduced results differ from the ones in the link, but I guess we can take this discussion offline and update this thread with the outcome.
I'm also willing to find time to hop on a call to debug!
I replied to your e-mail. It's quite a busy period for me; let's try to solve the issue offline first.
I just re-did everything from scratch from the repo and got these results.
This is the expected behaviour. Thanks for pointing out @relh!
Thanks @relh for reproducing and confirming it's a setup issue on my end! Will follow up with you on fixing inconsistencies with my setup.
(I'll leave this issue open until I figure this out and post the fix below, but will work on this offline: thanks to Matteo for all the help as well!)
Hi there, thanks so much for the great work and toolkit for future benchmarks!
I'm running the LTMU-H baseline for TREK-150 under the OPE protocol to get an initial understanding of quantitative performance, and I'm finding that SS, NPS, and GSR are significantly lower than what's reported in the paper. I've posted my values below.
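For context on what I'm comparing: as I understand it, the Success Score in OTB-style toolkits is the area under the success plot, i.e. the mean fraction of frames whose overlap exceeds each threshold in [0, 1]. A minimal sketch of that computation (the helper name and the 21-threshold sampling are my assumptions; the toolkit's exact implementation may differ):

```python
def success_score(ious, n_thresholds=21):
    """Area under the success plot: for each overlap threshold in [0, 1],
    take the fraction of frames whose IoU exceeds it, then average."""
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    rates = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)

ious = [0.9, 0.8, 0.0, 0.7]         # toy per-frame overlaps; 0.0 = lost target
print(round(success_score(ious), 3))  # -> 0.571
```

Note how a single lost-target frame (IoU 0.0) pulls the score down across all thresholds, which is why tracking failures on a handful of sequences can noticeably depress the aggregate SS.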
I followed the initial guidelines, so my initial thought is that there's something different between my setup and the setup used to run evaluation. Any idea as to what's going on?