MatteoWohlrapp opened 4 months ago
Use the mhof_dev branch.
./run_benchmark_slurm.sh examples/benchmark/pacs_fbopt_dial_diva.yaml
zoutput/benchmarks/benchmark_fbopt_dial_diva_pacs_2024-07-13_01-56-14/slurm_logs/run_experiment/run_experiment-index=134-21967700.err-230-[Errno 2] No such file or directory: 'data/pacs/PACS/art_painting/dog/pic_225.jpg'
zoutput/benchmarks/benchmark_fbopt_dial_diva_pacs_2024-07-13_01-56-14/slurm_logs/run_experiment/run_experiment
The previous YAML file was specified incorrectly; rerunning now.
Run the benchmark
To run the benchmark, first log into the cluster. The benchmark is located on the mhof_dev_merge branch. Assuming DomainLab and all necessary dependencies are installed in a conda environment, you can execute the benchmark by running these lines in the root directory of the DomainLab project:
If you run into an issue with backpack, you can upgrade the dependencies to:
nvidia-nccl-cu12-2.20.5 torch-2.3.0 torchvision-0.18.0 triton-2.3.0
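One way to apply this upgrade reproducibly is to pin the four versions in a requirements file. This is a minimal sketch: the file name backpack-pins.txt is a hypothetical choice, and the pins only restate the versions listed above.

```bash
#!/usr/bin/env bash
# Hypothetical file name; the pinned versions are the ones listed in this issue.
cat > backpack-pins.txt <<'EOF'
nvidia-nccl-cu12==2.20.5
torch==2.3.0
torchvision==0.18.0
triton==2.3.0
EOF
```

Then install them with pip install -r backpack-pins.txt while the conda environment is active.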
Obtain the results
After you start the benchmark, it will take some time to finish. The benchmark creates a separate Slurm job for every hyperparameter combination. To check whether some jobs are still running, use
squeue -u <username>
or attach to your previous tmux session with
tmux attach-session -t fbopt-dial-diva
The results will be written to a results.csv file in
./zoutput/benchmarks/benchmark_fbopt_dial_diva_pacs_<datetime>
If the benchmark did not complete, the file might not be available. In that case, run
python main_out.py --agg_partial_bm <output directory>
Finally, please retrieve the results.csv file and generate the plots with
python main_out.py --gen_plots <csv file> --outp_dir <output directory>
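The steps above can be chained in a small Bash sketch. It assumes Slurm's squeue is on the PATH, that the helpers are sourced from the DomainLab root (where main_out.py lives), and that --agg_partial_bm writes results.csv into the output directory it is given; that last point is an assumption, not something stated in this issue. The helper names wait_for_jobs, latest_output_dir and collect_results are hypothetical. Note that wait_for_jobs waits until the user has no Slurm jobs at all, including unrelated ones.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; source this file from the DomainLab root directory.

# Poll squeue until the given user has no queued or running jobs left.
# Note: this also waits for jobs unrelated to the benchmark.
wait_for_jobs() {
  local user="$1" interval="${2:-60}"
  while [ -n "$(squeue -h -u "$user")" ]; do
    sleep "$interval"
  done
}

# Print the newest benchmark_fbopt_dial_diva_pacs_<datetime> directory.
# The YYYY-MM-DD_HH-MM-SS suffix sorts lexicographically in time order.
latest_output_dir() {
  local base="${1:-./zoutput/benchmarks}"
  find "$base" -maxdepth 1 -type d -name 'benchmark_fbopt_dial_diva_pacs_*' \
    | sort | tail -n 1
}

# Aggregate partial results if results.csv is missing, then generate plots.
collect_results() {
  local outdir="$1" plotdir="$2"
  if [ ! -f "$outdir/results.csv" ]; then
    # Assumption: --agg_partial_bm writes results.csv into $outdir.
    python main_out.py --agg_partial_bm "$outdir"
  fi
  python main_out.py --gen_plots "$outdir/results.csv" --outp_dir "$plotdir"
}
```

For example, after sourcing the helpers: wait_for_jobs "$USER"; outdir=$(latest_output_dir); collect_results "$outdir" "$outdir/plots" (the plots subdirectory is also a hypothetical choice).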