marrlab / DomainLab

modular domain generalization: https://pypi.org/project/domainlab/
https://marrlab.github.io/DomainLab/
MIT License

Run benchmark for fbopt, dial diva. #840

Open MatteoWohlrapp opened 4 months ago

MatteoWohlrapp commented 4 months ago

Run the benchmark

To run the benchmark, first log into the cluster. The benchmark is located on the mhof_dev_merge branch. Assuming domainlab and all necessary dependencies are installed in a conda environment, run the following lines from the root directory of the DomainLab project:

git pull 
git checkout mhof_dev_merge 
./sh_download_pacs.sh # (belated update) download the pacs dataset if you haven't yet
tmux new-session -s fbopt-dial-diva # create new tmux session
conda activate <environment_name> # activate your environment

# If you have never run a benchmark before, use the two lines below to install the necessary dependencies
pip install snakemake==7.32.0
pip install pulp==2.7.0

# Start the benchmark 
./run_benchmark_slurm.sh examples/benchmark/pacs_fbopt_dial_diva.yaml

In case of an issue with backpack, you can upgrade the dependencies to nvidia-nccl-cu12 2.20.5, torch 2.3.0, torchvision 0.18.0 and triton 2.3.0.

Obtain the results

Once started, the benchmark will take some time to finish. It creates a separate slurm job for every hyperparameter combination. To see whether some jobs are still running, use squeue -u <username>, or reattach to your previous tmux session with tmux attach-session -t fbopt-dial-diva. The results will be written to a results.csv file in ./zoutput/benchmarks/benchmark_fbopt_dial_diva_pacs_<datetime>. If the benchmark did not complete, this file might not be available. In that case, run python main_out.py --agg_partial_bm <output directory>.
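As a rough sketch of the check described above, the following shell helpers pick the newest benchmark directory and report whether results.csv is already there. The function names latest_benchmark_dir and results_status are hypothetical and not part of DomainLab. The sketch assumes the directory names end in a sortable timestamp, as in benchmark_fbopt_dial_diva_pacs_2024-07-13_01-56-14.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of DomainLab.

# Print the newest benchmark run directory under the given base directory.
# Assumption: directory names end in a YYYY-MM-DD_HH-MM-SS timestamp, so the
# lexically last glob match is also the most recent run.
latest_benchmark_dir() {
    local base="${1:-./zoutput/benchmarks}" latest="" dir
    for dir in "$base"/benchmark_fbopt_dial_diva_pacs_*/; do
        [ -d "$dir" ] || continue   # skip the unexpanded pattern if nothing matches
        latest="${dir%/}"           # keep the last match, without the trailing slash
    done
    printf '%s\n' "$latest"
}

# Print "complete" if results.csv exists in the run directory, else "partial".
results_status() {
    local dir="$1"
    if [ -f "$dir/results.csv" ]; then
        echo "complete"
    else
        echo "partial"
    fi
}
```

If results_status "$(latest_benchmark_dir)" prints partial, either jobs are still running or the benchmark did not complete, and the --agg_partial_bm call mentioned above is the next step.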

Finally, please retrieve the results.csv file and the plots with python main_out.py --gen_plots <csv file> --outp_dir <output directory>.

DanScarc commented 4 months ago

Here is the logfile:

logs.txt

smilesun commented 2 months ago

use mhof_dev branch

./run_benchmark_slurm.sh examples/benchmark/pacs_fbopt_dial_diva.yaml

smilesun commented 2 months ago

zoutput/benchmarks/benchmark_fbopt_dial_diva_pacs_2024-07-13_01-56-14/slurm_logs/run_experiment/run_experiment-index=134-21967700.err-230-[Errno 2] No such file or directory: 'data/pacs/PACS/art_painting/dog/pic_225.jpg'
zoutput/benchmarks/benchmark_fbopt_dial_diva_pacs_2024-07-13_01-56-14/slurm_logs/run_experiment/run_experiment
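To find every job affected by a missing-file error like the one above, a sketch along these lines may help. The function name missing_file_logs is hypothetical. It assumes the slurm error logs sit under <run directory>/slurm_logs and end in .err, as in the paths quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DomainLab.

# List all .err logs below <run directory>/slurm_logs that contain
# the message "No such file or directory".
missing_file_logs() {
    local run_dir="$1"
    # -r: search recursively, -l: print only the names of matching files
    grep -rl --include='*.err' 'No such file or directory' "$run_dir/slurm_logs" 2>/dev/null
}
```

For example, missing_file_logs zoutput/benchmarks/benchmark_fbopt_dial_diva_pacs_2024-07-13_01-56-14 prints one log path per affected job, and nothing if no log reports the error.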

smilesun commented 2 months ago

previous yaml file was wrongly specified, now rerunning