write a markdown file to introduce several steps to specify domainlab arguments

smilesun commented 3 months ago

What are the essential thing to specify to use domainlab?

Model
Trainer
Task (how to specify task)

Currently, we only have

usage: main_out.py [-h] [-c CONFIG_FILE] [--lr LR] [--gamma_reg GAMMA_REG]
                   [--es ES] [--seed SEED] [--nocu] [--device DEVICE] [--gen]
                   [--keep_model] [--epos EPOS] [--epos_min EPOS_MIN]
                   [--epo_te EPO_TE] [-w WARMUP] [--debug] [--dmem]
                   [--no_dump] [--trainer TRAINER] [--out OUT] [--dpath DPATH]
                   [--tpath TPATH] [--npath NPATH] [--npath_dom NPATH_DOM]
                   [--npath_argna2val NPATH_ARGNA2VAL]
                   [--nname_argna2val NNAME_ARGNA2VAL] [--nname NNAME]
                   [--nname_dom NNAME_DOM] [--apath APATH] [--exptag EXPTAG]
                   [--aggtag AGGTAG] [--agg_partial_bm BM_DIR]
                   [--gen_plots PLOT_DATA] [--outp_dir OUTP_DIR]
                   [--param_idx PARAM_IDX] [--msel {val,loss_tr}] [--model an]
                   [--acon ac] [--task ta] [--bs BS] [--split SPLIT]
                   [--te_d [TE_D ...]] [--tr_d [TR_D ...]] [--san_check]
                   [--san_num SAN_NUM] [--loglevel LOGLEVEL] [--shuffling_off]
                   [--zd_dim ZD_DIM] [--zx_dim ZX_DIM] [--zy_dim ZY_DIM]
                   [--topic_dim TOPIC_DIM]
                   [--nname_encoder_x2topic_h NNAME_ENCODER_X2TOPIC_H]
                   [--npath_encoder_x2topic_h NPATH_ENCODER_X2TOPIC_H]
                   [--nname_encoder_sandwich_x2h4zd NNAME_ENCODER_SANDWICH_X2H4ZD]
                   [--npath_encoder_sandwich_x2h4zd NPATH_ENCODER_SANDWICH_X2H4ZD]
                   [--gamma_y GAMMA_Y] [--gamma_d GAMMA_D] [--beta_t BETA_T]
                   [--beta_d BETA_D] [--beta_x BETA_X] [--beta_y BETA_Y]
                   [--tau TAU] [--epos_per_match_update EPOS_PER_MATCH_UPDATE]
                   [--epochs_ctr EPOCHS_CTR] [--nperm NPERM] [--pperm PPERM]
                   [--jigen_ppath JIGEN_PPATH] [--grid_len GRID_LEN]
                   [--dial_steps_perturb DIAL_STEPS_PERTURB]
                   [--dial_noise_scale DIAL_NOISE_SCALE] [--dial_lr DIAL_LR]
                   [--dial_epsilon DIAL_EPSILON]

DomainLab

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config CONFIG_FILE
                        load YAML configuration
  --lr LR               learning rate
  --gamma_reg GAMMA_REG
                        weight of regularization loss
  --es ES               early stop steps
  --seed SEED           random seed (default: 0)
  --nocu                disables CUDA
  --device DEVICE       device name default None
  --gen                 save generated images
  --keep_model          do not delete model at the end of training
  --epos EPOS           maximum number of epochs
  --epos_min EPOS_MIN   maximum number of epochs
  --epo_te EPO_TE       test performance per {} epochs
  -w WARMUP, --warmup WARMUP
                        number of epochs for hyper-parameter warm-up. Set to 0
                        to turn warmup off.
  --debug
  --dmem
  --no_dump             suppress saving the confusion matrix
  --trainer TRAINER     specify which trainer to use
  --out OUT             absolute directory to store outputs
  --dpath DPATH         path for storing downloaded dataset
  --tpath TPATH         path for custom task, should implement get_task
                        function
  --npath NPATH         path of custom neural network for feature extraction
  --npath_dom NPATH_DOM
                        path of custom neural network for feature extraction
  --npath_argna2val NPATH_ARGNA2VAL
                        specify new arguments and their value e.g. '--
                        npath_argna2val my_custom_arg_na --npath_argna2val
                        xx/yy/zz.py', additional pairs can be appended
  --nname_argna2val NNAME_ARGNA2VAL
                        specify new arguments and their values e.g. '--
                        nname_argna2val my_custom_network_arg_na
                        --nname_argna2val alexnet', additional pairs can be
                        appended
  --nname NNAME         name of custom neural network for feature extraction
                        of classification
  --nname_dom NNAME_DOM
                        name of custom neural network for feature extraction
                        of domain
  --apath APATH         path for custom AlgorithmBuilder
  --exptag EXPTAG       tag as prefix of result aggregation file name e.g. git
                        hash for reproducibility
  --aggtag AGGTAG       tag in each line of result aggregation file e.g., to
                        specify potential different configurations
  --agg_partial_bm BM_DIR
                        Aggregates and plots partial data of a snakemake
                        benchmark. Requires the benchmark config file. Other
                        arguments will be ignored.
  --gen_plots PLOT_DATA
                        plots the data of a snakemake benchmark. Requires the
                        results.csv fileand an output file (specify by
                        --outp_file,default is
                        zoutput/benchmarks/shell_benchmark). Other arguments
                        will be ignored.
  --outp_dir OUTP_DIR   outpus file for the plots when creating themusing
                        --gen_plots. Default is
                        zoutput/benchmarks/shell_benchmark
  --param_idx PARAM_IDX
                        True: parameter index is used in the pots generated
                        with --gen_plots.False: parameter name is used.Default
                        is True.
  --msel {val,loss_tr}  model selection for early stop: val, loss_tr, recon,
                        the elbo and recon only make sense for vae models,
                        will be ignored by other methods
  --model an            algorithm name
  --acon ac             algorithm configuration name, (default None)
  --task ta             task name

task args:
  --bs BS               loader batch size for mixed domains
  --split SPLIT         proportion of training, a value between 0 and 1, 0
                        means no train-validation split
  --te_d [TE_D ...]     test domain names separated by single space, will be
                        parsed to be list of strings
  --tr_d [TR_D ...]     training domain names separated by single space, will
                        be parsed to be list of strings; if not provided then
                        all available domains that are not assigned to the
                        test set will be used as training domains
  --san_check           save images from the dataset as a sanity check
  --san_num SAN_NUM     number of images to be dumped for the sanity check
  --loglevel LOGLEVEL   sets the loglevel of the logger
  --shuffling_off       disable shuffling of the training dataloader for the
                        dataset

vae:
  --zd_dim ZD_DIM       diva: size of latent space for domain
  --zx_dim ZX_DIM       diva: size of latent space for unobserved
  --zy_dim ZY_DIM       diva, hduva: size of latent space for class
  --topic_dim TOPIC_DIM
                        hduva: number of topics
  --nname_encoder_x2topic_h NNAME_ENCODER_X2TOPIC_H
                        hduva: network from image to topic distribution
  --npath_encoder_x2topic_h NPATH_ENCODER_X2TOPIC_H
                        hduva: network from image to topic distribution
  --nname_encoder_sandwich_x2h4zd NNAME_ENCODER_SANDWICH_X2H4ZD
                        hduva: network from image and topic to zd
  --npath_encoder_sandwich_x2h4zd NPATH_ENCODER_SANDWICH_X2H4ZD
                        hduva: network from image and topic to zd
  --gamma_y GAMMA_Y     diva, hduva: multiplier for y classifier
  --gamma_d GAMMA_D     diva: multiplier for d classifier from zd
  --beta_t BETA_T       hduva: multiplier for KL topic
  --beta_d BETA_D       diva: multiplier for KL d
  --beta_x BETA_X       diva: multiplier for KL x
  --beta_y BETA_Y       diva, hduva: multiplier for KL y

matchdg:
  --tau TAU             factor to magnify cosine similarity
  --epos_per_match_update EPOS_PER_MATCH_UPDATE
                        Number of epochs before updating the match tensor
  --epochs_ctr EPOCHS_CTR
                        Total number of epochs for ctr

jigen:
  --nperm NPERM         number of permutations
  --pperm PPERM         probability of permutating the tiles of an image
  --jigen_ppath JIGEN_PPATH
                        npy file path to load numpy array with each row being
                        permutation index, if not None, nperm and grid_len has
                        to agree with the number of row and columns of the
                        input array
  --grid_len GRID_LEN   length of image in tile unit

dial:
  --dial_steps_perturb DIAL_STEPS_PERTURB
                        how many gradient step to go to find an image as
                        adversarials
  --dial_noise_scale DIAL_NOISE_SCALE
                        variance of gaussian noise to inject on pure image
  --dial_lr DIAL_LR     learning rate to generate adversarial images
  --dial_epsilon DIAL_EPSILON
                        pixel wise threshold to perturb images

MatteoWohlrapp commented 3 months ago

The help command lists all those arguments as optional, but which one is actually optional? It sometimes crashes when testing out what I must provide to the program because we, e.g., pass None somewhere. We should specify that when selecting parameters, not when running. One example is we need nname, otherwise, the VAEChain is None, the gammas because, otherwise, we pass a NoneType, the test domain, but not the training domain.

smilesun commented 3 months ago

Necessary arguments:

Neural Network: either nname or npath, we have to give npath a synonym, s.t. nn? nname won't be used by user most of the time so we can ignore explaning this
Model
Hyperparameters for selected model if any
Trainer
Hyperparameters for selected trainer if any
Testdomain: te_d

marrlab / DomainLab

write a markdown file to introduce several steps to specify domainlab arguments #800