smilesun opened 1 year ago
To reuse the TensorBoard code, we only need to change line 85, which is `self.set_scheduler(scheduler=HyperSchedulerFeedback)`, in the file `domainlab/algos/trainers/train_fbopt_b.py`, and replace `HyperSchedulerFeedback` with `HyperSchedulerWarmup` or `HyperSchedulerWarmupExponential`.
The two classes can be imported from `domainlab/algos/trainers/hyper_scheduler.py`.
https://github.com/marrlab/DomainLab/pull/624/files
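For reference, a minimal sketch of the swap, assuming the `set_scheduler(scheduler=...)` signature shown above; the helper function `patch_trainer` is hypothetical, not repository code:

```python
from domainlab.algos.trainers.hyper_scheduler import (
    HyperSchedulerWarmup,
    HyperSchedulerWarmupExponential,
)

def patch_trainer(trainer, exponential=False):
    """Replace the feedback scheduler with a feedforward warm-up scheduler.

    Hypothetical helper: the repository instead changes line 85 of
    train_fbopt_b.py in place, i.e.
        self.set_scheduler(scheduler=HyperSchedulerFeedback)
    becomes
        self.set_scheduler(scheduler=HyperSchedulerWarmup)
    """
    cls = HyperSchedulerWarmupExponential if exponential else HyperSchedulerWarmup
    trainer.set_scheduler(scheduler=cls)
```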
@agisga, could you compare the phase portraits of the two settings?
sh run_fbopt_mnist_feedforward.sh
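By phase portrait we mean the trajectory of the two losses plotted against each other over epochs; a minimal plotting sketch with hypothetical CSV paths and column names (DomainLab's actual logging format may differ):

```python
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical per-epoch loss logs, one file per setting
for label, path in [("feedback", "fbopt_losses.csv"),
                    ("warmup", "warmup_losses.csv")]:
    df = pd.read_csv(path)  # assumes columns "ell" and "R"
    plt.plot(df["ell"], df["R"], marker=".", label=label)

plt.xlabel("task loss ell")
plt.ylabel("regularization loss R")
plt.title("phase portrait of (ell, R) over training")
plt.legend()
plt.show()
```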
@agisga it is not working yet, I will see if I can fix it.
File "/home/playtime/domainlab/main_out.py", line 17, in
sh run_fbopt_mnist_feedforward.sh
no algorithm conf specified, going to use default
/home/sunxd/domainlab/domainlab/arg_parser.py:252: UserWarning: no algorithm conf specified, going to use default
warnings.warn("no algorithm conf specified, going to use default")
using device: cuda
/home/sunxd/anaconda3/lib/python3.9/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
/home/sunxd/anaconda3/lib/python3.9/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `AUROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
b'cdf0c565'
model name: mnistcolor10_te_rgb_31_119_180_jigen_bcdf0c565_2023md_11md_07_15_43_03_seed_0
Experiment start at: 2023-11-07 15:43:03.725952
Traceback (most recent call last):
  File "/home/sunxd/domainlab/main_out.py", line 17, in <module>
    exp.execute()
  File "/home/sunxd/domainlab/domainlab/compos/exp/exp_main.py", line 68, in execute
    self.trainer.before_tr()
  File "/home/sunxd/domainlab/domainlab/algos/trainers/train_fbopt_b.py", line 92, in before_tr
    self.set_model_with_mu()  # very small value
  File "/home/sunxd/domainlab/domainlab/algos/trainers/train_fbopt_b.py", line 121, in set_model_with_mu
    self.model.hyper_update(epoch=None, fun_scheduler=self.hyper_scheduler)
  File "/home/sunxd/domainlab/domainlab/models/model_dann.py", line 70, in hyper_update
    dict_rst = fun_scheduler(epoch)  # the __call__ method of hyperparameter scheduler
  File "/home/sunxd/domainlab/domainlab/algos/trainers/hyper_scheduler.py", line 39, in __call__
    dict_rst[key] = self.warmup(val_setpoint, epoch)
  File "/home/sunxd/domainlab/domainlab/algos/trainers/hyper_scheduler.py", line 31, in warmup
    ratio = ((epoch+1) * 1.) / self.total_steps
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
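The TypeError happens because `set_model_with_mu` passes `epoch=None` into `warmup`, which computes `(epoch + 1)`. A minimal sketch of a guard, assuming the `warmup` signature from the traceback; treating `epoch=None` as the first step is an assumption, not the repository's actual fix:

```python
from domainlab.algos.trainers.hyper_scheduler import HyperSchedulerWarmup

class HyperSchedulerWarmupSafe(HyperSchedulerWarmup):
    """Hypothetical subclass that tolerates epoch=None from the fbopt trainer."""

    def warmup(self, val_setpoint, epoch):
        # before_tr() calls hyper_update(epoch=None, ...); map that to
        # step 0 instead of raising TypeError (assumption)
        epoch = 0 if epoch is None else epoch
        ratio = ((epoch + 1) * 1.0) / self.total_steps
        # cap at the setpoint so the ramp never overshoots it
        return min(val_setpoint, val_setpoint * ratio)
```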
Experiment start at: 2023-11-07 15:49:52.197946
Traceback (most recent call last):
  File "/home/sunxd/domainlab/main_out.py", line 17, in <module>
    exp.execute()
  File "/home/sunxd/domainlab/domainlab/compos/exp/exp_main.py", line 68, in execute
    self.trainer.before_tr()
  File "/home/sunxd/domainlab/domainlab/algos/trainers/train_fbopt_b.py", line 99, in before_tr
    self.hyper_scheduler.set_setpoint(
AttributeError: 'HyperSchedulerWarmup' object has no attribute 'set_setpoint'
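This second failure is an interface mismatch: `before_tr` in `train_fbopt_b.py` calls `set_setpoint`, which apparently only the feedback scheduler implements. A hypothetical no-op shim could bridge the gap; since a feedforward warm-up ramp is purely time-based, ignoring the setpoints is an assumption:

```python
from domainlab.algos.trainers.hyper_scheduler import HyperSchedulerWarmup

class HyperSchedulerWarmupShim(HyperSchedulerWarmup):
    """Hypothetical adapter so the fbopt trainer's set_setpoint call succeeds."""

    def set_setpoint(self, *args, **kwargs):
        # setpoints only steer the feedback controller; the warm-up ramp
        # ignores them, so we accept the call and do nothing (assumption)
        pass
```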
We want to compare against the feedforward mu controller (exponential warm-up) and against a fixed mu for the objective ell() + mu * R(); see the YAML file below:
https://github.com/marrlab/DomainLab/blob/fbopt/examples/benchmark/benchmark_fbopt_mnist_jigen.yaml
It would be interesting to see how they behave in TensorBoard with respect to the ell loss and the R loss.
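For reference, the fixed-mu baseline in that comparison simply optimizes the penalized objective with a constant multiplier; a minimal sketch with hypothetical names, not DomainLab's API:

```python
import torch

def penalized_loss(loss_ell: torch.Tensor, loss_r: torch.Tensor,
                   mu: float) -> torch.Tensor:
    """Fixed-mu objective: ell() + mu * R(), with mu held constant over training."""
    return loss_ell + mu * loss_r
```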