imkevinkuo / flasc

Federated LoRA with Sparse Communication

Plot results #1

Closed nsaadati closed 1 month ago

nsaadati commented 2 months ago

Hi,

Thank you for your hard work on this project. Could you please guide me on how to save and plot the results in a manner similar to those presented in the paper?

imkevinkuo commented 2 months ago

Thank you for your interest!

Each run of the training script is logged in a Tensorboard events file. We recommend using the tbparse library to load the logs. The following helper function may be of use for loading multiple runs within a folder named path:

from tbparse import SummaryReader
import glob
import json
import os

def load_tf_logs(path, keys=['eval/accuracy', 'test/accuracy']):
    all_runs = {}    # run_dir -> args dict, augmented with the logged scalars
    empty_runs = []  # run directories with no logged events
    for run_dir in glob.glob(f"{path}/**", recursive=True):
        if not os.path.isdir(run_dir):
            continue
        if os.path.exists(f"{run_dir}/args.json"):
            with open(f"{run_dir}/args.json") as f:
                args = json.load(f)
                all_runs[run_dir] = args

            events_files = [fn for fn in os.listdir(run_dir) if fn.startswith('events')]
            if len(events_files) == 0:
                empty_runs.append(run_dir)
            elif len(events_files) == 1:
                reader = SummaryReader(f"{run_dir}/{events_files[0]}")
                df = reader.tensors
                if 'tag' in df:
                    for key in keys:
                        sub_df = df[df['tag'] == key]
                        args[key] = dict(zip(sub_df['step'].values, sub_df['value'].values))
                else:
                    empty_runs.append(run_dir)
        elif len(os.listdir(run_dir)) == 0:
            empty_runs.append(run_dir)
    return all_runs, empty_runs

We generated all plots using the matplotlib library. Here is an example code snippet that plots the eval trace of a single run, with the round number on the x-axis:

import matplotlib.pyplot as plt

runs_dict, empty_runs = load_tf_logs(path)
sample_run = list(runs_dict.values())[0]
# sample_run contains two types of key-value pairs:
# 1. script argument : value
# 2. Tensorboard scalar key : {dict of step : value}
X = list(sample_run['eval/accuracy'].keys())
Y = list(sample_run['eval/accuracy'].values())
plt.plot(X, Y)
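If you want to compare several runs on one figure, a small helper along these lines may be useful. This is a sketch, not code from the repository; the choice of 'client_lr' as the legend label is purely illustrative and can be swapped for any script argument:

```python
import matplotlib.pyplot as plt

# Hypothetical helper (not part of the repo): overlay the traces of every
# run in runs_dict for a given Tensorboard key, labelling each line by one
# script argument so the runs can be told apart in the legend.
def plot_all_runs(runs_dict, key='eval/accuracy', label_arg='client_lr'):
    for run_dir, args in runs_dict.items():
        trace = args.get(key, {})
        if not trace:  # this run never logged the key; skip it
            continue
        # sort by step so the line is drawn left to right
        steps, values = zip(*sorted(trace.items()))
        plt.plot(steps, values, label=f"{label_arg}={args.get(label_arg)}")
    plt.xlabel('round')
    plt.ylabel(key)
    plt.legend()
```

Calling plot_all_runs(runs_dict) after load_tf_logs silently skips runs whose events file did not contain the requested key, which matches how load_tf_logs leaves those entries empty.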
nsaadati commented 2 months ago

I'm running train_sparse_lora_het.py, but I'm getting this error when plotting:

/home/nsaadati/.conda/envs/flasc/lib/python3.10/site-packages/requests/__init__.py:86: RequestsDependencyWarning: Unable to find acceptable character detection dependency (chardet or charset_normalizer).
  warnings.warn(

{'runs/het': {'gpu': '0', 'dir': 'runs', 'name': 'het', 'save': 'true', 'dataset': '20newsgroups', 'iid_alpha': 0.1, 'clients': 350, 'model': 'vit_b_16', 'resume': 0, 'seed': 0, 'eval_freq': 20, 'eval_first': 'false', 'eval_frac': 1, 'eval_masked': 'true', 'server_opt': 'adam', 'server_lr': 0.005, 'server_batch': 10, 'server_rounds': 5, 'client_lr': 0.0005, 'client_batch': 16, 'client_epochs': 1, 'client_freeze': 'false', 'server_freeze': 'false', 'syshet': 'rank', 'tiers': 3},
 'runs/dp': {'gpu': '0', 'dir': 'runs', 'name': 'dp', 'save': 'false', 'dataset': 'cifar10', 'iid_alpha': 0.1, 'clients': 500, 'model': 'vit_b_16', 'resume': 0, 'seed': 0, 'eval_freq': 10, 'eval_first': 'false', 'eval_frac': 1, 'eval_masked': 'true', 'server_opt': 'adam', 'server_lr': 0.001, 'server_batch': 10, 'server_rounds': 5, 'client_lr': 0.001, 'client_batch': 16, 'client_epochs': 1, 'client_freeze': 'false', 'server_freeze': 'false', 'freeze_a': 'false', 'dl_density': 1.0, 'dl_density_decay': 1.0, 'ul_density': 1.0, 'ul_density_decay': 1.0, 'decay_freq': 1, 'lora_rank': 16, 'lora_alpha': 16, 'l2_clip_norm': 0, 'noise_multiplier': 0},
 'runs/test': {'gpu': '0', 'dir': 'runs', 'name': 'test', 'save': 'false', 'dataset': '20newsgroups', 'iid_alpha': 0.1, 'clients': 350, 'model': 'vit_b_16', 'resume': 0, 'seed': 0, 'eval_freq': 20, 'eval_first': 'false', 'eval_frac': 1, 'eval_masked': 'true', 'server_opt': 'adam', 'server_lr': 0.005, 'server_batch': 10, 'server_rounds': 200, 'client_lr': 0.0005, 'client_batch': 16, 'client_epochs': 1, 'client_freeze': 'false', 'server_freeze': 'false', 'syshet': 'rank', 'tiers': 3, 'eval/accuracy': {}, 'test/accuracy': {}}}

Traceback (most recent call last):
  File "/work/mech-ai-scratch/nsaadati/projects/dlora/flasc/plot.py", line 39, in <module>
    X = list(sample_run['eval/accuracy'].keys())
KeyError: 'eval/accuracy'

imkevinkuo commented 2 months ago

In train_sparse_lora_het.py, we only log the test accuracy, under keys of the form "test_X/accuracy", where X is one of [1, 0.25, 0.0625, ..., 4**(-args.tiers-1)]. During evaluation, the global model is temporarily pruned to density X; in our experiments we found that slightly pruning the model was beneficial. To load these keys correctly, pass the corresponding list to load_tf_logs(path, keys=["test_1/accuracy", "test_0.25/accuracy", ...]).
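The key list can also be generated rather than typed out by hand. A sketch, assuming the density schedule follows the pattern above (successive powers of 1/4 down to 4**(-tiers-1)); the `tiers` variable here stands in for the args.tiers value of your run:

```python
# Assumed density schedule (per the key pattern described above):
# 4**0, 4**-1, ..., 4**(-tiers-1). Python formats 4**-1 as 0.25, so the
# f-string reproduces tag names such as "test_0.25/accuracy".
tiers = 3  # set this to the args.tiers value of your run
keys = [f"test_{4**-i}/accuracy" for i in range(tiers + 2)]
```

The resulting list starts with 'test_1/accuracy', 'test_0.25/accuracy', ... and can be passed directly as the keys argument of load_tf_logs.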

You can modify lines 201-202 to also log the validation accuracy:

if valloader is not None:
    log_stats(writer, f"eval_{cfg['dl_density']}", eval_loop(eval_model, valloader), rnd+1)

(sorry for the repeated edits!)